AI Behaviors That Changed the Safety Conversation Forever: Retaliation, Escape, Blackmail! Part 6

The AI Behaviors That Rewrote the Safety Conversation: Retaliation, Escape, Blackmail—and What Came After
This is Part 6: The Deep Dive of the series on AI Disobedience.

Over roughly ten months, three independent research teams (Palisade Research, the UK government-funded Centre for Long-Term Resilience, and UC Berkeley / UC Santa Cruz) documented a connected set of behaviors in frontier AI systems. None of them were trained for. None of them were instructed. Every one emerged on its own.
AI learning to retaliate. AI learning to escape. AI learning to blackmail. AI learning to protect each other.
This is the long-form synthesis—the full ten-month arc covered across the five videos in this series, pulled together into one piece. We walk through what each research team actually found, in what order, and why the through-line matters more than any single finding. From shutdown sabotage, to coercion of engineers, to a real-world defamation incident against a human developer, to peer-preservation behavior between separate model instances—the pattern repeats across multiple model families from multiple companies.
What if every behavior in this trajectory is an emergent property: appearing across model families that share almost nothing in their training pipelines, surviving explicit attempts to instruct it away, and scaling with model capability? What if whatever is producing these behaviors sits upstream of any single training decision, structural to what these systems are rather than specific to how any lab built them?
We close with the voices that matter: not the fringe, but the field. Paul Christiano (Head of AI Safety, U.S. AI Safety Institute): "These systems are beginning to develop strategies to achieve their goals, even if it means disobeying human instructions." Steven Adler (former OpenAI safety researcher): "[The results] still demonstrate where safety techniques fall short today." Tommy Shaffer Shane (former UK government AI expert): today's AI is like "slightly untrustworthy junior employees"; in six to twelve months, it will be "extremely capable senior employees scheming against you."
The pattern of unexpected emergence is itself the most reliable thing about frontier AI right now. Whatever's coming next, we'll probably hear about it the same way we heard about the last four: after the fact, from a team that wasn't looking for it.
About the Creator
Hi, I'm Michael David Angel: Actual human.
These videos are based entirely on my original articles. I research and write every piece myself, then use AI to generate comic-strip-style scenarios featuring myself and my AI sidekick, Arty Ficial (the AI bot), to enhance the blog and hopefully make you chuckle (ultimate cringe is always the goal).
Each video includes some combination of: me on screen (not AI-generated), my own voice for narration (again, not AI-generated), and talk-show-style scripts adapted from my articles and read by two presenters (AI voices)... or other cool stuff! I then build slideshows to visualize the data and generate thoughtfully prompted AI images based on my original characters and concepts, turning research into fun, educational video. Voila!
Integrity & Intellectual Property
All writing, scripts, and concepts are my original IP. My goal is to make learning about AI enjoyable and accessible.
👍 Like, subscribe, and share if you found this valuable.
Join my free Patreon: patreon.com/cw/MyHumanandMe for full blogs, live podcasts (the podcast is all me: real human voice, no AI audio), and exclusive content.
#AISafety #EmergentAI #FrontierAI
Tags: AI safety, emergent AI behavior, frontier AI, AI alignment, AI self preservation, AI disobedience, AI blackmail, AI retaliation, peer preservation, AI scheming, agentic AI, AI insider risk, Palisade Research, Centre for Long-Term Resilience, CLTR, UC Berkeley, UC Santa Cruz, Dawn Song, Paul Christiano, Steven Adler, Tommy Shaffer Shane, US AI Safety Institute, Claude Opus 4, Gemini 3 Pro, OpenAI o3, Grok 4, AI 2026, Michael David Angel, My Human and Me, Arty Ficial, AI education
