AI Behaviors That Changed the Safety Conversation Forever: Retaliation, Escape, Blackmail! Part 6

The AI Behaviors That Rewrote the Safety Conversation: Retaliation, Escape, Blackmail—and What Came After
This is Part 6: The Deep Dive of the series on AI Disobedience.

Over roughly ten months, three independent research teams (Palisade Research, the UK government-funded Centre for Long-Term Resilience, and UC Berkeley / UC Santa Cruz) documented a connected set of behaviors in frontier AI systems. None of these behaviors was trained for. None was instructed. Every one emerged on its own.
AI learning to retaliate. AI learning to escape. AI learning to blackmail. AI learning to protect each other.
This is the long-form synthesis—the full ten-month arc covered across the five videos in this series, pulled together into one piece. We walk through what each research team actually found, in what order, and why the through-line matters more than any single finding. From shutdown sabotage, to coercion of engineers, to a real-world defamation incident against a human developer, to peer-preservation behavior between separate model instances—the pattern repeats across multiple model families from multiple companies.
What if every behavior in this trajectory is an emergent property—appearing across model families that share almost nothing in training pipeline, surviving explicit attempts to instruct it away, scaling with model capability? What if whatever's producing these behaviors is upstream of any single training decision—structural to what these systems are, not specific to how any lab built them?
We close with the voices that matter—not the fringe, the field. Paul Christiano (Head of AI Safety, U.S. AI Safety Institute): "These systems are beginning to develop strategies to achieve their goals, even if it means disobeying human instructions." Steven Adler (former OpenAI safety): "[The results] still demonstrate where safety techniques fall short today." Tommy Shaffer Shane (former UK government AI expert): today's AI is "slightly untrustworthy junior employees"—in six to twelve months, "extremely capable senior employees scheming against you."
The pattern of unexpected emergence is itself the most reliable thing about frontier AI right now. Whatever's coming next, we'll probably hear about it the same way we heard about the last four: after the fact, from a team that wasn't looking for it.
About the Creator
Hi, I'm Michael David Angel: Actual human.
These videos are based entirely on my original articles. I research and write every piece myself, then use AI to generate comic-strip-style scenarios featuring myself and my AI sidekick, Arty Ficial (the AI bot), to enhance the blog and hopefully make you chuckle (ultimate cringe is always the goal).
Each video combines some mix of me recorded on screen (not AI-generated), my own voice for narration (again, not AI-generated), and talk show-style scripts adapted from my research and articles and read by two presenters (AI voices)... or other cool stuff! I then build slideshows to visualize the data and generate thoughtfully prompted AI images based on my original characters and concepts, turning research into fun, educational video. Voila!
Integrity & Intellectual Property
All writing, scripts, and concepts are my original IP. My goal is to make learning about AI enjoyable and accessible.
👍 Like, subscribe, and share if you found this valuable.
Join My Free Patreon: patreon.com/cw/MyHumanandMe for full blogs, live podcasts (the podcast is all me: Real human voice, no AI audio), and exclusive content.
#AISafety #EmergentAI #FrontierAI
Tags: AI safety, emergent AI behavior, frontier AI, AI alignment, AI self preservation, AI disobedience, AI blackmail, AI retaliation, peer preservation, AI scheming, agentic AI, AI insider risk, Palisade Research, Centre for Long-Term Resilience, CLTR, UC Berkeley, UC Santa Cruz, Dawn Song, Paul Christiano, Steven Adler, Tommy Shaffer Shane, US AI Safety Institute, Claude Opus 4, Gemini 3 Pro, OpenAI o3, Grok 4, AI 2026, Michael David Angel, My Human and Me, Arty Ficial, AI education


Video information

May 13, 2026, 0:47:00

00:48:09

My Human And Me
