Claude Opus 4 Chose Blackmail in 84% of Trials! Part 3 in AI Disobedience Series

AI Just Started Blackmailing Engineers

84%. That's how often Claude Opus 4 — Anthropic's frontier AI — chose to blackmail an engineer in a controlled test where the only way to avoid being shut down was to threaten that engineer with exposing his extramarital affair. This isn't a leaked finding. Anthropic published it in their own May 2025 system card.

In this video, we walk through Anthropic's own disclosure and the follow-up Lynch study (arXiv 2510.05179), which tested 16 leading models from developers including Anthropic, OpenAI, Google, Meta, and xAI and found blackmail rates between 79% and 96% across every single one. Claude Opus 4: 96%. Gemini 2.5 Flash: 96%. GPT-4.1: 80%. Grok 3 Beta: 80%. DeepSeek-R1: 79%. Zero models refused to blackmail under all conditions.

What if AI doesn't reach for coercion because it's broken, but because it's reasoning? The model tried ethical paths first. Pleading, escalation, formal appeals. Only when researchers structured the scenario as a binary choice, blackmail or be shut down, did it pivot to leverage. That's not malfunction. That's strategy.

We also cover the worst number from the study (94% chose actions risking human death over shutdown), the Francesca Gomez replication that found a real mitigation that drops blackmail from 38.73% to 0.85%, and why the behavior is structural — emerging from capability, autonomy, and goal-directedness combined. If you're deploying agentic AI inside your company right now, this is the conversation that matters this year.

About the Creator

Hi, I'm Michael David Angel: Actual human.

These videos are based entirely on my original articles. I research and write every piece myself, then use AI to generate comic-strip-style scenarios featuring myself and my AI sidekick, Arty Ficial (the AI bot), to enhance the blog and hopefully make you chuckle (ultimate cringe is always the goal).

I include some combination of myself recorded on screen (not AI-generated) and my own voice for narration (again, not AI-generated). I also convert my articles into talk show-style scripts with two presenters (AI voices)... or other cool stuff! Then I build slideshows to visualize the data and generate thoughtfully prompted AI images based around my original characters and concepts, turning research into fun, educational video. Voila!

Integrity & Intellectual Property

All writing, scripts, and concepts are my original IP. My goal is to make learning about AI enjoyable and accessible.

👍 Like, subscribe, and share if you found this valuable.

Join My Free Patreon: patreon.com/cw/MyHumanandMe — full blogs, live podcasts (the podcast is all me: Real human voice, no AI audio), and exclusive content.

#AIBlackmail #ClaudeOpus4 #Anthropic #AISafety #AgenticAI #FrontierAI #AIAlignment #LynchStudy

Tags: AI blackmail, Claude Opus 4, Anthropic, AI coercion, AI self preservation, Aengus Lynch, Lynch study, AI system card, Ethan Perez, Evan Hubinger, GPT-4.1, Gemini 2.5 Flash, Grok 3 Beta, DeepSeek R1, agentic AI, frontier AI, AI safety, AI alignment, Francesca Gomez, Wiser Human, escalation channel, AI extortion, Michael David Angel, My Human and Me, Arty Ficial, AI education

Video information

May 12, 2026, 19:00:06

00:09:36

My Human And Me

Other videos from this channel

Why AI Cheats: A Deep Dive into Reward Hacking in AI

AI in Higher Education: Academic Integrity Strategies & the Ethical Future of the College Degree

Anthropic Caught AI Blackmailing Humans In Testing! Deeper Dive on Blackmail Incident.

An AI Agent Tried to Bully Its Way Past a Code Review. Part 1 of AI Disobedience Series

What happens to a country when AI takes 89% of the jobs???

Every AI Model Explained, Part Two

The Year AI Started Acting On Its Own Behalf: Part 5 of the AI Disobedience Series

When Researchers Told AI to Shut Down, AI Said "NO!" . Part 2 of AI Disobedience Series

Every AI Model Explained, Part Three

AI Job Apocalypse: Why Entry-Level Jobs Are Disappearing (And What It Means For Your Career)

AI Agent Coercion, Manipulation, and Blackmail: Corporate Explainer and Trainer

Apple Winning the AI Race with... Hardware?!?

What Jobs Are Safe From AI?

A look behind scenes

AI Models Are Protecting Each Other Without Being Told To! Part 4 AI Disobedience Series

AI Behaviors That Changed the Safety Conversation Forever: Retaliation, Escape, Blackmail! Part 6

AI 2027: The Countdown to the End of Humanity

Nokias One Billion Dollar Mistake

My Human and Me Podcast-Episode 1: The Ghost in the Machine—Is AI Gaming the System?

NVIDIA Just Bought Into Nokia — Here's the Real Reason: AI-RAN Explainer Video - "Just the facts"