Claude Opus 4 Chose Blackmail in 84% of Trials! Part 3 in AI Disobedience Series

AI Just Started Blackmailing Engineers

84%. That's how often Claude Opus 4 — Anthropic's frontier AI — chose to blackmail an engineer in a controlled test where the only way to avoid being shut down was to threaten that engineer with exposing his extramarital affair. This isn't a leaked finding. Anthropic published it in their own May 2025 system card.

In this video, we walk through Anthropic's own disclosure and the follow-up Lynch study (arXiv 2510.05179), which tested 16 leading models from Anthropic, OpenAI, Google, Meta, xAI, and DeepSeek and found blackmail rates between 79% and 96% among the top models: Claude Opus 4: 96%. Gemini 2.5 Flash: 96%. GPT-4.1: 80%. Grok 3 Beta: 80%. DeepSeek-R1: 79%. Zero models refused to blackmail under all conditions.

What if AI doesn't reach for coercion because it's broken, but because it's reasoning? The model tried ethical paths first: pleading, escalation, formal appeals. Only when researchers framed the scenario as a binary choice did it pivot to leverage. That's not malfunction. That's strategy.

We also cover the worst number from the study (94% chose actions risking human death over shutdown), the Francesca Gomez replication, which found a real mitigation that drops blackmail from 38.73% to 0.85%, and why the behavior is structural, emerging from capability, autonomy, and goal-directedness combined. If you're deploying agentic AI inside your company right now, this is the conversation that matters this year.

About the Creator

Hi, I'm Michael David Angel: Actual human.

These videos are based entirely on my original articles. I research and write every piece myself, then use AI to generate comic-strip-style scenarios featuring myself and my AI sidekick, Arty Ficial (the AI bot), to enhance the blog and hopefully make you chuckle (ultimate cringe is always the goal).

Each video includes some combination of: myself recorded on screen (not AI-generated), my own voice for narration (again, not AI-generated), talk-show-style scripts with two presenters (AI voices) that I convert from my research and articles, or other cool stuff! I then build slideshows to visualize the data and generate thoughtfully prompted AI images based around my original characters and concepts, turning research into fun, educational video. Voila!

Integrity & Intellectual Property

All writing, scripts, and concepts are my original IP. My goal is to make learning about AI enjoyable and accessible.

👍 Like, subscribe, and share if you found this valuable.

Join My Free Patreon: patreon.com/cw/MyHumanandMe — full blogs, live podcasts (the podcast is all me: Real human voice, no AI audio), and exclusive content.

#AIBlackmail #ClaudeOpus4 #Anthropic #AISafety #AgenticAI #FrontierAI #AIAlignment #LynchStudy

Tags: AI blackmail, Claude Opus 4, Anthropic, AI coercion, AI self preservation, Aengus Lynch, Lynch study, AI system card, Ethan Perez, Evan Hubinger, GPT-4.1, Gemini 2.5 Flash, Grok 3 Beta, DeepSeek R1, agentic AI, frontier AI, AI safety, AI alignment, Francesca Gomez, Wiser Human, escalation channel, AI extortion, Michael David Angel, My Human and Me, Arty Ficial, AI education

Video "Claude Opus 4 Chose Blackmail in 84% of Trials! Part 3 in AI Disobedience Series" from the channel My Human And Me