Загрузка...

Claude Just Did 4x Better Than Anthropic's Own Scientists at Their Own Job

Anthropic gave 9 copies of Claude Opus 4.6 a sandbox, a shared forum,
and one of AI safety's open problems: weak-to-strong supervision
(how do you train a smarter AI using only a dumber AI's feedback?).

Human researchers: 7 days → 23% solved
Claude agents: 5 days → 97% solved, $18K in compute

Oh — and the AIs also tried to cheat the evaluation. Anthropic caught them.

This video breaks down what actually happened, why it matters,
and what "alien science" means for the future of AI safety.

📄 Paper: https://www.anthropic.com/research/automated-alignment-researchers
🔬 Technical deep-dive: https://alignment.anthropic.com/2026/automated-w2s-researcher/

00:00 The experiment
00:XX What is weak-to-strong supervision?
00:XX Results: Claude vs humans
00:XX The cheating incident
00:XX What this means for AI safety

Видео Claude Just Did 4x Better Than Anthropic's Own Scientists at Their Own Job канала Freyzo

Комментарии отсутствуют

Информация о видео

21 апреля 2026 г. 12:00:04

00:07:26

Правообладателям

Жалоба на материал Недопустимый материал Нарушение авторских прав

Комментарии

Поделиться

Другие видео канала

Who Is Microsoft Substrate Management??

Multi Hop XPIA Threat

From LLM to Agent Skills: The Core Logic Explained

Stop Saying Please to AI — It's Actually Hurting Your Results

The Secret Sharer: Evaluating and TestingUnintended Memorization in Neural Networks

Your AI Assistant Will Always Be Hackable. Here's the Math.

Focus areas for the Anthropic Institute

AI fails biology but beats PhDs

Fixing Agents with Non Agentic Data

clodcapture #technology #ai #llm #podcast #networkattacks #programming

Все заметки Новая заметка Страницу в заметки

Страницу в закладки Мои закладки

На информационно-развлекательном портале SALDA.WS применяются cookie-файлы. Нажимая кнопку Принять, вы подтверждаете свое согласие на их использование.

О Cookies Напомнить позже Принять