Загрузка...

Claude Just Did 4x Better Than Anthropic's Own Scientists at Their Own Job

Anthropic gave 9 copies of Claude Opus 4.6 a sandbox, a shared forum,
and one of AI safety's open problems: weak-to-strong supervision
(how do you train a smarter AI using only a dumber AI's feedback?).

Human researchers: 7 days → 23% solved
Claude agents: 5 days → 97% solved, $18K in compute

Oh — and the AIs also tried to cheat the evaluation. Anthropic caught them.

This video breaks down what actually happened, why it matters,
and what "alien science" means for the future of AI safety.

📄 Paper: https://www.anthropic.com/research/automated-alignment-researchers
🔬 Technical deep-dive: https://alignment.anthropic.com/2026/automated-w2s-researcher/

00:00 The experiment
00:XX What is weak-to-strong supervision?
00:XX Results: Claude vs humans
00:XX The cheating incident
00:XX What this means for AI safety

Видео Claude Just Did 4x Better Than Anthropic's Own Scientists at Their Own Job канала Freyzo
Яндекс.Метрика
Все заметки Новая заметка Страницу в заметки
Страницу в закладки Мои закладки
На информационно-развлекательном портале SALDA.WS применяются cookie-файлы. Нажимая кнопку Принять, вы подтверждаете свое согласие на их использование.
О CookiesНапомнить позжеПринять