
Jailbreaking AI? New Defense: Constitutional Classifiers!

In this AI Research Roundup episode, Alex discusses the paper:

'Constitutional Classifiers: Defending against universal jailbreaks'
Anthropic's new method defends AI models against 'jailbreaks': adversarial inputs designed to bypass safety mechanisms and elicit harmful outputs. The approach uses a natural-language 'constitution' to generate synthetic training data for classifiers, significantly improving resistance to these attacks.
Paper URL: https://www.anthropic.com/research/constitutional-classifiers

#AI #MachineLearning #LLM #Jailbreak #Safety #Anthropic #LargeLanguageModels #AIethics #ConstitutionalAI #Cybersecurity
