## Video 3: Do Personality Classifiers Generalize?
Training accuracy near 100% is just a sanity check. The real question is what happens on items the classifiers have never seen. In Experiment 1, we test on 5,052 novel items generated independently by GPT-4o.
Mean accuracy drops to 58.6%, but every single model beats its random baseline, typically by 3.5x. We unpack why. Factor count predicts difficulty (r = -0.67): models with 2 to 5 factors average 68%, while models with 20 or more factors drop to 30%. A triple-judge LLM panel (GPT-5.2, Gemini 3 Pro, Claude Opus 4.6) achieves near-perfect agreement (kappa = 0.99), which indicates that the 59% ceiling is a classifier limitation, not item ambiguity. Category-level analysis reveals systematic differences: Motivational models lead at 74.5%, while Interpersonal models trail at 23.7%.
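To make "beats its random baseline by 3.5x" and "factor count predicts difficulty" concrete, here is a minimal Python sketch. It assumes the random baseline is uniform guessing over a model's factors (chance = 1 / number of factors), which the description does not state explicitly. The function names and the `toy_models` numbers are hypothetical illustrations, not data or code from the repository.

```python
import math


def random_baseline(n_factors: int) -> float:
    """Chance accuracy, ASSUMING uniform guessing among n_factors classes."""
    if n_factors < 2:
        raise ValueError("a classifier needs at least two factors")
    return 1.0 / n_factors


def lift_over_chance(accuracy: float, n_factors: int) -> float:
    """How many times better than chance a held-out accuracy is."""
    return accuracy / random_baseline(n_factors)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (factor_count, held-out accuracy) pairs, NOT the study's data.
toy_models = [(2, 0.80), (5, 0.65), (10, 0.45), (20, 0.30)]

lifts = [lift_over_chance(acc, k) for k, acc in toy_models]
r = pearson_r([float(k) for k, _ in toy_models], [acc for _, acc in toy_models])

for (k, acc), lift in zip(toy_models, lifts):
    print(f"{k:>2} factors: accuracy {acc:.0%}, lift over chance {lift:.2f}x")
print(f"Pearson r between factor count and accuracy: {r:.2f}")
```

In this toy data, raw accuracy falls as the factor count grows (negative r) even though lift over chance rises, which is why the two numbers are worth reporting side by side.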
The 37-point gap between LLM judges (96%) and Random Forest classifiers (59%) is the improvement target for the next video.
Repository: https://github.com/Wildertrek/survey
Video "3 Do Personality Classifiers Generalize mp4" from the channel Joseph Raetano
Video information
February 23, 2026, 18:32:38
00:03:16