
## Video 3: Do Personality Classifiers Generalize?

Training accuracy near 100% is just a sanity check. The real question is what happens on items the classifiers have never seen. In Experiment 1, we test on 5,052 novel items generated independently by GPT-4o.

Mean accuracy drops to 58.6%, yet every single model still beats its random baseline, typically by 3.5x. We then unpack why. Factor count predicts difficulty: the number of factors in a model correlates with accuracy at r = -0.67, with 2-to-5-factor models averaging 68% while models with 20 or more factors fall to 30%. A triple-judge LLM panel (GPT-5.2, Gemini 3 Pro, Claude Opus 4.6) reaches near-perfect agreement (kappa = 0.99), which confirms that the 59% ceiling is a limitation of the classifiers, not ambiguity in the items. Category-level analysis reveals systematic differences: Motivational models lead at 74.5%, while Interpersonal models trail at 23.7%.
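The numbers above rest on three standard statistics: accuracy relative to a chance baseline, a Pearson correlation across models, and chance-corrected agreement among the judges. The sketch below is a minimal, hypothetical Python illustration, not the repository's code. It assumes the random baseline is uniform guessing over a model's k factors (so lift = accuracy × k), and it uses Fleiss' kappa as one common choice for three raters, since the description does not say which kappa variant was computed. The function names and toy inputs are illustrative only.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Fraction of items whose predicted factor matches the gold factor."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def lift_over_chance(acc, n_factors):
    """Accuracy divided by an ASSUMED uniform-guessing baseline of 1/k.

    acc / (1 / k) simplifies to acc * k.
    """
    return acc * n_factors


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between factor count and held-out accuracy."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of judges.

    ratings: one list per item holding each judge's label, e.g. ["A", "A", "B"].
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items = len(ratings)
    # counts[i][j]: how many judges put item i in category j
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over all items
    p_items = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = mean(p_items)
    # Chance agreement from the pooled category proportions
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are NOT the study's data.
    acc = accuracy(["A", "B", "B", "C"], ["A", "B", "C", "C"])
    print(acc, lift_over_chance(acc, n_factors=5))  # 0.75 3.75
    print(pearson_r([2, 5, 10, 20], [0.8, 0.7, 0.5, 0.3]))
    judges = [["A", "A", "A"], ["B", "B", "B"], ["A", "A", "B"]]
    print(fleiss_kappa(judges, categories=["A", "B"]))
```

Under the uniform-baseline assumption, a 5-factor model at 75% accuracy sits 3.75x above chance, while a 20-factor model needs only 5% accuracy to match chance. This is one reason raw accuracy and lift can tell different stories across factor counts. Note that `fleiss_kappa` divides by zero if every judge always picks the same single category.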

The 37-percentage-point gap between the LLM judges (96%) and the Random Forest classifiers (59%) is the improvement target for the next video.

Repository: https://github.com/Wildertrek/survey

Video 3: Do Personality Classifiers Generalize, from the Joseph Raetano channel.
