Sharpness Dimension: Why Chaotic Training Works

In this AI Research Roundup episode, Alex discusses the paper 'Generalization at the Edge of Stability'. Neural networks trained with large learning rates often operate at the edge of stability, where optimization becomes chaotic. The paper models these optimizers as random dynamical systems that converge to fractal attractors rather than to single points. The authors introduce the sharpness dimension, a new metric inspired by Lyapunov theory, to bound generalization in these chaotic regimes. Unlike previous theories, this approach considers the entire Hessian spectrum, not just its top eigenvalue, to explain why chaotic dynamics can improve model performance. The research provides new theoretical insight into complex phenomena such as grokking, across both transformers and multi-layer perceptrons.

Paper URL: https://arxiv.org/pdf/2604.19740

#AI #MachineLearning #DeepLearning #Optimization #Grokking #HessianSpectrum #EdgeOfStability
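For context on the threshold behind the abstract: in the edge-of-stability literature, plain gradient descent with learning rate η is linearly stable only while the sharpness (the top Hessian eigenvalue λmax) stays below 2/η; past that point the dominant mode oscillates with growing amplitude, and nearby trajectories separate at a rate measured by a positive Lyapunov exponent. The sketch below is not the paper's code or its sharpness-dimension metric; it is a minimal NumPy illustration, on a 2-D quadratic, of crossing that threshold and estimating a finite-time Lyapunov exponent from two nearby runs.

```python
import numpy as np

# Quadratic toy loss L(w) = 0.5 * w^T H w with fixed Hessian H; its top
# eigenvalue lambda_max is the "sharpness" in the edge-of-stability literature.
H = np.diag([4.0, 1.0])                      # lambda_max = 4
lam_max = float(np.max(np.linalg.eigvalsh(H)))

def gd(w, eta, steps):
    # Plain gradient descent; on a quadratic the update is the linear map
    # w <- (I - eta*H) w, so each Hessian mode is scaled by (1 - eta*lambda_i).
    traj = [w.copy()]
    for _ in range(steps):
        w = w - eta * (H @ w)
        traj.append(w.copy())
    return np.array(traj)

eta = 0.51                                    # eta * lambda_max = 2.04: just past the edge
w0 = np.array([1.0, 1.0])
traj_a = gd(w0, eta, 200)
traj_b = gd(w0 + 1e-8, eta, 200)              # nearby start, to probe sensitivity

# Finite-time Lyapunov exponent: average log growth rate of the separation
# between the two trajectories. A positive value means nearby runs diverge.
sep = np.linalg.norm(traj_a - traj_b, axis=1)
lyap_est = (np.log(sep[-1]) - np.log(sep[0])) / (len(sep) - 1)

# For this linear map the exact top exponent is log|1 - eta*lambda_max|.
print(f"eta * lambda_max       = {eta * lam_max:.2f}")
print(f"estimated Lyapunov exp = {lyap_est:.4f}")
print(f"analytic Lyapunov exp  = {np.log(abs(1 - eta * lam_max)):.4f}")
```

On a pure quadratic the unstable mode simply diverges; in real networks the loss is non-quadratic, so the sharpness self-adjusts to hover near 2/η and training settles into bounded, chaotic oscillation. That bounded-but-chaotic regime is the setting the paper's sharpness dimension is built to analyze.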

Video 'Sharpness Dimension: Why Chaotic Training Works' from the AI Research Roundup channel.