Sharpness Dimension: Why Chaotic Training Works

In this AI Research Roundup episode, Alex discusses the paper 'Generalization at the Edge of Stability'. Neural networks trained with large learning rates often operate at the edge of stability, where optimization becomes chaotic. The paper models these optimizers as random dynamical systems that converge to fractal attractors rather than to single points. The authors introduce the sharpness dimension, a new metric inspired by Lyapunov theory, to bound generalization in these chaotic regimes. Unlike previous theories, this approach considers the entire Hessian spectrum, not just its top eigenvalue, to explain why chaotic dynamics can improve model performance. The research provides new theoretical insight into complex phenomena such as grokking, across both transformers and multi-layer perceptrons.

Paper URL: https://arxiv.org/pdf/2604.19740

#AI #MachineLearning #DeepLearning #Optimization #Grokking #HessianSpectrum #EdgeOfStability
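For context on the threshold behind the abstract: in the edge-of-stability literature, plain gradient descent with learning rate η is linearly stable only while the sharpness (the top Hessian eigenvalue λmax) stays below 2/η; past that point the dominant mode oscillates with growing amplitude, and nearby trajectories separate at a rate measured by a positive Lyapunov exponent. The sketch below is not the paper's code or its sharpness-dimension metric; it is a minimal NumPy illustration, on a 2-D quadratic, of crossing that threshold and estimating a finite-time Lyapunov exponent from two nearby runs.

```python
import numpy as np

# Quadratic toy loss L(w) = 0.5 * w^T H w with fixed Hessian H; its top
# eigenvalue lambda_max is the "sharpness" in the edge-of-stability literature.
H = np.diag([4.0, 1.0])                      # lambda_max = 4
lam_max = float(np.max(np.linalg.eigvalsh(H)))

def gd(w, eta, steps):
    # Plain gradient descent; on a quadratic the update is the linear map
    # w <- (I - eta*H) w, so each Hessian mode is scaled by (1 - eta*lambda_i).
    traj = [w.copy()]
    for _ in range(steps):
        w = w - eta * (H @ w)
        traj.append(w.copy())
    return np.array(traj)

eta = 0.51                                    # eta * lambda_max = 2.04: just past the edge
w0 = np.array([1.0, 1.0])
traj_a = gd(w0, eta, 200)
traj_b = gd(w0 + 1e-8, eta, 200)              # nearby start, to probe sensitivity

# Finite-time Lyapunov exponent: average log growth rate of the separation
# between the two trajectories. A positive value means nearby runs diverge.
sep = np.linalg.norm(traj_a - traj_b, axis=1)
lyap_est = (np.log(sep[-1]) - np.log(sep[0])) / (len(sep) - 1)

# For this linear map the exact top exponent is log|1 - eta*lambda_max|.
print(f"eta * lambda_max       = {eta * lam_max:.2f}")
print(f"estimated Lyapunov exp = {lyap_est:.4f}")
print(f"analytic Lyapunov exp  = {np.log(abs(1 - eta * lam_max)):.4f}")
```

On a pure quadratic the unstable mode simply diverges; in real networks the loss is non-quadratic, so the sharpness self-adjusts to hover near 2/η and training settles into bounded, chaotic oscillation. That bounded-but-chaotic regime is the setting the paper's sharpness dimension is built to analyze.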

Video 'Sharpness Dimension: Why Chaotic Training Works' from the AI Research Roundup channel.