ScheduleFree+: Learning-Rate-Free LLM Training

In this AI Research Roundup episode, Alex discusses the paper "ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models".

Schedule-Free Learning has proven to be a practical anytime training method across standard benchmarks, but its effectiveness for large language models (LLMs) had previously been limited to small scales. In this paper, researchers from Meta's FAIR Super-Intelligence Labs introduce ScheduleFree+, which scales learning-rate-free and schedule-free learning to larger batch and model sizes. The new method significantly outperforms traditional Warmup-Stable-Decay (WSD) schedules, especially during long-duration pretraining: at 1000 tokens per parameter, ScheduleFree+ is reported to outperform state-of-the-art schedules by 31%. The approach also provides a solid theoretical foundation for model averaging and checkpoint merging during pretraining.

Paper URL: https://arxiv.org/pdf/2605.19095

#AI #MachineLearning #DeepLearning #LLMs #Optimization #MetaFAIR
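For readers unfamiliar with the schedule-free idea mentioned in the description, here is a minimal sketch of the basic Schedule-Free SGD update from the earlier Schedule-Free Learning work. It is not the ScheduleFree+ method, whose details the description does not give. The function name `schedule_free_sgd`, the scalar toy objective, and the values `lr=1.0` and `beta=0.5` are illustrative assumptions chosen to keep the example small.

```python
from typing import Callable


def schedule_free_sgd(
    grad: Callable[[float], float],
    w0: float,
    lr: float,
    beta: float,
    steps: int,
) -> float:
    """Run basic Schedule-Free SGD on a scalar parameter; return the averaged iterate x."""
    z = w0  # base SGD sequence
    x = w0  # running average of z, the point used for evaluation
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # the gradient is evaluated at this interpolation
        z = z - lr * grad(y)             # constant step size, no warmup or decay phase
        c = 1.0 / (t + 1)                # uniform averaging weight
        x = (1.0 - c) * x + c * z        # x stays the equal-weight mean of all z so far
    return x


# Hypothetical toy problem: minimise f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
x_final = schedule_free_sgd(lambda w: w - 3.0, w0=0.0, lr=1.0, beta=0.5, steps=50)
print(abs(x_final - 3.0) < 1e-6)
```

Because `x` is a running average of the `z` sequence, it can be read out at any step without a decay phase, which is the sense in which the method is "anytime" and why it relates to model averaging and checkpoint merging.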

Video "ScheduleFree+: Learning-Rate-Free LLM Training" from the AI Research Roundup channel

Tags: AI Research, AI Research Roundup, Checkpoint Merging, Deep Learning, LLM Training, Large Language Models, Learning Rate, Machine Learning, Meta FAIR, Model Averaging, Optimization, Optimization Algorithms, ScheduleFree

Video information

1 hr 59 min ago

00:04:19

AI Research Roundup
