HRM-Text: Efficient Pretraining Beyond Scaling

Paper: HRM-Text: Efficient Pretraining Beyond Scaling (2605.20613)
Published: 20 May 2026.

Learn more on Emergent Mind: https://www.emergentmind.com/papers/2605.20613
arXiv: https://arxiv.org/abs/2605.20613
Sign up for our free trending papers email digest: https://www.emergentmind.com/subscribe
Follow us on X: https://x.com/EmergentMind
Join our Discord: https://discord.gg/BhfTC4mTXq

This presentation explores HRM-Text, a groundbreaking approach to language model pretraining that achieves competitive performance with models 2 to 7 times its size while using up to 432 times less compute and 900 times fewer training tokens. Through a dual-timescale recurrent architecture inspired by biological multi-timescale processing, combined with instruction-response training objectives and novel stabilization techniques, HRM-Text demonstrates that brute-force scaling is not the only path to capable language models. We examine the architectural innovations, training methodology, empirical results, and implications for democratizing large language model research.

Video: HRM-Text: Efficient Pretraining Beyond Scaling, from the Emergent Mind channel

Video information

23 May 2026, 9:41:47

Duration: 00:02:12
