
DFlash: Block Diffusion for Flash Speculative Decoding

Paper: DFlash: Block Diffusion for Flash Speculative Decoding (2602.06036)
Published: 5 Feb 2026.

Learn more on Emergent Mind: https://www.emergentmind.com/papers/2602.06036
arXiv: https://arxiv.org/abs/2602.06036

This presentation explores DFlash, a speculative decoding framework that uses lightweight block diffusion models as drafters to accelerate large language model inference. The draft model generates multiple tokens in parallel rather than one at a time, and it is conditioned by directly injecting context features from the target model. DFlash achieves over 6× speedup compared to standard autoregressive decoding and up to 2.5× improvement over state-of-the-art methods such as EAGLE-3, while preserving the target model's generation quality exactly.
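
To make the draft-and-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding in Python. It is not DFlash itself: `target_next` and `draft_block` are hypothetical stand-ins invented for illustration, there is no diffusion model or feature injection, and the bonus token that real schemes usually add after a fully accepted block is omitted. It only shows why verification keeps the output identical to plain autoregressive decoding while needing fewer target passes.

```python
from typing import List, Tuple

Token = int


def target_next(prefix: List[Token]) -> Token:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (sum(prefix) * 31 + len(prefix)) % 7


def draft_block(prefix: List[Token], k: int) -> List[Token]:
    """Hypothetical stand-in for a block drafter that proposes k tokens per call.

    A real block drafter would emit the whole block in one parallel pass;
    this toy fakes a proposal and deliberately injects periodic mistakes.
    """
    ctx = list(prefix)
    block: List[Token] = []
    for _ in range(k):
        guess = target_next(ctx)
        if len(ctx) % 5 == 4:  # injected draft error (assumption for the demo)
            guess = (guess + 1) % 7
        block.append(guess)
        ctx.append(guess)
    return block


def autoregressive_decode(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(
    prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        proposal = draft_block(out, k)
        # Counted as ONE verification pass: a real target model scores all
        # k positions in a single forward pass; the loop only emulates that.
        target_passes += 1
        ctx = list(out)
        for tok in proposal:
            expected = target_next(ctx)
            # Always keep the target's token: it is either the accepted
            # draft token or the correction at the first mismatch.
            ctx.append(expected)
            if tok != expected or len(ctx) >= goal:
                break
        out = ctx
    return out, target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = autoregressive_decode(prompt, 20)
    fast, passes = speculative_decode(prompt, 20, k=4)
    print("identical output:", fast == baseline)
    print("target passes:", passes, "vs", 20, "autoregressive steps")
```

Because every kept token is the one the target itself would have chosen, the output matches the baseline regardless of how good the drafter is. Draft quality only changes how many tokens are accepted per verification pass, which is where the speedup comes from.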

Video "DFlash: Block Diffusion for Flash Speculative Decoding" from the Emergent Mind channel