RTPurbo: 100-Step Sparse Attention for LLMs

In this AI Research Roundup episode, Alex discusses the paper 'Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps'. Long-context large language model inference is bottlenecked by the quadratic computational cost of full attention. To address this, the authors introduce RTPurbo, a sparse attention framework that converts full attention models into sparse ones in under one hundred training steps. RTPurbo identifies specialized retrieval heads, projects key-value representations into a lightweight 16-dimensional space, and uses dynamic top-p selection to set the active token budget. This avoids expensive native sparse training and yields a low-cost pipeline for long-context LLM decoding.

Paper URL: https://arxiv.org/abs/2605.16928

#AI #MachineLearning #DeepLearning #LLM #SparseAttention #RTPurbo #Transformers
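To make the description above more concrete, here is a minimal sketch of what scoring tokens in a 16-dimensional space and then applying top-p selection could look like. It is an illustration under stated assumptions, not the authors' implementation: the random projection matrix, the function names (`project`, `top_p_select`, `sparse_token_budget`), and the sizes `d=64`, `n=128`, `p=0.9` are all hypothetical, and retrieval-head identification is not modeled at all.

```python
import math
import random


def project(vec, proj):
    """Multiply a d-dimensional vector by a (d x r) projection matrix."""
    return [sum(v * row[j] for v, row in zip(vec, proj)) for j in range(len(proj[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_select(probs, p):
    """Return indices of the smallest token set whose total probability is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(chosen)


def sparse_token_budget(query, keys, proj, p=0.9):
    """Score keys against the query in the low-dimensional space, then keep top-p tokens."""
    q_low = project(query, proj)
    k_low = [project(k, proj) for k in keys]
    scale = 1.0 / math.sqrt(len(q_low))
    scores = [scale * sum(a * b for a, b in zip(q_low, k)) for k in k_low]
    return top_p_select(softmax(scores), p)


if __name__ == "__main__":
    rng = random.Random(0)
    d, r, n = 64, 16, 128  # hypothetical head dim, projected dim, context length
    # Assumption: a random projection stands in for whatever projection the paper trains.
    proj = [[rng.gauss(0, 1 / math.sqrt(r)) for _ in range(r)] for _ in range(d)]
    keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    query = [rng.gauss(0, 1) for _ in range(d)]
    kept = sparse_token_budget(query, keys, proj, p=0.9)
    print(f"kept {len(kept)} of {n} tokens")
```

Unlike a fixed top-k cutoff, the number of tokens kept here varies with how peaked the score distribution is, which is one plausible reading of "dynamic" in the description.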

Video "RTPurbo: 100-Step Sparse Attention for LLMs" from the AI Research Roundup channel

AI AttentionMechanism DeepLearning DeepLearningResearch LLM LargeLanguageModels LongContext MachineLearning Podcast RTPurbo Research SparseAttention Transformers

Video information

May 23, 2026, 6:04:56

00:03:42

AI Research Roundup

Other videos from this channel

SciAtlas: Massive Knowledge Graph for Science

Epicure: Multilingual Ingredient Embeddings

EvalAwareBench: Testing LLM Evaluation Awareness

DVAO: Stabilizing Multi-Reward RL for LLMs

LongAV-Compass: Minute-Long Audio-Video Benchmark

LLM Distillation: Strong Teachers Not Needed

Gamma-World: Scalable Multi-Agent World Model

AXPO: Better Tool Use for Multimodal LLMs

TriSplat: Instant Simulation-Ready 3D Meshes

Concept Hierarchies in LLMs from Word Stats

Macaron-A2UI: Generative UI for LLM Agents

LeJEPA: How JEPAs Learn True World Models

EvalVerse: Benchmarking Cinematic Video Models

SWIM: Fine-Grained Video Object Understanding

LLMs Recognize Their Own Generations

NEO-ov: Encoder-Free Vision-Language Model

MobileGym: Fast Simulation for Mobile GUI Agents

WBench: New Benchmark for Video World Models

LocateAnything: Parallel Box Decoding for VLMs

Sleeping LLMs: Converting KV Cache to SSM Weights

GARD: Robust Multi-view 3D Reconstruction
