
Subquadratic Sparse Attention, explained #Shorts

Subquadratic sparse attention is attention whose compute grows near-linearly with context length instead of with its square (n²), by having each token attend to only a small, learned set of other tokens rather than all of them.
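The idea of each token attending to only a small set of other tokens can be sketched in a few lines of plain Python. This is a generic illustration, not the SSA algorithm itself: the description does not say how SSA chooses its sets, so the hypothetical `idx` argument simply takes the chosen key positions as given, and the function name `sparse_attention` is likewise an assumption.

```python
import math

def sparse_attention(q, k, v, idx):
    """Attention where query i only looks at the key positions listed in idx[i].

    q, k, v: lists of n vectors (lists of floats).
    idx:     list of n non-empty lists of key positions (hypothetical input;
             how a real model selects these positions is not covered here).
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Score only the selected keys: len(idx[i]) dot products, not n.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in idx[i]]
        # Numerically stable softmax over the selected keys.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted sum of the selected value vectors.
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx[i])) / total
                    for c in range(len(v[0]))])
    return out
```

If every `idx[i]` lists all n positions, this reduces to ordinary dense attention; the savings come only from keeping each list short.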

Dense attention makes every token compare itself to every other token, so cost scales with n². That quadratic wall is what strands most models near a million tokens. SubQ 1.1 Small (Subquadratic Inc.) uses Subquadratic Sparse Attention (SSA) to reach a 12-million-token context, reportedly using 64.5× less compute than dense attention and running 56× faster than FlashAttention-2 at 1M tokens.
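To see why n² becomes a wall, it helps to count query-key comparisons directly. The sketch below is back-of-the-envelope arithmetic only: the per-token budget `K` is a hypothetical value chosen for illustration, the cost of selecting which tokens to attend to is ignored, and the results are not meant to reproduce the reported 64.5× or 56× figures.

```python
def dense_pairs(n: int) -> int:
    """Query-key comparisons when every token attends to every token."""
    return n * n

def sparse_pairs(n: int, k: int) -> int:
    """Query-key comparisons when each token attends to at most k tokens."""
    return n * min(k, n)

K = 1024  # hypothetical per-token budget, not a value from the source

for n in (1_000, 1_000_000, 12_000_000):
    ratio = dense_pairs(n) / sparse_pairs(n, K)
    print(f"n={n:>12,}  dense={dense_pairs(n):.2e}  "
          f"sparse={sparse_pairs(n, K):.2e}  ratio={ratio:,.1f}x")
```

With a fixed budget, the sparse count grows linearly in n while the dense count grows quadratically, so the gap between them widens as n / K once n exceeds K.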

Full explainer (interactive): https://learnaivisually.com/g/subq-1-1-subquadratic-sparse-attention
Source: https://subq.ai/subq-1-1-small-technical-report

Learn AI & GPUs visually — free interactive courses at learnaivisually.com

#SubquadraticAttention #LLM #AI #SubQ #Shorts

Video: Subquadratic Sparse Attention, explained #Shorts, from the channel Learn AI Visually