OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization
The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm by jointly quantizing triplets of rotated coordinates. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation that depends only on the total dimensionality of the keys. In sweeps, we find the finite-dimensional quality optimum to be constant on every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as the bit width drops toward extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization.
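To make the triplet step concrete, here is a minimal sketch of an octahedral parameterization: a 3-vector is split into its norm and a direction, and the direction is folded onto the square [-1, 1]^2. It assumes the standard octahedral map used for unit-vector encoding in computer graphics; the abstract does not say which variant OCTOPUS uses, and the function names `oct_encode` and `oct_decode` are hypothetical. The rotation, the Lloyd-Max quantizers, and the bit allocation are not shown.

```python
import math


def _sign(a):
    # Sign with the convention sign(0) = +1, so the fold below is well defined.
    return 1.0 if a >= 0.0 else -1.0


def oct_encode(x, y, z):
    """Map a triplet to (u, v, r): a point in [-1, 1]^2 plus the triplet norm.

    Assumption: the standard octahedral map from graphics, not necessarily
    the exact variant used by OCTOPUS.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # direction is undefined; any (u, v) would do
    # Project the direction onto the octahedron |x| + |y| + |z| = 1.
    l1 = abs(x) + abs(y) + abs(z)
    px, py, pz = x / l1, y / l1, z / l1
    if pz < 0.0:
        # Fold the lower hemisphere over the diagonals of the square.
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return px, py, r


def oct_decode(u, v, r):
    """Invert oct_encode: recover the triplet from (u, v) and the norm r."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the fold for points that came from the lower hemisphere.
        x = (1.0 - abs(v)) * _sign(u)
        y = (1.0 - abs(u)) * _sign(v)
    else:
        x, y = u, v
    n = math.sqrt(x * x + y * y + z * z)  # never zero: the L1 norm is 1
    return r * x / n, r * y / n, r * z / n


# Round trip on one triplet (no quantization applied in this sketch).
triplet = (0.3, -1.2, -0.5)
u, v, r = oct_encode(*triplet)
restored = oct_decode(u, v, r)
```

In a codec of the kind the abstract describes, `u`, `v`, and `r` would each be passed through a scalar quantizer before storage; here they are kept at full precision, so the round trip is exact up to floating-point error.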
Project Page: https://octopus-quant.github.io/
Paper: https://octopus-quant.github.io/static/paper.pdf
Video "OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization" from the channel Mark Boss
Video information
May 21, 2026, 22:23:58
00:04:03