INT8 finally beats FP8 on consumer GPUs — Fused INT8 GEMM kernel
A fused INT8 GEMM kernel keeps W8A8 matrix multiplies in 8-bit on the GPU's tensor cores, so a quantized model finally runs as fast as INT8 promises.
The usual INT8 kernel quietly converts the weights back to 16-bit before it multiplies, so the fast INT8 tensor cores never switch on, and the model can end up slower than FP8. This fused Triton kernel performs the matmul as int8×int8 with int32 accumulation on the tensor cores and folds the dequantization into the epilogue, running each GEMM 2.8–4.2× faster with no measurable quality loss.
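The arithmetic behind that description can be sketched in plain NumPy. This is only a CPU reference for the math, not the Triton kernel from the video: NumPy does not touch tensor cores, and the per-row symmetric scaling, the helper names `quantize_per_row` and `w8a8_gemm_reference`, and the matrix shapes are all assumptions made for illustration.

```python
import numpy as np

def quantize_per_row(x):
    """Symmetric per-row int8 quantization (an assumed scheme, not the source's)."""
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    # Guard against all-zero rows so the division below stays finite.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_gemm_reference(a_q, a_scale, w_q, w_scale):
    """int8 x int8 matmul with int32 accumulation, dequantized once at the end."""
    # Main loop: the operands stay integer; products are accumulated in int32.
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32).T
    # "Epilogue": apply activation and weight scales to the int32 accumulator.
    out = acc.astype(np.float32) * a_scale * w_scale.T
    return acc, out

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)   # activations, shape (M, K)
w = rng.standard_normal((8, 64)).astype(np.float32)   # weights, shape (N, K)

a_q, a_s = quantize_per_row(a)
w_q, w_s = quantize_per_row(w)
acc, out = w8a8_gemm_reference(a_q, a_s, w_q, w_s)

# The unfused path: dequantize both operands first, then multiply in float.
unfused = (a_q.astype(np.float32) * a_s) @ (w_q.astype(np.float32) * w_s).T

# Relative error of the quantized result against the full-precision product.
ref = a @ w.T
rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
```

Both paths compute the same numbers up to floating-point rounding. The difference the video points to is where the work happens: in the fused form the multiply-accumulate stays in integers and the scales are applied once per output element, rather than converting every operand to a wider type before the multiply.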
Full explainer (interactive): https://learnaivisually.com/g/fused-int8-gemm-tensor-cores
Source: https://arxiv.org/abs/2606.14598
Learn AI & GPUs visually — free interactive courses at learnaivisually.com
#INT8 #GPU #TensorCores #Quantization #LLM
Video from the channel Learn AI Visually
Video information
June 16, 2026, 2:40:55
00:01:03