Why LLM Inference Is Memory-Bound, Not Compute-Bound
The limiting factor in LLM inference isn't compute. It's how fast you can move weights from DRAM to the chip.
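A back-of-the-envelope estimate makes the claim concrete. The sketch below compares the time to stream all weights from memory once per decode step with the time to do the matching arithmetic. Every number in it is a hypothetical assumption, not a figure from the interview: a 7-billion-parameter model, 2 bytes per weight, 2 TB/s of memory bandwidth, and 1 PFLOP/s of peak compute. It also assumes roughly 2 FLOPs per parameter per generated token and ignores KV-cache and activation traffic.

```python
# Rough roofline-style estimate for a single decode step.
# All hardware and model numbers are hypothetical assumptions for illustration.

def decode_step_times(n_params, bytes_per_param, mem_bandwidth, peak_flops, batch_size=1):
    """Return (memory_seconds, compute_seconds) for one decode step.

    memory_seconds: time to read every weight once from memory.
    compute_seconds: time for ~2 FLOPs per parameter per sequence in the batch.
    """
    memory_s = n_params * bytes_per_param / mem_bandwidth
    compute_s = 2 * n_params * batch_size / peak_flops
    return memory_s, compute_s


def crossover_batch(bytes_per_param, mem_bandwidth, peak_flops):
    """Batch size at which compute time equals weight-loading time."""
    return bytes_per_param * peak_flops / (2 * mem_bandwidth)


N_PARAMS = 7e9          # assumed model size
BYTES_PER_PARAM = 2     # assumed 16-bit weights
MEM_BANDWIDTH = 2e12    # assumed bytes per second
PEAK_FLOPS = 1e15       # assumed FLOP per second

memory_s, compute_s = decode_step_times(
    N_PARAMS, BYTES_PER_PARAM, MEM_BANDWIDTH, PEAK_FLOPS, batch_size=1
)
break_even = crossover_batch(BYTES_PER_PARAM, MEM_BANDWIDTH, PEAK_FLOPS)

print(f"weight loading: {memory_s * 1e3:.3f} ms per step")
print(f"arithmetic:     {compute_s * 1e3:.3f} ms per step")
print(f"break-even batch size: {break_even:.0f}")
```

Under these assumptions, loading the weights takes far longer than the arithmetic at batch size 1, so the step time is set by memory bandwidth. Batching amortizes one weight read over many sequences, which is why the estimated break-even point depends only on bytes per weight and the ratio of compute to bandwidth.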
In this interview, CTO Mathias Lechner speaks with Piotr Mazurek from Liquid AI's inference team about what's actually happening when an LLM handles a request: the prefill/decode distinction, multi-GPU parallelism strategies, and how to choose between inference frameworks like vLLM, SGLang, and TensorRT-LLM depending on latency and throughput requirements.
Liquid AI builds foundation models designed for efficiency and performance across a range of deployment contexts. This series features Mathias in conversation with researchers and engineers across the company.
Subscribe to follow every episode: https://www.youtube.com/@liquid-ai-inc
Careers at Liquid AI: https://www.liquid.ai/careers
Video "Why LLM Inference Is Memory-Bound, Not Compute-Bound" from the Liquid AI channel
Video information
May 27, 2026, 19:27:53
00:04:48