Inside MAX Serve: From Prompt to Response
MAX serve is Modular's open-source inference server. In this interview, AI Performance Engineer Kyle Caverly walks through what happens from the moment a request arrives to the moment text streams back to the client.
You can explore Kyle's diagram here:
https://drive.google.com/file/d/1Zigsjtq37lUqp9_YmUU_T53CIgB-Duz_/view?usp=drive_link
All of the code discussed is open source. Start with the MAX serve repository: https://github.com/modular/modular/tree/main/max/python/max/serve
0:00 Intro
0:35 MAX serve architecture
3:27 API server receives request
5:48 Server creates TextContext object
9:23 Request reaches the model worker
12:25 Construct the batch via TextBatchConstructor
15:48 Prefix caching and chunked prefill
21:23 Pipeline execution
25:24 Consuming completed tokens
27:43 Post-process output and prepare response
29:39 Client receives response
30:51 Multimodality
33:28 Open source code
Video "Inside MAX Serve: From Prompt to Response" from the Modular channel
Video information
April 14, 2026, 21:06:10
00:34:38