Prefill/Decode Disaggregation — AMD ATOM + ATOMesh (ROCm serving)
Prefill/decode disaggregation splits LLM inference into two phases — a compute-bound prefill and a memory-bound decode — and runs each on its own pool of GPUs.
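A rough way to see why the two phases hit different bottlenecks is to compare arithmetic intensity (FLOPs per byte of weights read) with a GPU's ratio of peak compute to peak memory bandwidth. The sketch below assumes about 2 FLOPs per parameter per token and 2-byte (fp16) weights, and it ignores KV-cache and activation traffic. The function names and hardware numbers are illustrative assumptions, not the specs of any real GPU.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weights read in one pass over the model.

    Assumes ~2 FLOPs (multiply + add) per parameter per token and that
    each parameter is read from memory once per pass.
    """
    return 2 * tokens_per_pass / bytes_per_param


def bottleneck(tokens_per_pass: int, peak_flops: float, peak_bytes_per_s: float) -> str:
    """Classify a pass by comparing its intensity with the hardware ridge point."""
    ridge = peak_flops / peak_bytes_per_s  # FLOPs the GPU can do per byte it can fetch
    if arithmetic_intensity(tokens_per_pass) >= ridge:
        return "compute-bound"
    return "memory-bound"


# Illustrative hardware numbers (assumed): 1 PFLOP/s compute, 4 TB/s bandwidth.
PEAK_FLOPS = 1.0e15
PEAK_BYTES_PER_S = 4.0e12

print(bottleneck(2048, PEAK_FLOPS, PEAK_BYTES_PER_S))  # prefill over a 2048-token prompt
print(bottleneck(1, PEAK_FLOPS, PEAK_BYTES_PER_S))     # decode, one token per step
```

Under these assumptions a long prompt processed in one pass sits well above the ridge point, while a single-token decode step sits far below it. Batching many decode requests together raises the decode intensity, which is one reason real schedulers batch aggressively.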
Prefill reads your whole prompt in one parallel, compute-heavy pass; decode then emits one token at a time, bottlenecked by memory bandwidth. On a single worker they collide — a long prefill stalls the decode queue while memory-bound decodes leave the compute units idle. Disaggregation runs each phase on hardware tuned for its bottleneck, handing the KV cache across the interconnect between them. AMD's new ATOM + ATOMesh stack brings this same prefill/decode split, KV-aware scheduling, and OpenAI-compatible API to ROCm and Instinct GPUs.
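The flow described above (prefill on one pool, KV cache handoff, decode on another pool) can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not the ATOM or ATOMesh API: the classes `KVCache`, `PrefillWorker`, `DecodeWorker` and the function `serve` are hypothetical names, the "model" is a made-up next-token rule, and the handoff is a plain Python object pass rather than a transfer over an interconnect.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for per-request attention state; here just the token history."""
    request_id: int
    entries: list = field(default_factory=list)


class PrefillWorker:
    """Hypothetical worker from the compute-heavy pool."""

    def prefill(self, request_id: int, prompt: list) -> KVCache:
        # The whole prompt is consumed in a single pass.
        return KVCache(request_id, list(prompt))


class DecodeWorker:
    """Hypothetical worker from the bandwidth-heavy pool."""

    VOCAB = 100

    def decode(self, kv: KVCache, max_new_tokens: int) -> list:
        out = []
        for _ in range(max_new_tokens):
            # Toy next-token rule: every step re-reads the full cache.
            nxt = sum(kv.entries) % self.VOCAB
            kv.entries.append(nxt)
            out.append(nxt)
        return out


def serve(prompt, max_new_tokens, prefill_pool, decode_pool, request_id=0):
    """Route one request through both pools (simple round-robin by request id)."""
    prefill_worker = prefill_pool[request_id % len(prefill_pool)]
    decode_worker = decode_pool[request_id % len(decode_pool)]
    kv = prefill_worker.prefill(request_id, prompt)  # phase 1: prefill
    # Handoff: a real system would move the KV cache across the interconnect here.
    return decode_worker.decode(kv, max_new_tokens)  # phase 2: decode


tokens = serve([1, 2, 3], 3, [PrefillWorker()], [DecodeWorker(), DecodeWorker()])
print(tokens)  # [6, 12, 24]
```

Because the two pools are separate lists, they can be sized independently, which is the point of disaggregation: a long prompt occupies only a prefill worker and never blocks the decode loop of another request.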
Full explainer (interactive): https://learnaivisually.com/g/amd-atom-prefill-decode-disaggregation
Source: https://rocm.blogs.amd.com/software-tools-optimization/atomesh-inference/README.html
Learn AI & GPUs visually — free interactive courses at learnaivisually.com
#PrefillDecodeDisaggregation #LLM #AI #AMD
Video from the Learn AI Visually channel
Video length: 00:01:01