Scaling Intelligence Through the Memory Hierarchy with Solidigm
Solidigm's presentation at AI Field Day 8, led by Kapil Karkra, highlighted memory capacity as a critical, often overlooked, third axis for scaling AI intelligence, alongside model size and compute power. Solidigm introduced its "CRAFT" framework to define and measure AI intelligence across five dimensions: Comprehension, Recall, Adaptability, Fluency, and Tenacity. The core argument is that expanding memory capacity beyond the GPU's high-bandwidth memory (HBM) to system DRAM and NVMe SSDs dramatically improves AI performance and quality by enabling more efficient inference and preventing costly recomputations.
Through various benchmarks and experiments, Solidigm demonstrated the impact of memory capacity on each CRAFT dimension. For Recall, offloading Key-Value (KV) cache to SSDs prevented the GPU from recomputing previous states, significantly boosting throughput. Tenacity was illustrated with an AIME 2024 math test, where increased output token capacity allowed the model to deliberate longer and achieve a higher score, showcasing how more "scratch space" leads to better reasoning quality. Adaptability, measured by requests per second, and Fluency, indicated by inter-token latency, both saw substantial improvements (up to 4x throughput and 21x better latency) when NVMe SSDs extended the KV cache, allowing the system to handle more concurrent requests without compromising responsiveness. Similarly, Comprehension, tested with a "needle in a haystack" benchmark, showed 78 times faster reading when context fit in the extended cache.
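To give a sense of why the KV cache outgrows HBM so quickly, the following back-of-the-envelope sketch estimates its size using the standard per-token formula (a key tensor and a value tensor for every layer). The model shape used here (32 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example and is not taken from the presentation.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_element=2):
    """Estimate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both a key and a value
    tensor in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element * num_tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, num_tokens=1)
long_context = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, num_tokens=128 * 1024)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache for a 128K-token context: {long_context / 2**30:.0f} GiB")
```

Under these assumed numbers a single 128K-token context already needs about 16 GiB of KV cache, so a modest number of concurrent long-context requests can exceed the HBM of one GPU, which is the situation where spilling to DRAM and NVMe SSDs instead of recomputing becomes attractive.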
The presentation concluded that while higher bandwidth storage is beneficial when working sets fit within faster tiers, ultimately, sheer capacity becomes paramount for larger, more complex AI workloads involving multiple agents and extensive context lengths. The discussion emphasized the need for a tiered memory hierarchy, where automatic caching across HBM, DRAM, and NVMe SSDs optimizes resource utilization and avoids GPU stalls. This approach allows organizations to balance performance and cost effectively, ensuring that AI systems can sustain deeper reasoning, handle greater concurrency, and deliver higher quality, more fluent responses by leveraging expanded memory capacity.
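The tiered caching idea described above can be illustrated with a minimal sketch. This is a toy model and not Solidigm's implementation: the class name `TieredKVCache`, the `prefill` function, the tier capacities, and the least-recently-used demotion policy are all assumptions made for illustration. Entries evicted from a faster tier are demoted to the next slower one, and a recomputation is only counted when a key is found in no tier.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy multi-tier cache with LRU demotion (hypothetical, for illustration)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.recomputes = 0

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # Dropped from the slowest tier; a later access must recompute.
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)
        while len(store) > cap:
            # Demote the least recently used entry to the next slower tier.
            old_key, old_value = store.popitem(last=False)
            self._insert(level + 1, old_key, old_value)

    def get(self, key, compute):
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)  # Promote back to the fastest tier.
                return value, name
        # Miss in every tier: the expensive path the hierarchy tries to avoid.
        self.recomputes += 1
        value = compute(key)
        self._insert(0, key, value)
        return value, "recomputed"


def prefill(session_id):
    # Stand-in for the GPU work of rebuilding a session's KV state.
    return f"kv-state-for-{session_id}"


hbm_only = TieredKVCache([("HBM", 2)])
tiered = TieredKVCache([("HBM", 2), ("DRAM", 4), ("NVMe SSD", 16)])

# Four sessions take turns, so each one returns after the others have run.
sessions = ["s1", "s2", "s3", "s4", "s1", "s2", "s3", "s4"]
for cache in (hbm_only, tiered):
    for session in sessions:
        cache.get(session, prefill)

print("HBM only recomputes:", hbm_only.recomputes)  # 8
print("Tiered recomputes:", tiered.recomputes)      # 4
```

In this toy workload the HBM-only cache recomputes every returning session, while the tiered cache computes each session once and serves the rest from a slower tier. A real system would also weigh the transfer time from each tier against the cost of recomputation, which this sketch ignores.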
Presented by Kapil Karkra, Sr. Principal Engineer, AI Solutions and Software, Solidigm. Recorded live at AI Field Day 8 in San Jose, California on May 14, 2026. Watch the entire presentation at https://techfieldday.com/appearance/solidigm-presents-at-ai-field-day-8/ or visit https://TechFieldDay.com/event/aifd8/ or https://Solidigm.com for more information.
Video: Scaling Intelligence Through the Memory Hierarchy with Solidigm, from the Tech Field Day channel
Video information
Published: May 21, 2026, 18:18:00
Duration: 01:01:08