Let's put aside Frontier AI Labs and Hyperscalers: Cost-Effective AI Inference for the Rest of Us M
While most AI infrastructure discussions focus on massive GPU clusters for hyperscalers, enterprise AI will increasingly consist of inference on smaller, domain-specific models. Budget-conscious organizations need a different approach.
This talk presents a practical architecture combining Cluster API, HAMi, and Kaito to achieve cost-effective, scalable AI inference. You'll learn how to:
- Use Cluster API to create elastic GPU infrastructure that scales with demand across multiple infrastructure providers
- Apply HAMi's GPU abstraction to maximize utilization of GPU pools and support heterogeneous hardware
- Deploy optimized inference with KV-cache-aware load balancing using llm-d
- Use Kaito to simplify model lifecycle management on Kubernetes
- Achieve a 60% cost reduction and sub-100 ms latency on 7B models
Attendees should have basic Kubernetes knowledge; prior AI/ML experience is not required.
We'll show you what such a stack can look like.
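The elastic-infrastructure bullet boils down to a sizing decision: how many GPU nodes should a node group have right now? The sketch below is a simplified, hypothetical model of that decision for one GPU-backed Cluster API MachineDeployment. Given the GPUs requested by inference pods, it computes the node count and clamps it to minimum and maximum bounds. In practice this is usually delegated to the Kubernetes Cluster Autoscaler with its Cluster API provider, which reads the bounds from annotations on the MachineDeployment. The class, function, and node-group names here are illustrative assumptions, not part of any Cluster API interface.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuNodeGroup:
    """Hypothetical view of one Cluster API MachineDeployment of GPU nodes."""
    name: str
    gpus_per_node: int
    min_nodes: int
    max_nodes: int
    current_nodes: int


def desired_nodes(group: GpuNodeGroup, requested_gpus: int) -> int:
    """Nodes needed to satisfy all requested GPUs, clamped to [min, max].

    Assumption of this sketch: requested_gpus counts GPUs requested by both
    running and pending pods that target this node group.
    """
    if group.gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    needed = math.ceil(requested_gpus / group.gpus_per_node)
    return max(group.min_nodes, min(group.max_nodes, needed))


def scaling_action(group: GpuNodeGroup, requested_gpus: int) -> str:
    """Describe the scale-up / scale-down / no-op decision as text."""
    target = desired_nodes(group, requested_gpus)
    if target > group.current_nodes:
        return f"scale up {group.name}: {group.current_nodes} -> {target}"
    if target < group.current_nodes:
        return f"scale down {group.name}: {group.current_nodes} -> {target}"
    return f"keep {group.name} at {target}"


if __name__ == "__main__":
    # Hypothetical pool: 4 GPUs per node, allowed to shrink to zero when idle.
    pool = GpuNodeGroup("gpu-pool-a", gpus_per_node=4, min_nodes=0,
                        max_nodes=5, current_nodes=1)
    for demand in (0, 3, 9, 40):
        print(demand, scaling_action(pool, demand))
```

Setting `min_nodes=0` is the cost-relevant design choice: an idle pool releases every GPU node, while `max_nodes` caps spend when demand spikes (a demand of 40 GPUs is clamped to 5 nodes here).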
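The HAMi bullet relies on GPU sharing. A pod requests a slice of a device (some device memory plus a share of compute) instead of a whole GPU, so several small-model inference pods can share one card. The toy packing model below shows why this raises utilization. It is not HAMi's scheduler. It runs a simplified first-fit-decreasing placement over a hypothetical heterogeneous pool with made-up workload sizes, and compares fractional requests against whole-GPU requests (modeled as 100% of compute). In real clusters HAMi exposes slices as extended resources in the pod spec; its documentation shows names such as `nvidia.com/gpumem` and `nvidia.com/gpucores`, but verify exact names and units there.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """One physical GPU in a hypothetical heterogeneous pool."""
    model: str
    mem_mib: int
    free_mem: int = field(init=False)
    free_cores: int = field(init=False)  # percent of compute still unallocated
    pods: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.free_mem = self.mem_mib
        self.free_cores = 100


@dataclass(frozen=True)
class Request:
    """A pod's GPU slice: device memory in MiB and percent of one GPU's compute."""
    pod: str
    mem_mib: int
    cores: int


def pack(requests: list[Request], gpus: list[Gpu]) -> list[str]:
    """First-fit decreasing by memory; returns the pods that could not be placed."""
    unplaced = []
    for req in sorted(requests, key=lambda r: r.mem_mib, reverse=True):
        for gpu in gpus:
            if gpu.free_mem >= req.mem_mib and gpu.free_cores >= req.cores:
                gpu.free_mem -= req.mem_mib
                gpu.free_cores -= req.cores
                gpu.pods.append(req.pod)
                break
        else:  # no GPU had room for this slice
            unplaced.append(req.pod)
    return unplaced


def gpus_used(gpus: list[Gpu]) -> int:
    """Count GPUs hosting at least one pod."""
    return sum(1 for g in gpus if g.pods)


if __name__ == "__main__":
    def make_pool() -> list[Gpu]:
        # Hypothetical pool: two 24 GiB cards and one 48 GiB card.
        return [Gpu("L4", 24_576), Gpu("L4", 24_576), Gpu("L40S", 49_152)]

    # Hypothetical small-model pods: 7000 MiB and 25% compute each.
    shared = [Request(f"svc-{i}", 7_000, 25) for i in range(6)]
    whole = [Request(r.pod, r.mem_mib, 100) for r in shared]

    pool = make_pool()
    print("shared:", gpus_used(pool), "GPUs, unplaced:", pack(shared, pool))
    pool = make_pool()
    print("whole-GPU:", gpus_used(pool), "GPUs, unplaced:", pack(whole, pool))
```

With these assumed sizes, the six fractional pods fit on two cards, leaving the third free, while whole-GPU allocation fills all three cards and still leaves three pods unscheduled.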
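The KV-cache-aware load-balancing bullet rests on one idea. If a replica already holds the KV cache for a request's prompt prefix, it can skip recomputing that prefix, so routing the request there cuts time to first token. The sketch below illustrates prefix-affinity scoring in simplified form. It is not llm-d's actual scorer or API, and the block size, hashing scheme, and load weight are assumptions. It hashes the prompt in fixed-size token blocks and chains each hash to the previous one, so a block matches only when everything before it matches too, similar in spirit to how vLLM's automatic prefix caching identifies blocks. Each replica is then scored by leading cache hits minus a load penalty.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (illustrative; real engines often use 16)


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chained hashes of full token blocks; a partial trailing block is ignored."""
    hashes = []
    parent = ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        payload = parent + ":" + ",".join(map(str, block))
        h = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(h)
        parent = h  # chaining: this block's identity depends on the whole prefix
    return hashes


@dataclass
class Replica:
    """Hypothetical model-server replica with a set of cached block hashes."""
    name: str
    active_requests: int = 0
    cached: set = field(default_factory=set)

    def prefix_hits(self, hashes: list[str]) -> int:
        """Number of leading prompt blocks already in this replica's cache."""
        hits = 0
        for h in hashes:
            if h not in self.cached:
                break
            hits += 1
        return hits


def route(tokens: list[int], replicas: list[Replica], load_weight: float = 1.0) -> Replica:
    """Pick the replica with the best cache-hit score minus a load penalty."""
    hashes = block_hashes(tokens)
    best = max(replicas,
               key=lambda r: r.prefix_hits(hashes) - load_weight * r.active_requests)
    # Simulate serving: the chosen replica now caches this prompt's blocks.
    # Simplification: requests never complete, so active_requests only grows.
    best.cached.update(hashes)
    best.active_requests += 1
    return best


if __name__ == "__main__":
    system_prompt = list(range(100, 112))  # 12 shared tokens = 3 full blocks
    pods = [Replica("vllm-0"), Replica("vllm-1")]
    first = route(system_prompt + [1, 2, 3, 4], pods)
    second = route(system_prompt + [5, 6, 7, 8], pods)  # reuses 3 cached blocks
    third = route(list(range(900, 916)), pods)          # unrelated prompt
    print(first.name, second.name, third.name)
```

The second request follows its shared system prompt to `vllm-0` despite that replica being busier, while the unrelated third request has no cache benefit anywhere and goes to the idle `vllm-1`. The `load_weight` parameter sets how much cache affinity is traded against balancing load.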
Video "Let's put aside Frontier AI Labs and Hyperscalers: Cost-Effective AI Inference for the Rest of Us M" from the KCDCzechSlovak channel
Video information
- Date: June 5, 2026, 22:09:12
- Duration: 00:33:09