
Let's Put Aside Frontier AI Labs and Hyperscalers: Cost-Effective AI Inference for the Rest of Us M

While most AI infrastructure discussions focus on massive GPU clusters for hyperscalers, enterprise AI workloads will increasingly consist of inference on smaller, domain-specific models. Budget-conscious organizations need a different approach.

This talk presents a practical architecture combining Cluster API, HAMi, and Kaito to achieve cost-effective, scalable AI inference. You'll learn how to:
- Use Cluster API to create elastic GPU infrastructure that scales with demand across multiple infrastructure providers
- Apply HAMi's GPU abstraction to maximize utilization of GPU pools and support heterogeneous hardware choices
- Deploy optimized inference with KV-cache-aware load balancing using llm-d
- Use Kaito for simplified model lifecycle management on Kubernetes
- Achieve a 60% cost reduction and sub-100 ms latency on 7B-parameter models
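The idea behind KV-cache-aware load balancing can be illustrated with a minimal Python sketch. It is not llm-d's scheduler. It assumes a router that keeps its own approximate view of which token blocks each replica has cached, and that scores replicas by the fraction of the prompt's prefix already cached minus a normalized queue-depth penalty. `Replica`, `pick_replica`, `BLOCK_SIZE` (16 tokens) and `load_weight` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumption: the KV cache is managed in fixed-size blocks of 16 tokens.
BLOCK_SIZE = 16


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of tokens, chaining in the previous block's hash
    so that a matching block hash implies the whole prefix up to it matches."""
    hashes, prev = [], 0
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        prev = hash((prev, tuple(tokens[start:start + block_size])))
        hashes.append(prev)
    return hashes


@dataclass
class Replica:
    """Hypothetical router-side view of one inference replica."""
    name: str
    queue_depth: int = 0  # requests currently in flight
    cached_blocks: set = field(default_factory=set)  # block hashes believed cached


def prefix_hits(replica, hashes):
    """Count leading prompt blocks already cached on this replica."""
    hits = 0
    for h in hashes:
        if h not in replica.cached_blocks:
            break
        hits += 1
    return hits


def pick_replica(replicas, tokens, load_weight=0.5):
    """Score = fraction of prompt blocks cached minus a normalized load penalty."""
    hashes = block_hashes(tokens)
    max_queue = max(r.queue_depth for r in replicas) or 1

    def score(r):
        hit_ratio = prefix_hits(r, hashes) / len(hashes) if hashes else 0.0
        return hit_ratio - load_weight * r.queue_depth / max_queue

    best = max(replicas, key=score)  # ties go to the first replica in the list
    best.cached_blocks.update(hashes)  # assume the chosen replica now caches this prefix
    best.queue_depth += 1
    return best


# Usage: two prompts sharing a 64-token system prompt, then an unrelated one.
replicas = [Replica("pod-a"), Replica("pod-b")]
system_prompt = list(range(64))  # fake token IDs: exactly 4 blocks
first = pick_replica(replicas, system_prompt + [1000, 1001])
second = pick_replica(replicas, system_prompt + [2000])  # reuses pod-a's cached prefix
third = pick_replica(replicas, [5000] * 20)  # no cache hit anywhere, so load decides
print(first.name, second.name, third.name)  # pod-a pod-a pod-b
```

Chaining the block hashes means a replica only gets credit for a contiguous cached prefix, which is the part of the KV cache that prefix caching can actually reuse. `load_weight` trades cache affinity against queue depth. The sketch never evicts blocks, so a real router would also need to track evictions or receive cache-state updates from the inference servers.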

Attendees should have basic Kubernetes knowledge; prior AI/ML experience is not required.

We'll show you what such a stack can look like.

Video: Let's Put Aside Frontier AI Labs and Hyperscalers: Cost-Effective AI Inference for the Rest of Us M, from the KCDCzechSlovak channel