Let's put aside Frontier AI Labs and Hyperscalers: Cost-Effective AI Inference for the Rest of Us M

While most AI infrastructure discussions focus on massive GPU clusters for hyperscalers, enterprise AI will increasingly run as inference of smaller, domain-specific models. Budget-conscious organizations need a different approach.

This talk presents a practical architecture combining Cluster API, HAMi, and Kaito to achieve cost-effective, scalable AI inference. You'll learn how to:
- Use Cluster API to create elastic GPU infrastructure that scales with demand across multiple infrastructure providers
- Apply HAMi's GPU abstraction to maximize utilization on GPU pools and permit heterogeneous hardware choices
- Deploy optimized inference with KV-cache-aware load balancing using llm-d
- Use Kaito for simplified model lifecycle management on Kubernetes
- Achieve a 60% cost reduction and sub-100 ms latencies on 7B models
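
To make the KV-cache-aware load balancing idea concrete, here is a minimal sketch in plain Python. It is not llm-d's actual scorer or API: the `Replica` class, the `pick_replica` function, and the pod names are hypothetical, and the sketch assumes the router can see which token prefixes each replica still holds in its KV cache. The idea is to prefer the replica that can reuse the longest cached prefix, and to break ties by current load.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical view of one inference pod as seen by the router."""
    name: str
    active_requests: int = 0
    # Token prefixes this replica is assumed to still hold in its KV cache.
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)


def shared_prefix_len(a, b) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_replica(replicas, prompt_tokens):
    """Prefer the longest reusable prefix; break ties by lowest load."""
    def score(replica):
        best_match = max(
            (shared_prefix_len(p, prompt_tokens) for p in replica.cached_prefixes),
            default=0,
        )
        # Tuples compare element by element: cache reuse first, then load.
        return (best_match, -replica.active_requests)

    return max(replicas, key=score)


replicas = [
    Replica("pod-a", active_requests=3, cached_prefixes=[(1, 2, 3, 4)]),
    Replica("pod-b", active_requests=1, cached_prefixes=[(1, 2, 9)]),
    Replica("pod-c", active_requests=0),
]

# A prompt that extends pod-a's cached prefix goes to pod-a despite its load.
print(pick_replica(replicas, (1, 2, 3, 4, 5, 6)).name)
# A prompt with no cached prefix anywhere falls back to the least-loaded pod.
print(pick_replica(replicas, (7, 8)).name)
```

A production router would likely weigh cache reuse against queue depth rather than rank strictly by prefix length, since a heavily loaded replica with a warm cache can still be the slower choice.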

Attendees should have basic Kubernetes knowledge; prior AI/ML experience is not required.

We'll show you what such a stack can look like.
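
As a rough illustration of why GPU sharing matters for cost in such a stack, the sketch below packs several small models onto shared GPUs by memory footprint and compares the result with one dedicated GPU per model. It does not use HAMi or reproduce its scheduler. The function name, the model sizes, and the 80 GiB GPU are made-up assumptions, and the packing considers memory only, ignoring compute contention and latency targets.

```python
def pack_first_fit_decreasing(model_mem_gib, gpu_mem_gib):
    """Place each model on the first GPU with enough free memory.

    Models are considered largest first. Returns one list per GPU
    holding the memory footprints of the models placed on it.
    """
    gpus = []
    for mem in sorted(model_mem_gib, reverse=True):
        if mem > gpu_mem_gib:
            raise ValueError(f"model needing {mem} GiB does not fit on one GPU")
        for gpu in gpus:
            if sum(gpu) + mem <= gpu_mem_gib:
                gpu.append(mem)
                break
        else:
            # No existing GPU had room, so provision another one.
            gpus.append([mem])
    return gpus


# Hypothetical memory footprints (GiB) for ten small models.
models = [16, 16, 16, 20, 20, 8, 8, 8, 12, 12]
gpus = pack_first_fit_decreasing(models, gpu_mem_gib=80)

dedicated_gpus = len(models)  # baseline: one GPU per model
shared_gpus = len(gpus)
savings = 1 - shared_gpus / dedicated_gpus

print(f"dedicated: {dedicated_gpus} GPUs, shared: {shared_gpus} GPUs")
print(f"GPU count reduced by {savings:.0%}")
```

The figures here only show how the arithmetic works. Actual savings depend on real model footprints, traffic patterns, and how much headroom is kept for KV cache.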

Video from the KCDCzechSlovak channel

Video information

June 5, 2026, 22:09:12

00:33:09

KCDCzechSlovak
