
llm-d Demo: Deploy Large Language Models on Kubernetes with Helm and Trace a Request

Watch this demo of llm-d, the open-source solution for running large language models on Kubernetes!

What You'll Learn:

A sample llm-d installation using Helm charts
How to use the QuickStart CLI for easy all-in-one deployment
Setting up llm-d on different Kubernetes targets (Minikube, OpenShift)
Understanding the llm-d architecture and request flow
How prefill and decode disaggregation works
Redis KV cache integration and model service deployment
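The Helm-based install walked through above can be sketched roughly as follows; the chart repository URL, chart name, and value keys are assumptions for illustration, not confirmed by the video, so check the llm-d documentation for the actual names:

```shell
# Sketch of a Helm-based llm-d install (chart repo URL, chart name,
# and --set keys below are assumptions, not taken from the demo).
helm repo add llm-d https://llm-d.ai/charts   # assumed chart repo URL
helm repo update

# Install into a dedicated namespace against the current kube context.
helm install llm-d llm-d/llm-d \
  --namespace llm-d --create-namespace \
  --set huggingface.tokenSecret=huggingface-token   # assumed value key
```

The QuickStart CLI shown in the demo wraps steps like these into a single all-in-one command, including dependencies such as the Redis KV cache.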

Key Features Covered:
✅ Quick start installation with dependencies
✅ Hugging Face model integration
✅ Gateway system setup and routing
✅ Inference scheduler and endpoint picker
✅ Prefill and decode pod management
✅ Request tracing and architecture overview
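To trace a request the way the demo does, you can send an OpenAI-compatible completion request through the gateway; the gateway address and model name below are placeholders for whatever your deployment exposes:

```shell
# Hypothetical request traced through the llm-d gateway; hostname and
# model name are placeholders, supply the values from your deployment.
curl -s http://<gateway-address>/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<your-model-name>",
        "prompt": "Kubernetes is",
        "max_tokens": 32
      }'
# The inference scheduler's endpoint picker selects a decode pod for
# the request; with disaggregation enabled, the prefill pod computes
# the prompt's KV cache before decoding begins on the decode pod.
```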
Prerequisites:

Kubernetes cluster (Minikube or OpenShift)
Helm installed
A valid Hugging Face token (for model downloads)
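The Hugging Face token is typically supplied to the cluster as a Kubernetes secret; the namespace, secret name, and key below are assumptions, so match whatever your chart's values expect:

```shell
# Store the Hugging Face token as a Kubernetes secret (names assumed).
kubectl create namespace llm-d --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic huggingface-token \
  --namespace llm-d \
  --from-literal=HF_TOKEN=<your-token>   # placeholder, supply your own
```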

llm-d is open source under Apache 2.0 license and makes deploying LLMs on Kubernetes straightforward with familiar tools and customizable configurations.

Join the llm-d community:
🌎 https://llm-d.ai
💬 https://inviter.co/llm-d-slack
💻 https://github.com/llm-d

Video "llm-d Demo: Deploy Large Language Models on Kubernetes with Helm and Trace a Request" from the llm-d Project channel.
