Загрузка...

Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang & Yue Zhu

Don't miss out! Join us at the next Open Source Summit in Hyderabad, India (August 5); Amsterdam, Netherland (August 25-29); Seoul, South Korea (November 4-5). Join us at the premier vendor-neutral open source conference, where developers and technologists come together to collaborate, share knowledge, and explore the latest innovations and advancements in open source technology. Learn more at https://events.linuxfoundation.org/

Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang, University of Chicago & Yue Zhu, IBM Research

Large Language Models (LLMs) are reshaping how we build applications; however, efficiently serving them at scale remains a major challenge.

The vLLM serving engine, historically focused on single-node deployments, is now being extended into a full-stack inference system through our open-source project, **vLLM Production Stack**. This extension enables any organization to deploy vLLM at scale with high reliability, high throughput, and low latency.
Code: https://github.com/vllm-project/production-stack

At a high level, the vLLM Production Stack project allows users to easily deploy to their Kubernetes cluster through a single command. vLLM Production Stack's optimizations include KV cache sharing to speed up inference (https://github.com/LMCache/LMCache), prefix-aware routing that directs inference queries to vLLM instances holding the corresponding KV caches, and robust observability features for monitoring engine status and autoscaling.

Attendees will discover best practices and see real-time demonstrations of how these optimizations work together to enhance LLM inference performance.

Видео Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang & Yue Zhu канала The Linux Foundation
Страницу в закладки Мои закладки
Все заметки Новая заметка Страницу в заметки

На информационно-развлекательном портале SALDA.WS применяются cookie-файлы. Нажимая кнопку Принять, вы подтверждаете свое согласие на их использование.

Об использовании CookiesПринять