Загрузка...

Multi-Node Finetuning LLMs on Kubernetes: A Practitioner’s Guide - Ashish Kamra & Boaz Ben Shabat

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon Europe in London from April 1 - 4, 2025. Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

Multi-Node Finetuning LLMs on Kubernetes: A Practitioner’s Guide - Ashish Kamra & Boaz Ben Shabat, Red Hat

Large Language Model (LLM) finetuning on enterprise private data has emerged as an important strategy for enhancing model performance on specific downstream tasks. This process however demands substantial compute resources, and presents some unique challenges in Kubernetes environments. This session offers a practical, step-by-step guide to implementing multi-node LLM finetuning on Kubernetes clusters with GPUs, utilizing PyTorch FSDP and the Kubeflow training operator. We'll cover - preparing a Kubernetes cluster for LLM finetuning, optimizing cluster, system, and network configurations, and comparing performance of various network topologies including pod networking, secondary networks, and GPU Direct RDMA over ethernet for peak performance. By the end of this session, the audience will have a comprehensive understanding of the intricacies involved in multi-node LLM finetuning on Kubernetes empowering them to introduce the same in their own production Kubernetes environments.

Видео Multi-Node Finetuning LLMs on Kubernetes: A Practitioner’s Guide - Ashish Kamra & Boaz Ben Shabat канала CNCF [Cloud Native Computing Foundation]
Яндекс.Метрика

На информационно-развлекательном портале SALDA.WS применяются cookie-файлы. Нажимая кнопку Принять, вы подтверждаете свое согласие на их использование.

Об использовании CookiesПринять