Inferencing and training LLMs with less GPUs - Hung Tran

From the January 2024 Machine Learning & AI Meetup: www.meetup.com/machine-learning-ai-meetup/

Talk Description: This talk dives into clever ways to run Large Language Models (LLMs) with fewer GPUs, tackling both the inference and training stages. For inference on limited hardware, we'll explore techniques like model partitioning/offloading and quantization to shrink a model's memory footprint. To slash training costs, we'll delve into ZeRO memory optimization, LoRA, and prompt-tuning, paving the way for more efficient model development. And to cap it off, we'll showcase running LLMs solely on a personal laptop with CPUs and RAM, demonstrating the potential to democratise access to these powerful language models.
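As a rough illustration of the inference-side techniques the talk names, here is a minimal sketch of quantized loading with automatic partitioning/offloading. It assumes the Hugging Face transformers, accelerate, and bitsandbytes packages, and the model name is a placeholder; none of these specifics come from the talk itself.

```python
# Minimal sketch: 4-bit quantized inference with automatic GPU/CPU offloading.
# Assumes transformers + accelerate + bitsandbytes are installed;
# the model name is a placeholder, not one endorsed by the talk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder 7B model

# NF4 4-bit quantization cuts weight memory roughly 4x relative to fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # partition layers across GPUs, spilling to CPU RAM if needed
)

prompt = "Running large language models on modest hardware"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

On the training side, LoRA freezes the base weights and learns small low-rank adapter matrices, so only a tiny fraction of parameters need gradients and optimizer state. A sketch using the peft library (again an assumption; the rank and target modules below are illustrative values, not the talk's):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # adapter rank; illustrative value
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# For a quantized base model (QLoRA-style training), peft's
# prepare_model_for_kbit_training should be applied first.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```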

Speaker Bio: Hung Tran is a Ph.D. student at Deakin University specializing in video understanding, and a pioneer in applying deep learning and large language models to complex video data. His research involves processing video data, designing deep learning architectures, and training them on multiple GPUs to uncover hidden patterns and predict future events. Recently, he has explored how the inductive biases of large language models can enhance video analysis. Beyond research, he is a passionate coder with some experience in backend web development.

Link to Slides: https://docs.google.com/presentation/d/19EwGnMseSaP4LzTEPMCYZyQtjGIB9P1HPT_FtgK9rUQ/edit?usp=share_link

Video information
Published: February 20, 2024, 4:54:10
Duration: 00:46:14