- Популярные видео
- Авто
- Видео-блоги
- ДТП, аварии
- Для маленьких
- Еда, напитки
- Животные
- Закон и право
- Знаменитости
- Игры
- Искусство
- Комедии
- Красота, мода
- Кулинария, рецепты
- Люди
- Мото
- Музыка
- Мультфильмы
- Наука, технологии
- Новости
- Образование
- Политика
- Праздники
- Приколы
- Природа
- Происшествия
- Путешествия
- Развлечения
- Ржач
- Семья
- Сериалы
- Спорт
- Стиль жизни
- ТВ передачи
- Танцы
- Технологии
- Товары
- Ужасы
- Фильмы
- Шоу-бизнес
- Юмор
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
Full coding of LLaMA 2 from scratch, with full explanation, including Rotary Positional Embedding, RMS Normalization, Multi-Query Attention, KV Cache, Grouped Query Attention (GQA), the SwiGLU Activation function and more!
I explain the most used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, Top P
I also explain the math behind the Rotary Positional Embedding, with step by step proofs.
Repository with PDF slides: https://github.com/hkproj/pytorch-llama
Download the weights from: https://github.com/facebookresearch/llama
Prerequisites:
1) Transformer explained: https://www.youtube.com/watch?v=bCz4OMemCcA
2) LLaMA explained: https://www.youtube.com/watch?v=Mn_9W1nCFLo
Chapters
00:00:00 - Introduction
00:01:20 - LLaMA Architecture
00:03:14 - Embeddings
00:05:22 - Coding the Transformer
00:19:55 - Rotary Positional Embedding
01:03:50 - RMS Normalization
01:11:13 - Encoder Layer
01:16:50 - Self Attention with KV Cache
01:29:12 - Grouped Query Attention
01:34:14 - Coding the Self Attention
02:01:40 - Feed Forward Layer with SwiGLU
02:08:50 - Model weights loading
02:21:26 - Inference strategies
02:25:15 - Greedy Strategy
02:27:28 - Beam Search
02:31:13 - Temperature
02:32:52 - Random Sampling
02:34:27 - Top K
02:37:03 - Top P
02:38:59 - Coding the Inference
Видео Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm канала Umar Jamil
I explain the most used inference methods: Greedy, Beam Search, Temperature Scaling, Random Sampling, Top K, Top P
I also explain the math behind the Rotary Positional Embedding, with step by step proofs.
Repository with PDF slides: https://github.com/hkproj/pytorch-llama
Download the weights from: https://github.com/facebookresearch/llama
Prerequisites:
1) Transformer explained: https://www.youtube.com/watch?v=bCz4OMemCcA
2) LLaMA explained: https://www.youtube.com/watch?v=Mn_9W1nCFLo
Chapters
00:00:00 - Introduction
00:01:20 - LLaMA Architecture
00:03:14 - Embeddings
00:05:22 - Coding the Transformer
00:19:55 - Rotary Positional Embedding
01:03:50 - RMS Normalization
01:11:13 - Encoder Layer
01:16:50 - Self Attention with KV Cache
01:29:12 - Grouped Query Attention
01:34:14 - Coding the Self Attention
02:01:40 - Feed Forward Layer with SwiGLU
02:08:50 - Model weights loading
02:21:26 - Inference strategies
02:25:15 - Greedy Strategy
02:27:28 - Beam Search
02:31:13 - Temperature
02:32:52 - Random Sampling
02:34:27 - Top K
02:37:03 - Top P
02:38:59 - Coding the Inference
Видео Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm канала Umar Jamil
Комментарии отсутствуют
Информация о видео
3 сентября 2023 г. 8:56:57
03:04:11
Другие видео канала















![BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token](https://i.ytimg.com/vi/90mGPxR2GgY/default.jpg)




