How LLMs Understand Text: Tokenization, Encoding, and BPE Explained | LLMs From Scratch Part 1
This is Part 1 of a series on how Large Language Models like GPT, Gemini, and other transformer-based AI systems work under the hood.
In this lecture, we start from the beginning: how raw text becomes something a machine learning model can process. We cover the big picture, data pre-processing, tokenization, encoding, vocabulary design, character vs word tokenization, and Byte-Pair Encoding (BPE) - the sub-word tokenization approach used in many modern language models.
This episode is meant to build the foundation for later parts of the series, where we’ll move into self-supervised training, embeddings, positional encodings, transformer blocks, attention, feedforward networks, loss, back-propagation, fine-tuning, and inference.
If you’re a student, software engineer, or AI enthusiast trying to understand how LLMs actually work beyond the hype, this series is for you.
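As a rough illustration of the character vs word tokenization and encoding topics listed above, here is a minimal Python sketch. It is not taken from the lecture: the sample sentence and the helper names (build_vocab, encode, decode) are hypothetical, and whitespace splitting stands in for a real word tokenizer.

```python
# Illustrative sketch only; the text and helper names are assumptions,
# not the lecture's own code.
text = "the cat sat on the mat"


def build_vocab(tokens):
    """Map each distinct token to an integer ID (sorted for determinism)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def encode(tokens, vocab):
    """Turn a token sequence into a list of integer IDs."""
    return [vocab[tok] for tok in tokens]


def decode(ids, vocab):
    """Turn integer IDs back into tokens using the inverse vocabulary."""
    inverse = {i: tok for tok, i in vocab.items()}
    return [inverse[i] for i in ids]


# Character tokenization: every character (including spaces) is a token.
char_tokens = list(text)
# Word tokenization (simplified): split on whitespace.
word_tokens = text.split()

char_vocab = build_vocab(char_tokens)
word_vocab = build_vocab(word_tokens)

char_ids = encode(char_tokens, char_vocab)
word_ids = encode(word_tokens, word_vocab)

print("chars:", len(char_vocab), "vocab entries,", len(char_ids), "tokens")
print("words:", len(word_vocab), "vocab entries,", len(word_ids), "tokens")
```

On this toy sentence the character vocabulary is small but the encoded sequence is long, while the word vocabulary yields a much shorter sequence. On a real corpus the word vocabulary grows very large, which is the trade-off that sub-word methods such as BPE try to balance.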
Reference article: https://decodelm.com/articles/f4c7ea89-46ae-4d8c-825a-78b09e0bd330
Timestamps:
00:00 Introduction
2:20 What are LLMs
4:45 The big picture - Inference and Training
10:40 Training data and pre-processing
13:30 Representing data - Tokens and Vocabulary
16:30 Character tokenization and encoding
21:35 Word tokenization
23:40 Context window
26:50 Character vs word tokenization
32:50 Byte-Pair Encoding (BPE)
36:30 BPE pseudocode and example
47:10 Why BPE
50:15 Conclusion
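For readers who want something concrete alongside the BPE segments, the following is a minimal Python sketch of BPE merge learning. It is an assumption-laden illustration, not the lecture's pseudocode: the function names (get_pair_counts, merge_pair, train_bpe), the toy corpus, and the tie-breaking rule are all hypothetical choices, and production tokenizers typically work on bytes and add special handling this sketch omits.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    # Start with each word as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Most frequent pair wins; ties are broken by comparing the pairs
        # themselves (an arbitrary but deterministic choice for this sketch).
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=3)
print(merges)
print(words)
```

With this toy corpus, the frequent sequence "low" is merged into a single symbol after two merges, while rarer suffixes stay split into smaller pieces. That is the basic intuition behind sub-word vocabularies.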
#LLM #LargeLanguageModels #MachineLearning #ArtificialIntelligence #NLP #Tokenization #BytePairEncoding
Video "How LLMs Understand Text: Tokenization, Encoding, and BPE Explained | LLMs From Scratch Part 1" from the channel Vishal Sahoo
Video information
June 16, 2026, 6:45:22
00:53:31
