Scaled Dot-Product Attention by hand ✍️
I explained attention in my recent AI seminar as follows:
Step 1 — Compare
We take each token's query and compare it against all keys using dot products. Doing this for every query gives a grid of similarity scores: every query against every key.
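A minimal sketch of the compare step in plain Python rather than a spreadsheet. The matrices `Q` and `K` and the helper `dot` are hypothetical toy values and names chosen only for illustration; they are not taken from the seminar.

```python
# Hypothetical toy data: 3 tokens, key dimension d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one query vector per row
K = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]  # one key vector per row

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# scores[i][j] compares query i with key j, giving the full grid.
scores = [[dot(q, k) for k in K] for q in Q]

for row in scores:
    print(row)
# [1.0, 0.0, 2.0]
# [2.0, 1.0, 0.0]
# [3.0, 1.0, 2.0]
```

Each printed row corresponds to one query, which mirrors filling in the grid one row at a time.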
Step 2 — Scale
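We divide those scores by √dₖ, where dₖ is the key dimension, so their magnitude does not grow with dₖ. Without this, large scores push softmax toward a near one-hot output with very small gradients.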
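The scaling step might look like this for a single row of scores. The values of `d_k` and `raw_scores` are hypothetical and picked so the arithmetic is easy to follow by hand.

```python
import math

d_k = 4                        # hypothetical key dimension
raw_scores = [8.0, 4.0, -2.0]  # hypothetical dot products for one query

# Divide every score in the row by sqrt(d_k).
scale = math.sqrt(d_k)
scaled = [s / scale for s in raw_scores]

print(scaled)  # [4.0, 2.0, -1.0]
```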
Step 3 — Normalize
We apply softmax to turn raw scores into a probability distribution—values between 0 and 1 that sum to 1. These are the attention weights.
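As a rough illustration, softmax over one row of scaled scores can be written as below. The function name `softmax` and the input row are assumptions for this sketch. Subtracting the row maximum before exponentiating is a common safeguard against overflow and does not change the result.

```python
import math

def softmax(row):
    """Softmax over one row of scores; subtracting the max avoids overflow."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scaled scores for one query.
weights = softmax([4.0, 2.0, -1.0])

print([round(w, 3) for w in weights])
print(sum(weights))  # very close to 1.0
```

The largest score receives the largest weight, and every weight lies strictly between 0 and 1.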
Step 4 — Combine
We use those weights to compute a weighted sum of the value vectors. Each output token becomes a linear combination of the values of the tokens it attends to (only the current and earlier tokens if a causal mask is applied): maybe 37% from one, 23% from another, and a little from the rest.
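The combine step for one query can be sketched as follows. The weights echo the 37% and 23% figures from the text; the remaining weights and the value vectors in `V` are hypothetical.

```python
# Hypothetical attention weights for one query (they sum to 1).
weights = [0.37, 0.23, 0.25, 0.15]
# Hypothetical value vectors, one per token.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

# Weighted sum of the value vectors, computed column by column.
output = [
    sum(w * v[c] for w, v in zip(weights, V))
    for c in range(len(V[0]))
]

print(output)  # approximately [0.92, 0.48]
```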
No black box.
Just dot products, scaling, softmax, and weighted sums—built one row at a time in Excel.
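Put together, the four steps fit in one short function. This is a sketch in plain Python, not the Excel workbook from the video: the function name `attention` and the toy matrices are assumptions, and no masking is applied.

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention, one query row at a time (no masking)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Steps 1 and 2: compare with every key, then scale by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        # Step 3: softmax over the row (max subtracted for numerical safety).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Step 4: weighted sum of the value vectors.
        out.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ])
    return out

# Hypothetical toy inputs: 2 tokens, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

result = attention(Q, K, V)
for row in result:
    print([round(x, 3) for x in row])
```

Because each query here matches one key more closely than the other, each output row leans toward the corresponding value vector while still mixing in the other one.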
Video "Scaled Dot-Product Attention by hand ✍️" from the channel AI by Hand
Video information
January 4, 2026, 7:27:07
00:00:30