Scaled Dot-Product Attention by hand ✍️

I explained attention in my recent AI seminar as follows:

Step 1 — Compare
We take one token's query vector and compare it against every key using dot products. One query gives one row of similarity scores; repeating this for every query fills in a grid: every query against every key.
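
To make the grid concrete, here is a minimal sketch in plain Python. The query and key vectors are made-up toy numbers, and the helper name `dot` is just for illustration; the seminar itself did this in Excel.

```python
# Toy data (made up for illustration): 2 query vectors, 3 key vectors, d_k = 2.
queries = [[1.0, 0.0],
           [0.0, 2.0]]
keys = [[1.0, 1.0],
        [0.0, 1.0],
        [2.0, 0.0]]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# One row per query, one column per key.
scores = [[dot(q, k) for k in keys] for q in queries]

for row in scores:
    print(row)
# [1.0, 0.0, 2.0]
# [2.0, 2.0, 0.0]
```

Each printed row is what one query "sees" when it looks across all the keys.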

Step 2 — Scale
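We divide those scores by √dₖ, the square root of the key dimension. Without this, dot products tend to grow with dₖ and push softmax toward saturation; scaling keeps the values in a numerically stable range.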
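
As a quick sketch of the scaling step, assuming a key dimension of 4 and one made-up row of raw scores:

```python
import math

d_k = 4                       # key dimension (assumed for this example)
raw_scores = [2.0, 0.0, 4.0]  # one row of made-up dot-product scores

# Divide every score in the row by sqrt(d_k).
scaled = [s / math.sqrt(d_k) for s in raw_scores]
print(scaled)  # [1.0, 0.0, 2.0]
```

The ordering of the scores does not change; only their spread shrinks.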

Step 3 — Normalize
We apply softmax to each row to turn raw scores into a probability distribution: values between 0 and 1 that sum to 1. These are the attention weights.
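
Here is one way to write softmax for a single row, as a sketch. Subtracting the row maximum before exponentiating is a common safety trick that I am adding here; it does not change the result. The input row is made up.

```python
import math

def softmax(row):
    """Turn a row of scores into weights in (0, 1) that sum to 1."""
    peak = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([1.0, 0.0, 2.0])
print([round(w, 3) for w in weights])  # [0.245, 0.09, 0.665]
```

The largest score gets the largest weight, but every key still receives some share.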

Step 4 — Combine
We use those weights to compute a weighted sum of the values. Each output token becomes a linear combination of the value vectors: maybe 37% from one token, 23% from another, and a little from the rest.
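
A small sketch of the weighted sum for one query. The weights (50%, 25%, 25%) and the value vectors are made up so the arithmetic is easy to check by hand.

```python
# Made-up attention weights for one query (they sum to 1) and three value vectors.
weights = [0.5, 0.25, 0.25]
values = [[4.0, 0.0],
          [0.0, 8.0],
          [4.0, 4.0]]

# Weighted sum: output[j] = sum over i of weights[i] * values[i][j]
dim = len(values[0])
output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
print(output)  # [3.0, 3.0]
```

The output has the same length as a single value vector, not the same length as the weight row.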

No black box.
Just dot products, scaling, softmax, and weighted sums, built one row at a time in Excel.
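
For readers without a spreadsheet at hand, the four steps can be chained for one row at a time in plain Python. This is a sketch under my own assumptions: the function name `attention_row` and all the toy inputs are invented for illustration, and there is no masking or batching.

```python
import math

def attention_row(query, keys, values):
    """Output for one query: compare, scale, normalize, combine."""
    d_k = len(query)
    # Steps 1 and 2: dot product with every key, divided by sqrt(d_k)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    # Step 3: softmax over the row (max subtracted first for numerical safety)
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 4: weighted sum of the value vectors
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output

# Toy inputs, made up for illustration
queries = [[1.0, 0.0], [0.0, 0.0]]
keys = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for q in queries:
    weights, output = attention_row(q, keys, values)
    print([round(w, 3) for w in weights], [round(x, 3) for x in output])
```

The second query is all zeros, so every score is 0 and softmax spreads the weight evenly: each value vector contributes one third.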

Video: Scaled Dot-Product Attention by hand ✍️, from the AI by Hand channel