Multi-Head Attention — Many Sets of Eyes, Explained | datarekha

A single attention pass learns one kind of relationship, but language carries many at once. Multi-head attention runs attention several times in parallel: each head has its own query, key, and value matrices and projects the input into a different subspace, so it can look for a different pattern (grammar, meaning, long-range links). The outputs of all heads are concatenated and mixed by one learned matrix. That diversity is a large part of why transformers are so powerful.

Chapter 61 of the full "ML & DL from scratch, with the math" course (watch the complete ~2h09m film, with all chapters and timestamps in its pinned comment). More at datarekha.com. Narration uses a synthetic AI voice.
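The mechanism described above (separate query/key/value matrices per head, concatenation, one mixing matrix) can be sketched in a few lines. This is a minimal illustration, not code from the course. It uses plain Python lists so it runs without dependencies, random weights in place of learned ones, and two assumptions taken from the standard transformer formulation: each head works in a subspace of size `d_model // n_heads`, and scores are scaled by the square root of that size. All function and variable names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a single head."""
    # Each head projects the input into its own subspace.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    # Score every query against every key (assumed sqrt(d_head) scaling).
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    # Each token's output is a weighted blend of the value vectors.
    return matmul(weights, v), weights

def multi_head_attention(x, heads, w_o):
    """Run every head, concatenate per-token outputs, mix with w_o."""
    outputs = [attention_head(x, *h)[0] for h in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_o)

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights: uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads  # assumed split of the model dimension

x = random_matrix(seq_len, d_model, rng)  # 4 tokens, 8 features each
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))  # (W_q, W_k, W_v)
    for _ in range(n_heads)
]
w_o = random_matrix(n_heads * d_head, d_model, rng)  # the one mixing matrix

out = multi_head_attention(x, heads, w_o)
print(len(out), len(out[0]))  # prints "4 8": one d_model-sized vector per token
```

Because every head has its own three matrices, the heads produce different attention weights for the same input, and the final matrix `w_o` combines those blends back into one vector per token. A real implementation would use a tensor library and compute all heads in one batched operation.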

Related free lessons on datarekha.com:
- Multi-head attention: https://datarekha.com/deep-learning/multi-head

Tags: GATE DA, NLP, attention mechanism, datarekha, deep learning, language models, multi-head attention, parallel attention, query key value, self-attention, subspaces, transformers

Video information

Duration: 00:01:40
