The Transformer Block — Attention, Feed-Forward, Residuals & LayerNorm | datarekha

Assemble the full unit. Each block has two main parts — multi-head attention (tokens talk and mix context) and a per-token feed-forward network (it thinks) — each wrapped in a residual connection (gradients flow straight back) and layer normalization (numbers stay well-behaved). Attention, add, normalize; feed-forward, add, normalize. Stack it dozens of times and you have the body of a modern transformer. Chapter 64 of the full "ML & DL from scratch, with the math" course (watch the complete ~2h09m film, with all chapters & timestamps in its pinned comment). More at datarekha.com. Narration uses a synthetic AI voice.
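The recipe above (attention, add, normalize; feed-forward, add, normalize) can be sketched as a tiny forward pass. This is a minimal, dependency-free Python sketch, not the course's own code. It assumes hypothetical random weights and made-up sizes (4 tokens, d_model = 8, 2 heads, d_ff = 32). It also omits biases, dropout, masking, positional encodings and LayerNorm's learned scale/shift. The helper names (`matmul`, `layer_norm`, `multi_head_attention`, `feed_forward`, `transformer_block`, `init_params`) are illustrative only.

```python
import math
import random

# Tiny pure-Python helpers: a matrix is a list of rows (one row per token).
def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    # Element-wise sum: this is the residual connection.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to mean 0, variance 1.
    # Simplification: no learned gain/bias.
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    # Tokens "talk": each head mixes value vectors using scaled dot-product weights.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    n, d_model = len(x), len(x[0])
    d_head = d_model // n_heads
    concat = [[] for _ in range(n)]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # this head's slice of the columns
        for i in range(n):
            scores = [sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(d_head)
                      for j in range(n)]
            w = softmax(scores)
            concat[i].extend(sum(w[j] * v[j][c] for j in range(n)) for c in range(lo, hi))
    return matmul(concat, wo)  # output projection back to d_model

def feed_forward(x, w1, w2):
    # Per-token MLP ("it thinks"): expand to d_ff, ReLU, project back. Biases omitted.
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_block(x, p, n_heads):
    # Attention, add, normalize.
    attn = multi_head_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], n_heads)
    x = layer_norm(add(x, attn))
    # Feed-forward, add, normalize.
    x = layer_norm(add(x, feed_forward(x, p["w1"], p["w2"])))
    return x

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(d_model, d_ff, rng):
    # Hypothetical random weights; a trained model would learn these.
    p = {name: random_matrix(d_model, d_model, rng) for name in ["wq", "wk", "wv", "wo"]}
    p["w1"] = random_matrix(d_model, d_ff, rng)
    p["w2"] = random_matrix(d_ff, d_model, rng)
    return p

# Usage: 4 tokens, d_model = 8, 2 heads, d_ff = 32, a stack of 3 blocks.
rng = random.Random(0)
d_model, d_ff, n_heads, n_tokens = 8, 32, 2, 4
x = random_matrix(n_tokens, d_model, rng, scale=1.0)
for p in [init_params(d_model, d_ff, rng) for _ in range(3)]:
    x = transformer_block(x, p, n_heads)
print(len(x), len(x[0]))  # 4 8
```

Each block keeps the shape (tokens × d_model), which is what lets blocks be stacked. The add-then-normalize order here follows the description above, which is the post-norm layout of the original Transformer. Many recent LLMs instead apply LayerNorm before each sublayer (pre-norm), which tends to make very deep stacks easier to train.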

Related free lessons on datarekha.com:
- Inside the transformer block: https://datarekha.com/deep-learning/transformer-block
- The Transformer Architecture: https://datarekha.com/deep-learning/the-transformer

Video from the datarekha channel.

Tags: GATE DA, LLM architecture, NLP, attention, datarekha, deep learning, feed-forward network, language models, layer normalization, residual connections, transformer block, transformers

Video information: published 16 h 57 min ago; duration 00:01:54.

Other videos from the channel

Why LLMs hallucinate — confident, and made up #Shorts

What actually turns an LLM into an agent #shorts

DSA — Dijkstra's Algorithm #Shorts

The Z-Score — Visual Explainer #Shorts

Deep Learning & Generative AI, Intuitively — Single Neuron to ChatGPT (Transformers, LLMs & LLMOps)

Simpson's paradox — helps each group, hurts overall #Shorts

Push, pull, and origin explained #shorts

Data Engineer Mock Interview — Easy to Hard | Full Walkthrough

SQL — EXPLAIN & Query Plans #Shorts

Turn any messy webpage into clean JSON #shorts

How the shell finds your commands #shorts

Query, Key & Value (QKV) — Visual Explainer #Shorts

Sigmoid — Visual Explainer #Shorts

k-means clustering — find groups with no labels #Shorts

Regularization (L1 & L2) — forcing a model to stay simple #Shorts

Gaussian Process — Visual Explainer #Shorts

Tensor — Visual Explainer #Shorts

Star vs Snowflake schema #Shorts

Principal Component Analysis, Visually — PCA & the Math From Scratch | datarekha

Reranking — Visual Explainer #Shorts

Log-Likelihood & MLE — Visual Explainer #Shorts
