Lecture 12.2 Transformers
ERRATA: In slide 31, the first part of the transformer block should read
y = self.layernorm(x)
y = self.attention(y)
Also, the code currently suggests that the same layer normalization is applied twice; it is more common to use two different layer normalizations within the same block.
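For reference, a minimal sketch of the corrected pre-norm block in PyTorch. The module and parameter names here (nn.MultiheadAttention standing in for the lecture's attention module, ff_mult, norm1, norm2) are assumptions for illustration, not the slide's exact code:

    import torch.nn as nn

    class TransformerBlock(nn.Module):
        # Pre-norm transformer block: normalize, attend, add the residual,
        # then the same pattern for the feed-forward part.
        # Note the two separate layer normalizations, as the erratum suggests.
        def __init__(self, emb, heads, ff_mult=4):
            super().__init__()
            self.attention = nn.MultiheadAttention(emb, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(emb)
            self.norm2 = nn.LayerNorm(emb)
            self.ff = nn.Sequential(
                nn.Linear(emb, ff_mult * emb),
                nn.ReLU(),
                nn.Linear(ff_mult * emb, emb))

        def forward(self, x):
            y = self.norm1(x)               # y = self.layernorm(x)
            y, _ = self.attention(y, y, y)  # y = self.attention(y)
            x = x + y                       # residual connection
            y = self.norm2(x)               # a second, separate layer norm
            return x + self.ff(y)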
How to take the basic self-attention mechanism and build it up into a Transformer. We discuss the basic transformer block, layer normalization, the causal block for autoregressive models, and three different ways to encode position information.
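As a rough illustration of the causal masking used in the autoregressive block mentioned above (the variable names are my own, not the lecture's): each position may only attend to itself and earlier positions, which is enforced by masking out the upper triangle of the attention scores before the softmax.

    import torch

    def causal_mask(t):
        # True above the diagonal: the future positions a token may not attend to.
        return torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)

    t = 5
    scores = torch.randn(t, t)                                  # raw attention scores
    scores = scores.masked_fill(causal_mask(t), float('-inf'))  # hide the future
    weights = torch.softmax(scores, dim=-1)                     # row i attends only to j <= i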
annotated slides: https://dlvu.github.io/sa
lecturer: Peter Bloem