Transformer Model (2/2): Build a Deep Neural Network (1.25x speed recommended)

Next Video: https://youtu.be/EOmd5sUUA_A

Transformer models are state-of-the-art language models. They are based on attention and dense layers, without any RNN. In the previous lecture, we built the attention layer and the self-attention layer. In this lecture, we first build multi-head attention layers and then use them to build a deep neural network known as the Transformer. The Transformer is a Seq2Seq model that can be used for machine translation.
Slides: https://github.com/wangshusen/DeepLearning
Reference:
Vaswani et al. Attention Is All You Need. In NIPS, 2017.
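
The key construction summarized above is that multi-head attention runs several scaled dot-product attention heads in parallel and concatenates their outputs. The sketch below is a minimal NumPy illustration of that mechanism following Vaswani et al. (2017); the names (d_model, num_heads, the random weight matrices) are illustrative assumptions, not code from the lecture.

# Minimal sketch of multi-head self-attention (Vaswani et al., 2017).
# Illustrative only; not the lecture's exact implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """X: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project the inputs, then split d_model into num_heads heads.
    def project(W):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(W_q), project(W_k), project(W_v)

    # Scaled dot-product attention, computed per head in one batched matmul.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                  # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Toy usage: 5 tokens, d_model = 8, 2 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads=2)
print(out.shape)  # (5, 8)

In the full Transformer, blocks of such attention layers are stacked with dense layers to form the encoder and decoder of the Seq2Seq model described above.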

Channel: Shusen Wang
Uploaded: April 17, 2021
Duration: 00:23:52