Transformer Model (2/2): Build a Deep Neural Network (1.25x speed recommended)
Next Video: https://youtu.be/EOmd5sUUA_A
Transformer models are state-of-the-art language models. They are based on attention and dense layers, without any RNN. In the previous lecture, we built the attention layer and the self-attention layer. In this lecture, we first build multi-head attention layers and then use them to build a deep neural network known as the Transformer. The Transformer is a Seq2Seq model that can be used for machine translation.
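To make the multi-head attention layer concrete, below is a minimal NumPy sketch of one forward pass: the input is projected into per-head queries, keys, and values, each head runs scaled dot-product attention, and the heads' outputs are concatenated and projected back. The names and shapes (d_model, num_heads, the weight matrices W_q, W_k, W_v, W_o) are illustrative assumptions, not the lecture's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """X: (seq_len, d_model); W_q, W_k, W_v, W_o: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project the input, then split into heads: (num_heads, seq_len, d_head).
    def project(W):
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(W_q), project(W_k), project(W_v)

    # Scaled dot-product attention, computed independently per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    context = weights @ V                                  # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Example: 5 tokens, d_model = 16, 4 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
W = [rng.normal(size=(16, 16)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(X, *W, num_heads=4)
print(out.shape)  # (5, 16)
```

In the full Transformer, layers like this are stacked together with dense layers in both the encoder and the decoder to form the deep Seq2Seq network described above.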
Slides: https://github.com/wangshusen/DeepLearning
Reference:
Vaswani et al. Attention Is All You Need. In NIPS, 2017.
Video from the Shusen Wang channel.