Adding vs. concatenating positional embeddings & Learned positional encodings
When should you add positional embeddings, and when should you concatenate them? What are the arguments for learning positional encodings, and when should you hand-craft them instead? Ms. Coffee Bean answers these questions in this video.
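The two options compared in the video can be sketched in a few lines of NumPy. This is an illustrative sketch (shapes and random embeddings are placeholders, not from the video): adding keeps the model width fixed and forces positions to share the embedding space with content, while concatenating reserves dedicated dimensions for position at the cost of a wider model.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8

tok = rng.standard_normal((seq_len, d_model))  # token (content) embeddings
pos = rng.standard_normal((seq_len, d_model))  # positional embeddings

# Option 1: add — positions live in the same space as the content;
# the input to the transformer stays (seq_len, d_model)
added = tok + pos

# Option 2: concatenate — positions get their own dimensions;
# the input width doubles to (seq_len, 2 * d_model)
concatenated = np.concatenate([tok, pos], axis=-1)

print(added.shape, concatenated.shape)  # (5, 8) (5, 16)
```

In practice the added variant dominates (e.g. the original Transformer and ViT both add), partly because the first linear projection can learn to route the mixed signal anyway.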
Outline:
00:00 Concatenated vs. added positional embeddings
04:49 Learned positional embeddings
06:48 Ms. Coffee Bean's deepest insight ever
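As a counterpoint to learned embeddings, the classic hand-crafted option is the sinusoidal encoding from "Attention is all you need" (Vaswani et al., linked below). A minimal sketch, following the formulas in that paper (the function name and dimensions here are illustrative):

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

pe = sinusoidal_encoding(10, 16)
print(pe.shape)  # (10, 16)
```

No parameters to train, and the encoding extrapolates to sequence lengths unseen during training, which is one of the hand-crafting arguments discussed in the video.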
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, help us boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
📺 Positional embeddings explained: https://youtu.be/1biZfFLPRSY
📺 Fourier Transform instead of attention: https://youtu.be/j7pWPdGEfMA
📺 Transformer explained: https://youtu.be/FWFA4DGuzSc
Papers 📄:
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wang, Yu-An, and Yun-Nung Chen. "What do position embeddings learn? An empirical study of pre-trained language model positional encoding." arXiv preprint arXiv:2010.04903 (2020). https://arxiv.org/pdf/2010.04903.pdf
Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). https://arxiv.org/abs/2010.11929
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video "Adding vs. concatenating positional embeddings & Learned positional encodings" from the channel AI Coffee Break with Letitia
Video information
July 18, 2021, 16:45:01
Duration: 00:09:21