An image is worth 16x16 words: ViT | Is this the extinction of CNNs? Long live the Transformer?
Mom, it's the Transformers again! They have come to ruin my CNN building blocks! 🥺 An Image is Worth 16x16 Words: paper explained.
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
📺 Ms. Coffee Bean explains the TRANSFORMER: https://youtu.be/FWFA4DGuzSc
📺 Ms. Coffee Bean on the Multimodal Transformer: https://youtu.be/dd7nE4nbxN0
Outline:
* 00:00 Pure Transformer for vision
* 01:17 How does it work?
* 03:58 The CNN Armageddon?
📄 Paper (not anonymous anymore): "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
📚 Check out this wonderful post by @JacobGildenblat : https://jacobgil.github.io/deeplearning/vision-transformer-explainability
-----------------------------------
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #ComputerVision #ICLR2021 #MachineLearning #AI #research
Video contains emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0
Video "An image is worth 16x16 words: ViT | Is this the extinction of CNNs? Long live the Transformer?" from the channel AI Coffee Break with Letitia
Video information
Date: October 8, 2020, 18:59:58
Duration: 00:05:26