ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations)
Can a ConvNet outperform a Vision Transformer? What kind of modifications do we have to apply to a ConvNet to make it as powerful as a Transformer? Spoiler: it’s not attention.
► SPONSOR: Weights & Biases 👉 https://wandb.me/ai-coffee-break
The official ConvNeXt repo has a W&B integration! Also, W&B built the CIFAR10 training colab linked there: 🥳 https://twitter.com/weights_biases/status/1486325233711828996
❓ Check out our daily #MachineLearning Quiz Questions: https://www.youtube.com/c/AICoffeeBreak/community
Explained Paper 📜: Liu, Zhuang, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. “A ConvNet for the 2020s.” arXiv preprint arXiv:2201.03545 (2022). https://arxiv.org/abs/2201.03545
🔗 Tweet by Lucas Beyer (ViT author): https://twitter.com/giffmana/status/1481054929573888005
🔗 Depthwise convolutions image and explanation: https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/
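The depthwise-separable idea covered in the link above (and used in ConvNeXt's depthwise layers) can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration with hypothetical names — not the paper's implementation — using stride 1 and "valid" padding:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Depthwise separable convolution on a single feature map.

    x:            (H, W, C_in)  input feature map
    depthwise_k:  (k, k, C_in)  one k x k filter per input channel
    pointwise_w:  (C_in, C_out) 1x1 convolution that mixes channels
    """
    H, W, C_in = x.shape
    k = depthwise_k.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise step: each channel is filtered independently,
    # so information is mixed only spatially, never across channels.
    dw = np.empty((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * depthwise_k[:, :, c])

    # Pointwise step: a 1x1 convolution mixes information across channels.
    return dw @ pointwise_w  # shape (Ho, Wo, C_out)
```

Splitting spatial mixing (depthwise) from channel mixing (pointwise) is what makes this much cheaper than a full convolution, which is why ConvNeXt can afford large 7x7 depthwise kernels.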
Referenced videos:
📺 An image is worth 16x16 words: https://youtu.be/DVoHvmww2lQ
📺 Swin Transformer: https://youtu.be/SndHALawoag
📺 This is how Transformers can process both image and text: https://youtu.be/aH7s6qXEUcc
📺 ViLBERT explained: https://youtu.be/dd7nE4nbxN0
📺 DeiT explained: https://youtu.be/-FbV2KgRM8A
📺 Transformers sequence length: https://youtu.be/Xxts1ithupI
Referenced papers:
📜 “Image Transformer” Paper: Parmar, Niki, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. “Image transformer.” In International Conference on Machine Learning, pp. 4055-4064. PMLR, 2018. https://arxiv.org/abs/1802.05751
📜 “ViLBERT” paper: Lu, Jiasen, Dhruv Batra, Devi Parikh, and Stefan Lee. “ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.” arXiv preprint arXiv:1908.02265 (2019). https://arxiv.org/abs/1908.02265
Outline:
00:00 A ConvNet for the 2020s
01:58 Weights & Biases (Sponsor)
03:10 Why bother?
04:40 The perks of ConvNets (CNNs)
06:51 Pros and cons of Transformers
09:54 From ConvNets to ConvNeXts
15:54 Lessons?
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
donor, Dres. Trost GbR, banana.dev
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video: ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations), from the channel AI Coffee Break with Letitia
Video information:
Published: January 26, 2022, 19:00:11
Duration: 00:19:20