ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations)
Can a ConvNet outperform a Vision Transformer? What kind of modifications do we have to apply to a ConvNet to make it as powerful as a Transformer? Spoiler: it’s not attention.
► SPONSOR: Weights & Biases 👉 https://wandb.me/ai-coffee-break
The official ConvNeXt repo has a W&B integration! Also, W&B built the CIFAR10 training colab linked there: 🥳 https://twitter.com/weights_biases/status/1486325233711828996
❓ Check out our daily #MachineLearning Quiz Questions: https://www.youtube.com/c/AICoffeeBreak/community
Explained Paper 📜: Liu, Zhuang, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. “A ConvNet for the 2020s.” arXiv preprint arXiv:2201.03545 (2022). https://arxiv.org/abs/2201.03545
🔗 Tweet by Lucas Beyer (ViT author): https://twitter.com/giffmana/status/1481054929573888005
🔗 Depthwise convolutions image and explanation: https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/
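The depthwise-separable idea covered in the link above (and used in ConvNeXt's depthwise layers) can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration with hypothetical names — not the paper's implementation — using stride 1 and "valid" padding:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Depthwise separable convolution on a single feature map.

    x:            (H, W, C_in)  input feature map
    depthwise_k:  (k, k, C_in)  one k x k filter per input channel
    pointwise_w:  (C_in, C_out) 1x1 convolution that mixes channels
    """
    H, W, C_in = x.shape
    k = depthwise_k.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise step: each channel is filtered independently,
    # so information is mixed only spatially, never across channels.
    dw = np.empty((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * depthwise_k[:, :, c])

    # Pointwise step: a 1x1 convolution mixes information across channels.
    return dw @ pointwise_w  # shape (Ho, Wo, C_out)
```

Splitting spatial mixing (depthwise) from channel mixing (pointwise) is what makes this much cheaper than a full convolution, which is why ConvNeXt can afford large 7x7 depthwise kernels.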
Referenced videos:
📺 An image is worth 16x16 words: https://youtu.be/DVoHvmww2lQ
📺 Swin Transformer: https://youtu.be/SndHALawoag
📺 This is how Transformers can process both image and text: https://youtu.be/aH7s6qXEUcc
📺 ViLBERT explained: https://youtu.be/dd7nE4nbxN0
📺 DeiT explained: https://youtu.be/-FbV2KgRM8A
📺 Transformers sequence length: https://youtu.be/Xxts1ithupI
Referenced papers:
📜 “Image Transformer” Paper: Parmar, Niki, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. “Image transformer.” In International Conference on Machine Learning, pp. 4055-4064. PMLR, 2018. https://arxiv.org/abs/1802.05751
📜 “ViLBERT” paper: Lu, Jiasen, Dhruv Batra, Devi Parikh, and Stefan Lee. “ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.” arXiv preprint arXiv:1908.02265 (2019). https://arxiv.org/abs/1908.02265
Outline:
00:00 A ConvNet for the 2020s
01:58 Weights & Biases (Sponsor)
03:10 Why bother?
04:40 The perks of ConvNets (CNNs)
06:51 Pros and cons of Transformers
09:54 From ConvNets to ConvNeXts
15:54 Lessons?
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
donor, Dres. Trost GbR, banana.dev
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video: ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations), from the channel AI Coffee Break with Letitia
Video information:
Published: January 26, 2022, 19:00:11
Duration: 00:19:20