Vision Transformer for Image Classification
Vision Transformer (ViT) is the state-of-the-art for image classification. ViT was posted on arXiv in October 2020 and officially published at ICLR in 2021. On the public benchmark datasets, ViT beats the best ResNet by a small margin, provided that ViT has been pretrained on a sufficiently large dataset; the larger the pretraining dataset, the greater ViT's advantage over ResNet.
Slides: https://github.com/wangshusen/DeepLearning.git
Reference:
- Dosovitskiy et al. An image is worth 16×16 words: transformers for image recognition at scale. In ICLR, 2021.
Video: Vision Transformer for Image Classification, from the Shusen Wang channel.
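The paper's core idea, splitting an image into 16×16 patches and feeding the resulting token sequence to a standard Transformer encoder, can be sketched as a minimal patch-embedding pipeline. This is an illustrative NumPy sketch, not the authors' implementation; all names, shapes, and the random "learned" parameters are hypothetical.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened (N, patch*patch*C) patches."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    return (image
            .reshape(H // patch, patch, W // patch, patch, C)
            .transpose(0, 2, 1, 3, 4)          # group patch rows/cols together
            .reshape(-1, patch * patch * C))   # one flattened vector per patch

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))     # a 224x224 RGB image
patches = patchify(image)                      # (196, 768): 14x14 patches of 16x16x3

# Linear projection of flattened patches to the model dimension
# (the projection matrix would be learned; here it is random for illustration).
d_model = 768
W_embed = rng.standard_normal((patches.shape[1], d_model)) * 0.02
tokens = patches @ W_embed                     # (196, 768) patch embeddings

# Prepend a [CLS] token and add position embeddings (both learned in the paper,
# random here); the resulting sequence goes into a standard Transformer encoder,
# and the [CLS] output is used for classification.
cls_token = rng.standard_normal((1, d_model)) * 0.02
seq = np.concatenate([cls_token, tokens], axis=0)       # (197, 768)
seq = seq + rng.standard_normal(seq.shape) * 0.02       # + position embeddings

print(seq.shape)
```

A 224×224 image with 16×16 patches yields 14×14 = 196 patch tokens, so with the [CLS] token the encoder sees a sequence of 197 vectors, exactly like word embeddings in NLP.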