Загрузка страницы

Vision Transformer for Image Classification

Vision Transformer (ViT) is the new state-of-the-art for image classification. ViT was posted on arXiv in Oct 2020 and officially published in 2021. On all the public datasets, ViT beats the best ResNet by a small margin, provided that ViT has been pretrained on a sufficiently large dataset. The bigger the dataset, the greater the advantage of the ViT over ResNet.

Slides: https://github.com/wangshusen/DeepLearning.git

Reference:
- Dosovitskiy et al. An image is worth 16×16 words: transformers for image recognition at scale. In ICLR, 2021.

Видео Vision Transformer for Image Classification канала Shusen Wang
Показать
Комментарии отсутствуют
Введите заголовок:

Введите адрес ссылки:

Введите адрес видео с YouTube:

Зарегистрируйтесь или войдите с
Информация о видео
5 мая 2021 г. 6:30:13
00:14:47
Яндекс.Метрика