
Vision Transformers

GitHub repository: https://github.com/andandandand/practical-computer-vision

0:00 - Vision Transformers
0:14 - Learning goals
0:27 - The Vision Transformer (ViT) architecture
3:38 - Projection of flattened patches and adding positional embeddings
5:32 - The CLS patch / token
7:04 - The Vision Transformer (ViT) architecture
8:17 - The attention component
10:07 - Attention mechanism simplified
12:12 - Interpretability of attention maps
12:44 - The locality bias of convolutional networks
14:09 - The ‘translational equivariance’ bias of convolutional networks
15:12 - Tradeoffs of ViT versus convolutional networks (CNNs)
16:09 - Summary

Video: Vision Transformers, from the channel Antonio Rueda-Toicen