Vision Transformers
GitHub repository: https://github.com/andandandand/practical-computer-vision
0:00 - Vision Transformers
0:14 - Learning goals
0:27 - The Vision Transformer (ViT) architecture
3:38 - Projection of flattened patches and adding positional embeddings (sketched in code below)
5:32 - The CLS patch / token
7:04 - The Vision Transformer (ViT) architecture
8:17 - The attention component
10:07 - Attention mechanism simplified (sketched in code below)
12:12 - Interpretability of attention maps
12:44 - The locality bias of convolutional networks
14:09 - The ‘translational equivariance’ bias of convolutional networks
15:12 - Tradeoffs of ViTs compared with convolutional networks (CNNs)
16:09 - Summary
Video "Vision Transformers" from the Antonio Rueda-Toicen channel
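
As a companion to the chapters on patch projection, positional embeddings, and the CLS token (3:38-7:04), here is a minimal PyTorch sketch of that step. The sizes (224x224 images, 16x16 patches, embedding dimension 768) follow the original ViT-Base configuration and are only illustrative; the video may use different numbers.

# Minimal sketch of ViT patch projection, CLS token, and positional embeddings.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution slices the image into non-overlapping patches
        # and linearly projects each flattened patch to embed_dim.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable [CLS] token prepended to the patch sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learnable positional embeddings: one per patch plus one for [CLS].
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.proj(x)                       # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)       # (B, 196, 768) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # (B, 197, 768)
        return x + self.pos_embed              # add positional information

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                            # torch.Size([2, 197, 768])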
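
For the "attention mechanism simplified" chapter (10:07), here is a sketch of single-head scaled dot-product self-attention. Real ViTs use multi-head attention with learned query/key/value projections; the shapes below are illustrative and not taken from the video.

# Minimal sketch of scaled dot-product self-attention over patch tokens.
import math
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, tokens, dim). Every token attends to every other token,
    # which is how a ViT mixes information across image patches globally.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, T, T)
    weights = F.softmax(scores, dim=-1)   # each row sums to 1: the attention map
    return weights @ v, weights           # weighted sum of values + the map

x = torch.randn(2, 197, 768)              # e.g. the token sequence from the sketch above
out, attn_map = attention(x, x, x)        # self-attention, no learned projections here
print(out.shape, attn_map.shape)          # (2, 197, 768) (2, 197, 197)

The returned (197, 197) map is what the interpretability chapter (12:12) refers to: its first row shows how strongly the CLS token attends to each image patch, which is the kind of attention map typically visualized.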
Video information
Published: April 11, 2025, 1:42:04
Duration: 00:17:26