Vision Transformer in PyTorch
In this video I implement the Vision Transformer from scratch. It closely follows the implementation provided in https://github.com/rwightman/pytorch-image-models. I focus solely on the architecture and inference and do not cover training. I discuss all the relevant concepts the Vision Transformer uses, e.g. patch embedding, the attention mechanism, layer normalization, and many others.
My implementation: https://github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/vision_transformer
timm implementation: https://github.com/rwightman/pytorch-image-models
lucidrains implementation: https://github.com/lucidrains/vit-pytorch
00:00 Intro
01:20 Architecture overview
02:53 Patch embedding module
06:39 Attention module
07:22 Dropout overview
08:11 Attention continued 1
10:50 Linear overview
12:10 Attention continued 2
14:35 Multilayer perceptron
16:07 Block module
17:02 LayerNorm overview
19:31 Block continued
20:44 Vision transformer
24:52 Verification
28:01 Cat forward pass
29:10 Outro
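As a rough illustration of the patch embedding step covered at 02:53 — a minimal pure-Python sketch, assuming a square single-channel image whose side is divisible by the patch size. `patchify` is a hypothetical helper name; the actual PatchEmbed module in the video performs this split (plus the linear projection) with a strided nn.Conv2d.

```python
# Minimal sketch of the patch-embedding idea: split an image into
# non-overlapping square patches and flatten each one. Illustrative only;
# the video's PatchEmbed module uses nn.Conv2d with stride == patch size.

def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened square patches."""
    h = len(image)
    w = len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # Read the patch row by row and flatten it into one vector
            patch = [
                image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ]
            patches.append(patch)
    return patches

# A 4x4 "image" split into 2x2 patches -> 4 patches of 4 pixels each
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(img, 2)
print(len(patches), len(patches[0]))  # prints: 4 4
```

Each flattened patch would then be mapped by a learned linear projection to the embedding dimension, giving the token sequence the transformer blocks operate on.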
If you have any video suggestions or you just wanna chat feel free to join the discord server: https://discord.gg/a8Va9tZsG5
Twitter: https://twitter.com/moverfitted
Logo animation credits:
Title: Conjungation · Author: Uncle Milk · Source: https://soundcloud.com/unclemilk · License: https://creativecommons.org/licenses/... · Download (9MB): https://auboutdufil.com/?id=600
Video "Vision Transformer in PyTorch" from the channel mildlyoverfitted