Vision Transformer (ViT) - An image is worth 16x16 words | Paper Explained

❤️ Become The AI Epiphany Patreon ❤️ ► https://www.patreon.com/theaiepiphany
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

In this video I do a (semi) deep dive into the "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" paper, which introduced the Vision Transformer.

The paper is very interesting because it showed that, with minimal modifications, transformers can match or beat CNNs on image classification when pre-trained on enough data.

Until now, transformers have ruled the NLP world, and now they are coming for the CV world as well!

You'll learn about:
✔️ How the vision transformer works
✔️ Main ideas in the paper
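
To make the "16x16 words" in the title concrete, here is a minimal sketch (not the paper's code) of the first step: cutting an image into non-overlapping 16x16 patches and flattening each one into a vector, so the image becomes a sequence of "words". The function name `image_to_patches` and the nested-list image layout are assumptions made for illustration. In the actual model each flattened patch is then linearly projected and combined with a position embedding before entering the transformer.

```python
def image_to_patches(image, patch_size=16):
    """Split an image into flattened, non-overlapping square patches.

    `image` is a nested list indexed as image[y][x] -> list of channel values
    (a hypothetical layout chosen to keep the sketch dependency-free).
    Returns one flat list of values per patch, in row-major patch order.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    patch.extend(image[y][x])  # append all channel values of this pixel
            patches.append(patch)
    return patches


# A 224x224 RGB image yields (224 / 16)^2 = 196 patches of 16 * 16 * 3 = 768 values each.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patches(image)
print(len(tokens), len(tokens[0]))  # 196 768
```

Real implementations usually do this with tensor reshapes or a strided convolution; the explicit loops are only meant to show what a single patch "word" contains.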

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ paper: https://arxiv.org/abs/2010.11929
✅ transformer Jupyter Notebook: https://github.com/gordicaleksa/pytorch-original-transformer/blob/main/The%20Annotated%20Transformer%20%2B%2B.ipynb
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
0:00 Enter The Vision Transformer, Jupyter Notebook
1:00 Deep dive intro
3:14 How does Vision Transformer work?
4:39 Let's go even deeper
8:33 Positional encoding inductive bias
9:50 Model variants and results
11:50 VTAB benchmark results
13:00 Perf vs amount of pretrained data
16:08 What does Vision Transformer learn? (attention span)
19:35 Self-supervision vs Supervised learning
21:10 Scaling the transformer (future research prediction)
22:15 Positional Encodings details
24:15 Logging out

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATRON OF THE AI EPIPHANY ❤️

If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!

The AI Epiphany ► https://www.patreon.com/theaiepiphany
One-time donation:
https://www.paypal.com/paypalme/theaiepiphany

Much love! ❤️

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

💡 The AI Epiphany is a channel dedicated to simplifying the field of AI through creative visualizations and, in general, a stronger focus on geometrical and visual intuition rather than algebraic and numerical "intuition".

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
👋 CONNECT WITH ME ON SOCIAL
LinkedIn ► https://www.linkedin.com/in/aleksagordic/
Twitter ► https://twitter.com/gordic_aleksa

Instagram ► https://www.instagram.com/aiepiphany/
Facebook ► https://www.facebook.com/aiepiphany/

👨‍👩‍👧‍👦 JOIN OUR DISCORD COMMUNITY:
Discord ► https://discord.gg/peBrCpheKE

📢 SUBSCRIBE TO MY MONTHLY AI NEWSLETTER:
Substack ► https://aiepiphany.substack.com/

💻 FOLLOW ME ON GITHUB FOR COOL PROJECTS:
GitHub ► https://github.com/gordicaleksa

📚 FOLLOW ME ON MEDIUM:
Medium ► https://gordicaleksa.medium.com/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#transformers #computervision #visiontransformer

Video "Vision Transformer (ViT) - An image is worth 16x16 words | Paper Explained" from the channel The AI Epiphany
