Are Pre-trained Convolutions Better than Pre-trained Transformers? – Paper Explained
Paper explained and visualized “Are Pre-trained Convolutions Better than Pre-trained Transformers?” Tune in to the epic fight of CNNs against transformers! Or at least, that’s how the paper is framed.
Ms. Coffee Bean also wonders what it takes for a transformer(-like) architecture to be called a transformer, and when it becomes something else, e.g. a CNN. Join the comment section to discuss!
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Referenced videos:
📺 Self-attention replaced with the Fourier Transform: https://youtu.be/j7pWPdGEfMA
📺 Ms. Coffee Bean explains the Transformer: https://youtu.be/FWFA4DGuzSc
Discussed paper:
📄 Tay, Y., Dehghani, M., Gupta, J., Bahri, D., Aribandi, V., Qin, Z., & Metzler, D. (2021). Are Pre-trained Convolutions Better than Pre-trained Transformers? https://arxiv.org/abs/2105.03322
Outline:
* 00:00 Are you tired of transformers?
* 01:12 What makes transformers so good?
* 05:13 CNN vs. Transformers
* 09:53 What makes a transformer a transformer? -- Discussion
Music 🎵 : Savior Search - DJ Freedem
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video "Are Pre-trained Convolutions Better than Pre-trained Transformers? – Paper Explained" from the channel AI Coffee Break with Letitia
Video info: published May 26, 2021, 18:45:01 · duration 00:12:02