Pattern Exploiting Training explained! | PET, iPET, ADAPET
Small language models are also few-shot learners! Here you can find out all about PET, iPET, and ADAPET. Choose your favorite!
Few-shot learning for "normal-sized" language models like BERT or ALBERT with pattern-exploiting training (PET). GPT-3 is not the only few-shot learner, at least not on SuperGLUE.
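To make the core idea concrete, here is a minimal sketch of a PET-style cloze reformulation using the Hugging Face fill-mask pipeline. The pattern and verbalizer below are hypothetical illustrations, not the exact ones from the papers:

```python
# Minimal sketch of a PET-style cloze pattern (illustrative only; this
# pattern and verbalizer are made up, not taken from the PET papers).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

review = "Best pizza ever!"
# Pattern: rewrite the classification input as a cloze question.
prompt = f"{review} It was [MASK]."

# Verbalizer: map label words the masked LM can predict to task labels.
verbalizer = {"great": "positive", "bad": "negative"}

# Score only the verbalizer tokens and pick the best-scoring label word.
predictions = fill_mask(prompt, targets=list(verbalizer))
best = max(predictions, key=lambda p: p["score"])
print(verbalizer[best["token_str"]])  # e.g. "positive"
```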
📺 Ms. Coffee Bean explains the Transformer: https://youtu.be/FWFA4DGuzSc
📺 Ms. Coffee Bean on GPT-3: https://youtu.be/5fqxPOaaqi0
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Outline:
* 00:00 Small language models are also few-shot learners
* 01:30 Few-shot learning for GPT-3
* 02:58 Few-shot learning for everyone: PET
* 07:29 iPET
* 08:00 The gist of PET
* 08:20 ADAPET
* 11:53 Wrap-up
📄 Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference. arXiv preprint arXiv:2001.07676. https://arxiv.org/abs/2001.07676
📄 Schick, T., & Schütze, H. (2020). It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. arXiv preprint arXiv:2009.07118. https://arxiv.org/abs/2009.07118
📄 Tam, D., Menon, R. R., Bansal, M., Srivastava, S., & Raffel, C. (2021). Improving and Simplifying Pattern Exploiting Training. arXiv preprint arXiv:2103.11955. https://arxiv.org/abs/2103.11955
📄 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165. https://arxiv.org/abs/2005.14165
Music 🎵 : The Truth by Anno Domini Beats
---------------------------------
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #few-shot-learning #gpt3 #MachineLearning #AI #research