Modular Language Models
Conventional language models (LMs) are trained densely: all parameters are updated with respect to all data. We argue that dense training leads to a variety of well-documented issues with LMs, including their prohibitive training cost and unreliable downstream behavior. We then introduce a new class of LMs that are fundamentally modular, where components (or experts) of the LM are specialized to distinct domains in the training corpus, and experts are conditionally updated based on the domain of the incoming document. We show how modularity addresses the limitations of dense training by enabling LMs that are rapidly customizable (with the ability to mix, add, or remove experts after training), embarrassingly parallel (requiring no communication between experts), and sparse (needing only a few experts active at a time for inference). Key to our proposal is exploring what constitutes the domains to which experts specialize, as well as reflecting on the data sources used to train LMs. Our new techniques chart a path towards collaborative and personalized LMs, where anyone can contribute to, maintain, and deploy experts at very modest computational cost.
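The abstract describes a concrete recipe: train one expert per domain, update only the expert whose domain matches the incoming document, and mix a few experts' predictions at inference. Below is a minimal sketch of that recipe, assuming a toy GRU backbone and illustrative names (ExpertLM, ModularLM); it is not the speaker's implementation (which builds on transformer LMs), but it shows why the scheme is embarrassingly parallel: no gradient ever crosses expert boundaries.

```python
# Minimal sketch (illustrative, not the talk's actual implementation) of
# domain-conditional expert training and sparse, weighted expert mixing.
import torch
import torch.nn as nn

class ExpertLM(nn.Module):
    """One expert: a small causal LM specialized to a single domain."""
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.body = nn.GRU(d_model, d_model, batch_first=True)  # toy backbone
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.body(self.embed(tokens))
        return self.head(hidden)  # next-token logits, shape (B, T, V)

class ModularLM:
    """One expert per domain. Experts never exchange gradients, so each
    can be trained on a separate machine with no communication."""
    def __init__(self, domains, vocab_size: int, lr: float = 1e-3):
        self.experts = {d: ExpertLM(vocab_size) for d in domains}
        self.opts = {d: torch.optim.SGD(e.parameters(), lr=lr)
                     for d, e in self.experts.items()}

    def train_step(self, domain: str, tokens: torch.Tensor) -> float:
        # Conditional update: only the expert matching the document's
        # domain is touched; all other parameters stay frozen.
        expert, opt = self.experts[domain], self.opts[domain]
        logits = expert(tokens[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    @torch.no_grad()
    def mixed_log_probs(self, tokens: torch.Tensor, weights: dict):
        # Sparse inference: activate only the experts with nonzero weight
        # (e.g., the top-2 domains for an input) and mix their predictions.
        mix = sum(w * self.experts[d](tokens).softmax(dim=-1)
                  for d, w in weights.items() if w > 0)
        return mix.log()
```

Under this sketch, the customization claims fall out directly: adding a domain means training a fresh entry in `experts` (possibly on someone else's hardware), removing one means deleting its entry, and the `weights` passed to `mixed_log_probs` can range from a one-hot domain label to estimated probabilities of which domains fit the input.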
Suchin Gururangan is a 4th year PhD candidate at the University of Washington, advised by Noah A. Smith and Luke Zettlemoyer. He was previously a visiting researcher at Meta AI, a pre-doctoral resident at the Allen Institute for AI, and spent several years in industry as a data scientist. His research interests span many areas of NLP; currently he works on modular, sparse language models that are efficient to customize and scale. His work has received awards at ACL 2020 and 2021, and he is supported by the Bloomberg Data Science PhD Fellowship.
Video "Modular Language Models" from the Allen Institute for AI channel.