
EI Seminar - Greg Yang - Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer

ABSTRACT: You can’t train GPT-3 on a single GPU, much less tune its hyperparameters (HPs)…or so it seems. I’m here to tell you this is not true: you *can* tune its HPs on a single GPU even if you can’t train it that way! In the first half of this talk, I’ll describe how, in the so-called maximal update parametrization (abbreviated µP), narrow and wide neural networks share the same set of optimal HPs. This lets us tune any large model by just tuning a small version of it — we call this µTransfer. In particular, this allowed us to tune the 6.7 billion parameter version of GPT-3 using only 7% of its pretraining compute budget, and, with some asterisks, we get a performance comparable to the original GPT-3 model with twice the parameter count. In the second half of this talk, I’ll discuss the theoretical reason µP has this special property and the connection to the study of infinite-width neural networks and, more generally, the theory of Tensor Programs. The first half will target general practitioners or empirical researchers in machine learning, while the second half targets those who are more theoretically curious. This talk is based on http://arxiv.org/abs/2203.03466.

BIO: Greg Yang is a researcher at Microsoft Research in Redmond, Washington. He joined MSR after obtaining a Bachelor’s degree in Mathematics and a Master’s degree in Computer Science from Harvard University, advised by ST Yau and Alexander Rush, respectively. He won the Hoopes Prize at Harvard for best undergraduate thesis as well as an Honorable Mention for the AMS-MAA-SIAM Morgan Prize, the highest honor in the world for an undergraduate in mathematics. He gave an invited talk at the International Congress of Chinese Mathematicians 2019.

Video from the MIT Embodied Intelligence channel
Video information
April 22, 2022, 14:12:13
00:53:52