
Sebastian Jaszczur – Fine-Grained Conditional Computation in Transformers | ML in PL 22

Fine-Grained Conditional Computation in Transformers by Sebastian Jaszczur (IDEAS NCBR, University of Warsaw), 5 November 2022

Large Transformer models yield impressive results on many tasks, but are expensive to train or even fine-tune, and so slow at decoding that the use and study of the largest models become out of reach for many researchers and end-users. Conditional computation, or sparsity, may help alleviate those problems.

In my work "Sparse is Enough in Scaling Transformers", done at Google Research and published at NeurIPS 2021, we showed that sparse layers leveraging fine-grained conditional computation can enable Transformers to scale efficiently and perform unbatched decoding much faster than a standard Transformer. Importantly, in contrast to standard Mixture-of-Experts methods, this fine-grained sparsity achieves the speed-up without decreasing model quality and with the same number of model parameters.
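To make the idea concrete, below is a minimal sketch of a fine-grained sparse feed-forward layer in PyTorch. It is illustrative only: the class name and the plain argmax selection are my assumptions, not the paper's exact design (there, the controller is low-rank and training uses a differentiable relaxation of the hard selection).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFN(nn.Module):
    """Sketch of a fine-grained sparse feed-forward layer.

    The d_ff hidden units are split into n_blocks blocks, and a small
    controller picks exactly one active unit per block, so only
    n_blocks of the d_ff activations are nonzero for each token.
    """

    def __init__(self, d_model: int, d_ff: int, n_blocks: int):
        super().__init__()
        assert d_ff % n_blocks == 0, "d_ff must divide evenly into blocks"
        self.n_blocks = n_blocks
        self.block_size = d_ff // n_blocks
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        # Controller that scores every hidden unit; in the paper it is
        # low-rank, which keeps its cost negligible.
        self.controller = nn.Linear(d_model, d_ff)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        scores = self.controller(x).view(-1, self.n_blocks, self.block_size)
        # Hard one-hot choice of one unit per block (inference-style argmax;
        # training would use a differentiable relaxation instead).
        mask = F.one_hot(scores.argmax(dim=-1), self.block_size).to(x.dtype)
        hidden = F.relu(self.w_in(x)).view(-1, self.n_blocks, self.block_size)
        hidden = (hidden * mask).flatten(1)  # one nonzero unit per block
        return self.w_out(hidden)
```

At decoding time the one-hot selection means only n_blocks rows and columns of the two weight matrices need to be touched per token, which is where the unbatched-decoding speed-up comes from.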

My current work on this topic, done at IDEAS NCBR, focuses on adjusting those conditional computation methods to the training setting, with the goal of speeding up training as well as inference. This can be achieved by a careful redesign of fine-grained conditional computation that uses only dense tensor operations, which are efficient on modern accelerators. While this is still ongoing work, preliminary results show promise for improving the training speed of Transformers on existing hardware without degrading the quality of the model's predictions.
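One common way to keep such selection accelerator-friendly during training is a straight-through estimator: the forward pass applies a hard one-hot mask, while gradients flow through a dense softmax relaxation, so the whole layer consists of dense matmuls and elementwise products with no gather/scatter. The sketch below shows this general technique under my own naming; it is not necessarily the exact method of this ongoing work.

```python
import torch
import torch.nn.functional as F

def straight_through_select(scores: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable one-hot selection built from dense ops only.

    Forward pass: a hard one-hot mask (true conditional computation).
    Backward pass: gradients of the softmax relaxation, because the
    hard and soft terms cancel in the forward value but not in autograd.
    The function name and temperature parameter tau are illustrative.
    """
    soft = F.softmax(scores / tau, dim=-1)
    hard = F.one_hot(soft.argmax(dim=-1), scores.size(-1)).to(scores.dtype)
    return hard + soft - soft.detach()  # value == hard, gradient == d(soft)
```

Because the mask is materialized as a dense tensor, the selective layer compiles to ordinary batched matrix multiplications and elementwise products, which is what makes it run efficiently on GPUs and TPUs even though most activations end up zero.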

The talk was delivered during the ML in PL Conference 2022 as part of the Contributed Talks. The conference was organized by the ML in PL Association, a non-profit NGO.

ML in PL Association website: https://mlinpl.org/
ML in PL Conference 2022 website: https://conference2022.mlinpl.org/
ML In PL Conference 2023 website: https://conference2023.mlinpl.org/

---

Founded on the experience of organizing the ML in PL Conference (formerly PL in ML), the ML in PL Association is a non-profit organization devoted to fostering the machine learning community in Poland and Europe and to promoting a deep understanding of ML methods. Even though ML in PL is based in Poland, it seeks to provide opportunities for international cooperation.

Video information: uploaded 19 September 2023 at 21:00:08. Duration: 00:18:23.