Accelerate Transformer inference with AWS Inferentia
In this video, I show you how to accelerate Transformer inference with AWS Inferentia, a custom chip designed by AWS.
Starting from a BERT model that I fine-tuned on AWS Trainium (https://youtu.be/HweP7OYNiIA), I compile it with the Neuron SDK for Inferentia. Then, using an inf1.6xlarge instance (4 Inferentia chips, 16 NeuronCores), I show you how to use pipeline mode to predict at scale, reaching over 4,000 predictions per second at 3-millisecond latency.
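As a back-of-the-envelope check on these figures, Little's law says that in steady state the average number of requests in flight equals throughput multiplied by latency. The sketch below is an illustration and is not taken from the video's code. The function names are hypothetical. It also assumes that the 4,000 predictions per second and the 3 ms latency describe the same steady-state run, which the description does not state explicitly.

```python
import math

# Hypothetical helpers illustrating Little's law (L = lambda * W).
# Assumption: throughput and latency are steady-state averages from the same run.


def requests_in_flight(throughput_per_s: float, latency_ms: float) -> float:
    """Average number of concurrent requests implied by Little's law."""
    if throughput_per_s < 0 or latency_ms < 0:
        raise ValueError("throughput and latency must be non-negative")
    # Latency is given in milliseconds; divide by 1000 to convert to seconds.
    return throughput_per_s * latency_ms / 1000.0


def min_concurrency(throughput_per_s: float, latency_ms: float) -> int:
    """Whole number of outstanding requests a client must keep open on average."""
    return math.ceil(requests_in_flight(throughput_per_s, latency_ms))


if __name__ == "__main__":
    # Figures quoted above: 4,000 predictions/s at 3 ms latency.
    print(requests_in_flight(4000, 3))  # prints 12.0
    print(min_concurrency(4000, 3))     # prints 12
```

Under that assumption, a load generator would need to keep roughly 12 requests in flight on average to sustain 4,000 predictions per second at 3 ms each. By the same law, at that latency, fewer outstanding requests would mean proportionally lower throughput.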
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
⭐️⭐️⭐️ Want to buy me a coffee? I can always use more :) https://www.buymeacoffee.com/julsimon ⭐️⭐️⭐️
- Amazon EC2 Inf1: https://aws.amazon.com/ec2/instance-types/inf1/
- AWS Neuron SDK documentation: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html
- AWS blog post: https://aws.amazon.com/fr/blogs/machine-learning/achieve-12x-higher-throughput-and-lowest-latency-for-pytorch-natural-language-processing-applications-out-of-the-box-on-aws-inferentia/
- Setup steps and code: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/inferentia
Interested in hardware acceleration for Transformers? Check out my other videos:
- Training on Habana Gaudi: https://youtu.be/56fpEa1Y1F8
- Training on Graphcore: https://youtu.be/DgcJscPu1Vo
- Predicting with ONNX: https://youtu.be/_AKFDOnrZz8
- Predicting with Intel OpenVINO: https://youtu.be/mfj1QrZWkk8
- Inferentia compilation on SageMaker: https://youtu.be/pokM1r3rgIg
Video "Accelerate Transformer inference with AWS Inferentia" from the Julien Simon channel
Video information
November 17, 2022, 16:47:26
Duration: 00:20:25