All You Need to Know on Multilingual Sentence Vectors (1 Model, 50+ Languages)

We've learned how sentence transformers can be used to create high-quality vector representations of text. We can then compare these vectors to find semantically similar text, which is useful for applications such as semantic search and topic modeling.
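As a rough illustration of "finding similar vectors", here is a minimal semantic-search sketch in plain Python. The three-dimensional vectors are made-up stand-ins for real sentence-transformer embeddings (which typically have hundreds of dimensions), and `cosine_similarity` and `semantic_search` are hypothetical helper names, not a library API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query: list[float], corpus: dict[str, list[float]],
                    top_k: int = 2) -> list[tuple[str, float]]:
    """Rank corpus entries by cosine similarity to the query vector, highest first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" with hypothetical values; a real model would produce these.
corpus = {
    "the cat sat on the mat": [0.9, 0.1, 0.0],
    "a kitten rests on a rug": [0.8, 0.2, 0.1],
    "stock prices fell sharply": [0.0, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]  # pretend this encodes "a cat lying on a carpet"

for text, score in semantic_search(query, corpus):
    print(f"{score:.3f}  {text}")
```

The two cat-related sentences score close to 1.0, while the unrelated finance sentence scores near 0 and is left out of the top two.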

These models are very good at producing meaningful, information-dense vectors. But monolingual models like these don't let us compare sentences written in different languages.

Often this is not a problem. However, the world is becoming increasingly interconnected, and many companies operate across multiple borders and languages. Naturally, there is a need for sentence vectors that are language-agnostic.

Unfortunately, very few textual similarity datasets span multiple languages, particularly for less common languages. Yet the standard methods for training sentence transformers require exactly this kind of data.

So a different approach is needed. Fortunately, some techniques let us extend existing models to new languages using translation pairs, which are much easier to obtain.
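A minimal sketch of the idea behind multilingual knowledge distillation, under stated assumptions: a teacher model that already produces good English vectors supplies the targets, and a student model is trained so that both an English sentence and its translation map to the teacher's vector for the English sentence. Mean squared error is assumed as the loss here. The snippet only computes that objective on toy, hard-coded vectors; the tab-separated pair format, the `TEACHER`/`ALIGNED_STUDENT`/`MISALIGNED_STUDENT` lookup tables, and all function names are illustrative, not the sentence-transformers implementation.

```python
# Hypothetical parallel data: one "English<TAB>translation" pair per line.
PARALLEL_TSV = "hello world\thola mundo\ngood morning\tbuenos días"


def load_parallel_pairs(tsv_text: str) -> list[tuple[str, str]]:
    """Split each line into (source, translation) on the first tab."""
    pairs = []
    for line in tsv_text.strip().splitlines():
        source, translation = line.split("\t", 1)
        pairs.append((source, translation))
    return pairs


def build_distillation_examples(pairs, teacher_encode):
    """Both the source and its translation get the teacher's *source* vector as target."""
    examples = []
    for source, translation in pairs:
        target = teacher_encode(source)  # teacher only ever sees the English side
        examples.append((source, target))
        examples.append((translation, target))
    return examples


def mse(u: list[float], v: list[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(examples, student_encode) -> float:
    """Average MSE between the student's vectors and the teacher targets."""
    return sum(mse(student_encode(text), target) for text, target in examples) / len(examples)


# Toy stand-ins for real encoders (hypothetical 2-d vectors).
TEACHER = {"hello world": [1.0, 0.0], "good morning": [0.0, 1.0]}
ALIGNED_STUDENT = {"hello world": [1.0, 0.0], "hola mundo": [1.0, 0.0],
                   "good morning": [0.0, 1.0], "buenos días": [0.0, 1.0]}
MISALIGNED_STUDENT = dict(ALIGNED_STUDENT, **{"hola mundo": [0.0, 1.0]})

examples = build_distillation_examples(load_parallel_pairs(PARALLEL_TSV), TEACHER.__getitem__)
print(distillation_loss(examples, ALIGNED_STUDENT.__getitem__))     # student already matches
print(distillation_loss(examples, MISALIGNED_STUDENT.__getitem__))  # "hola mundo" is off-target
```

In a real setup the student would be a transformer whose weights are updated by gradient descent to push this loss toward zero, which pulls translations of the same sentence into the same region of vector space.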

In this video, we will cover how multilingual models work and how they are built. We'll learn how to build our own multilingual sentence transformer, which datasets to look for, and how to use high-performing pretrained multilingual models.

🌲 Pinecone article:
https://www.pinecone.io/learn/multilingual-transformers/

🤖 70% Discount on the NLP With Transformers in Python course:
https://bit.ly/3DFvvY5

🎉 Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership

👾 Discord:
https://discord.gg/c5QtDB9RAP

00:00 Intro
01:19 Multilingual Vectors
05:55 Multi-task Training (mUSE)
09:36 Multilingual Knowledge Distillation
11:13 Knowledge Distillation Training
13:43 Visual Walkthrough
14:53 Parallel Data Prep
20:23 Choosing a Student Model
24:55 Initializing the Models
30:05 ParallelSentencesDataset
33:54 Loss and Fine-tuning
36:59 Model Evaluation
39:23 Outro
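One simple way to evaluate a multilingual student model, assumed here for illustration: embed each source sentence and each translation, then count how often a source's most similar translation (by cosine similarity) is its own. The vectors below are made-up toy values and `translation_accuracy` is a hypothetical helper; sentence-transformers ships its own evaluators, which may compute things differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def translation_accuracy(source_vecs: list[list[float]],
                         target_vecs: list[list[float]]) -> float:
    """Fraction of sources whose nearest target (same index = own translation) is correct."""
    correct = 0
    for i, src in enumerate(source_vecs):
        scores = [cosine_similarity(src, tgt) for tgt in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(source_vecs)


# Hypothetical student vectors: row i of `targets` is the translation of row i of `sources`.
sources = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
targets = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
print(translation_accuracy(sources, targets))
```

A well-aligned multilingual model should score close to 1.0 here; a monolingual model applied to translated text would typically score much lower.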

Video "All You Need to Know on Multilingual Sentence Vectors (1 Model, 50+ Languages)" from the James Briggs channel