All You Need to Know on Multilingual Sentence Vectors (1 Model, 50+ Languages)

We've learned how sentence transformers can be used to create high-quality vector representations of text. We can then compare these vectors to find similar ones, which supports applications such as semantic search and topic modeling.
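As a rough illustration of "finding similar vectors", here is a minimal sketch that ranks a few vectors by cosine similarity. The choice of cosine similarity is an assumption (it is a common measure, but the description does not name one), and the three-dimensional vectors and function names are hypothetical stand-ins for real sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Return (index, score) of the corpus vector closest to the query."""
    scores = [cosine_similarity(query, vector) for vector in corpus]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy embeddings; a real model would produce hundreds of dimensions.
corpus_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.1, 0.9, 0.1],
]
query_vector = [0.8, 0.2, 0.1]

best_index, best_score = most_similar(query_vector, corpus_vectors)
print(best_index, round(best_score, 3))
```

In a semantic search setting, the corpus vectors would be computed once and stored, and only the query would be encoded at search time.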

These models are very good at producing meaningful, information-dense vectors. But they don't allow us to compare sentences across different languages.

Often this may not be a problem. However, the world is becoming increasingly interconnected, and many companies span multiple borders and languages. Naturally, there is a need for language-agnostic sentence vectors.

Unfortunately, very few textual similarity datasets span multiple languages, particularly for less common languages. The standard training methods used for sentence transformers require exactly these types of datasets.

So a different approach is needed. Fortunately, some techniques allow us to extend models to other languages using translation pairs, which are much easier to obtain.
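One such technique, listed in the chapters as multilingual knowledge distillation, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the code from the video: a teacher model embeds the source sentence, and a student model is trained so that its embeddings of both the source sentence and its translation land close to the teacher's vector, here measured with mean squared error. The function names and two-dimensional vectors are hypothetical.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_source, student_source, student_translation):
    """Pull both student embeddings toward the teacher's source embedding."""
    return mse(student_source, teacher_source) + mse(student_translation, teacher_source)


# Hypothetical embeddings for one parallel sentence pair.
teacher_source = [1.0, 0.0]        # teacher embedding of the source sentence
student_source = [1.0, 0.0]        # student already matches on the source language
student_translation = [0.0, 1.0]   # student is far off on the translation

loss_before = distillation_loss(teacher_source, student_source, student_translation)

# After successful training, the translation should map to the same vector.
aligned_translation = [1.0, 0.0]
loss_after = distillation_loss(teacher_source, student_source, aligned_translation)

print(loss_before, loss_after)
```

The appeal of this setup is that it needs only translated sentence pairs rather than labeled similarity scores in every language.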

In this video, we will cover how multilingual models work and are built. We'll learn how to develop our own multilingual sentence transformers, the datasets to look for, and how to use high-performing pretrained multilingual models.

🌲 Pinecone article:
https://www.pinecone.io/learn/multilingual-transformers/

🤖 70% Discount on the NLP With Transformers in Python course:
https://bit.ly/3DFvvY5

🎉 Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership

👾 Discord:
https://discord.gg/c5QtDB9RAP

00:00 Intro
01:19 Multilingual Vectors
05:55 Multi-task Training (mUSE)
09:36 Multilingual Knowledge Distillation
11:13 Knowledge Distillation Training
13:43 Visual Walkthrough
14:53 Parallel Data Prep
20:23 Choosing a Student Model
24:55 Initializing the Models
30:05 ParallelSentencesDataset
33:54 Loss and Fine-tuning
36:59 Model Evaluation
39:23 Outro

Video: All You Need to Know on Multilingual Sentence Vectors (1 Model, 50+ Languages), from the James Briggs channel
Video information
November 4, 2021, 18:00:10
00:39:52