Lev Konstantinovskiy - Next generation of word embeddings in Gensim
Filmed at PyData London 2017
www.pydata.org
Description
There are many ways to find similar words and documents with Gensim, the open-source natural language processing library that I maintain. I will give an overview of modern word embeddings such as Google's Word2vec, Facebook's FastText, GloVe, WordRank, and VarEmbed, and discuss which business tasks each fits best.
Abstract
What is the most similar word to "king"? It depends on what you mean by similar. "King" can be interchanged with "Canute", but its attribute is "crown". We will discuss how to obtain these two kinds of similarity from word embeddings, and also touch on how to deal with the common issues of rare, frequent, and out-of-vocabulary words.
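As a rough illustration of what "most similar word" means for embeddings, the sketch below ranks words by cosine similarity to "king". The vectors are toy values invented for this example, not output from Word2vec, FastText, or any other model named above, and the helper names `cosine` and `most_similar` are hypothetical. Real embeddings are learned from a corpus and usually have a hundred or more dimensions.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# They are NOT taken from any trained model.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "Canute": [0.85, 0.75, 0.2],
    "crown": [0.6, 0.2, 0.7],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=2):
    """Return the topn (word, score) pairs closest to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


ranking = most_similar("king", toy_vectors)
for neighbour, score in ranking:
    print(f"{neighbour}: {score:.3f}")
```

With these made-up numbers, "Canute" ranks above "crown", which corresponds to the interchangeable kind of similarity. Which kind of similarity a trained model actually favours depends on the algorithm and its settings, so this ranking says nothing about real models.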
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
We aim to be an accessible, community-driven conference, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.