A Flexible Framework for Entity Resolution | Hoyoung Jang & Cheng Lin
A talk from the Toronto Machine Learning Summit: https://torontomachinelearning.com/
The video is hosted by https://towardsdatascience.com/
About the speakers:
Hoyoung Jang, Lead Data Scientist at ThinkData Works
Cheng Lin, Honours Student at McGill University
About the talk:
A critical component of data management and enrichment pipelines is connecting large datasets from various sources to form a holistic view, that is, making connections between entities across data sources. Often these entities (such as individuals, organizations, or addresses) have no unique identifier that can serve as a key for detecting duplicates or merging datasets. ThinkData has developed a scalable entity resolution engine to solve these problems. After experimenting with both deep learning and traditional NLP techniques, the team has found the best balance of accuracy and performance. Specifically, we have achieved near-parity in accuracy with Magellan (the leading entity resolution project in research), while delivering much better performance metrics and greater scalability. This talk will discuss the importance of entity resolution, our approach to solving real-world challenges, and the potential of using entity resolution and graph relationships in tandem.
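To make the problem concrete, here is a minimal sketch of matching records that share no unique key. It is not ThinkData's engine or Magellan's method: it simply compares normalized names with the standard library's `difflib.SequenceMatcher`. The helper names (`normalize`, `similarity`, `match_records`), the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Return (left, right, score) for every pair scoring at or above threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    matches = []
    for l, r in product(left, right):
        score = similarity(l, r)
        if score >= threshold:
            matches.append((l, r, score))
    return matches


# Hypothetical records from two sources with no shared identifier.
source_a = ["Acme Corp.", "Globex Corporation"]
source_b = ["ACME Corp", "Initech LLC"]

matches = match_records(source_a, source_b)
for left_name, right_name, score in matches:
    print(f"{left_name!r} ~ {right_name!r} (score={score:.2f})")
```

Comparing every pair costs time proportional to the product of the two dataset sizes, so this brute-force loop does not scale. Practical systems typically add a blocking step that restricts comparisons to plausible candidates.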