IDSS Distinguished Speaker Seminar: Sanjeev Arora | March 5, 2019
Title: A Theory for Representation Learning via Contrastive Objectives
Abstract:
Representation learning seeks to represent complicated data (images, text, etc.) using a vector embedding, which can then be used to solve complicated new classification tasks using simple methods like a linear classifier. Learning such embeddings is an important type of unsupervised learning (learning from unlabeled data) today. Several recent methods leverage pairs of “semantically similar” data points (e.g., sentences occurring next to each other in a text corpus). We call such methods contrastive learning (another term would be “like word2vec”) and propose a theoretical framework for analysing them. The challenge for theory here is that the training objective seems to have little to do with the downstream task. Our framework bridges this gap and gives provable guarantees on the performance of the learnt representation on downstream classification tasks. I’ll show experiments supporting the theory.
The talk will be self-contained.
(Joint work with Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi.)
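To make the idea of a contrastive objective concrete, here is a minimal sketch, not taken from the talk itself. It assumes a logistic loss applied to the gap between the inner product of an embedding with a "semantically similar" point and its inner product with a randomly drawn point. The function names and the 2-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_logistic_loss(f_x, f_pos, f_neg):
    """Logistic loss on the score gap between a similar pair and a random pair.

    f_x, f_pos, f_neg are embeddings (lists of floats) of a data point,
    a semantically similar point, and a randomly drawn point.
    The loss is log(1 + exp(-gap)), where gap = <f_x, f_pos> - <f_x, f_neg>.
    """
    gap = dot(f_x, f_pos) - dot(f_x, f_neg)
    # Numerically stable evaluation of log(1 + exp(-gap)).
    if gap >= 0:
        return math.log1p(math.exp(-gap))
    return -gap + math.log1p(math.exp(gap))


# Hypothetical embeddings, for illustration only.
anchor = [1.0, 0.0]
similar = [0.9, 0.1]
random_point = [-1.0, 0.2]

# Low loss when the similar point scores higher than the random one.
aligned_loss = contrastive_logistic_loss(anchor, similar, random_point)
# High loss when the roles are swapped.
swapped_loss = contrastive_logistic_loss(anchor, random_point, similar)
print(aligned_loss < swapped_loss)
```

Minimizing a loss of this shape pushes similar points to have larger inner products than random pairs. The question the abstract raises is why embeddings trained this way should then work well with a linear classifier on a downstream task.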
About the Speaker:
Sanjeev Arora is Charles C. Fitzmorris Professor of Computer Science at Princeton University and Visiting Professor in Mathematics at the Institute for Advanced Study. He works on theoretical computer science and theoretical machine learning. He has received the Packard Fellowship (1997), Simons Investigator Award (2012), Gödel Prize (2001 and 2010), ACM Prize in Computing (formerly the ACM-Infosys Foundation Award in the Computing Sciences) (2012), and the Fulkerson Prize in Discrete Math (2012). He is a fellow of the American Academy of Arts and Sciences and a member of the National Academy of Sciences.
Video: IDSS Distinguished Speaker Seminar: Sanjeev Arora | March 5, 2019, from the channel MIT Institute for Data, Systems, and Society
Video information
March 12, 2019, 18:24:42
01:01:46