A Quantum Approach to Vision Language Modelling

Speaker: Mehrnoosh Sadrzadeh
Moderator: Ted Theodosopoulos

Abstract: Vision-language models excel at large-scale image-text alignment but often neglect the compositional structure of language, leading to failures on tasks that hinge on word order and predicate-argument structure. We show how techniques from tensor networks and variational quantum circuits help address this problem. To this end, we introduce two tools, DisCoCLIP and QuCLIP: multimodal encoders that combine a frozen CLIP vision transformer with a tensor network text encoder that explicitly encodes syntactic structure. We also work with translations of syntax into variational quantum circuits. We train both models with a self-supervised contrastive loss and show that they improve on compositional benchmarks such as SVO-Probes and ARO while using significantly fewer parameters. This parameter reduction is a known feature of tensor networks and variational quantum circuits; in this case it was, on average, from hundreds of millions to tens of thousands.
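The abstract mentions a self-supervised contrastive loss without giving its exact form. As a rough illustration only, the sketch below shows the symmetric image-text contrastive loss commonly used for CLIP-style training, written in NumPy. The function names, the temperature value of 0.07, and the use of cosine similarity are assumptions for this sketch, not details confirmed by the talk.

```python
import numpy as np


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Hypothetical helper: row i of image_emb is assumed to be paired
    # with row i of text_emb; all other rows in the batch act as negatives.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Cosine similarities scaled by an assumed temperature.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In a setup like the one described, the image embeddings would come from the frozen vision encoder and only the text encoder's parameters would be updated to reduce this loss.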

Speaker's bio:
Mehrnoosh is a Professor of CS, leads UCL CS's Quantum Learning Labs, and is the CS Director of Research. Her research is supported by a Royal Academy of Engineering (RAEng) Research Chair, jointly with the BBC and Quantinuum Ltd. Mehrnoosh’s UG and MSc studies were at Sharif University in Iran, and her PhD is from the University of Quebec at Montreal. Previously, Mehrnoosh held two RAEng Industrial Fellowships, at QMUL and UCL, an EPSRC Career Acceleration Fellowship at Oxford, an EPSRC PDRF, and a Wolfson College Junior Research Fellowship, also at Oxford.

Moderator's bio:
Ted is a mathematician who, after working for years in academia and industry, transitioned to teaching at the pre-college level seventeen years ago, the last nine at Nueva, where he teaches math and economics. Ted’s research background is in the area of interacting stochastic systems, with particular applications in biology and economics.

Video: A Quantum Approach to Vision Language Modelling, from the Relatorium channel