Vision-Language Programs - Antonia Wüst

Antonia Wüst, PhD student at TU Darmstadt, discusses her paper "Synthesizing Visual Concepts as Vision-Language Programs," which introduces a neurosymbolic approach to visual concept induction by combining vision-language models with program synthesis.

The work grew out of Wüst’s early PhD research on visual concept learning with symbolic programs, initially in synthetic domains, and her dissatisfaction with reliance on pre-trained object detectors. As vision-language models matured, the project evolved into a broader attempt to treat these models as perceptual tools embedded within a symbolic reasoning system.

In This Episode -
• Strengths & weaknesses of vision-language models (VLMs)
• Visual concept induction
• Symbol grounding across image sets
• Designing a domain-specific language (DSL) for visual reasoning
• A probabilistic context-free grammar for program search
• Interpretability benefits of synthesized visual programs
• Bongard problems and human-like abstraction

References -

• https://arxiv.org/abs/2511.18964
• https://cs.stanford.edu/people/jcjohns/clevr/
• https://en.wikipedia.org/wiki/Bongard_problem
• https://wolfstam.github.io/
• https://www.hikarushindo.com/
• https://www.ml.informatik.tu-darmstadt.de/people/lhelff/index.html
• https://ojs.aaai.org/index.php/AAAI/article/view/20616
• https://arcprize.org/arc-agi

About the Paper -

“Synthesizing Visual Concepts as Vision-Language Programs”
Antonia Wüst, Wolfgang Stammer, Hikaru Shindo, Lucas Nunes, Kristian Kersting
NeurIPS 2025

The paper presents a neurosymbolic framework that combines vision-language models with program synthesis to learn visual concepts from examples. Vision-language models provide grounded symbolic representations, while program synthesis performs explicit reasoning to derive interpretable and reliable visual rules.

https://arxiv.org/abs/2511.18964
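
To make the division of labor described above concrete, here is a minimal Python sketch, not the paper's implementation. It assumes each image has already been reduced to a set of symbols (the hypothetical `describe` function stands in for a vision-language model query) and it searches a tiny made-up DSL (`has`, `neg`, `conj`) by plain shortest-first enumeration rather than the probabilistic grammar discussed in the episode. All names here are illustrative assumptions.

```python
# Hypothetical stand-in for a VLM query. A real system would prompt a model
# with the image; here every "image" is already a collection of symbols.
def describe(image):
    return set(image)

# Tiny illustrative DSL: a program is a (name, predicate) pair, where the
# predicate takes the symbol set of one image and returns True or False.
def has(sym):
    return (f"has({sym})", lambda s: sym in s)

def neg(p):
    name, fn = p
    return (f"not {name}", lambda s: not fn(s))

def conj(p, q):
    return (f"({p[0]} and {q[0]})", lambda s: p[1](s) and q[1](s))

def candidates(vocab):
    """Yield programs shortest first: single literals, then pairwise conjunctions."""
    atoms = [has(v) for v in sorted(vocab)]
    literals = atoms + [neg(a) for a in atoms]
    yield from literals
    for i, p in enumerate(literals):
        for q in literals[i + 1:]:
            yield conj(p, q)

def induce(positives, negatives):
    """Return the name of the first program that holds on every positive
    image and on no negative image, or None if the DSL has no such program."""
    pos = [describe(img) for img in positives]
    negs = [describe(img) for img in negatives]
    vocab = set().union(*pos, *negs)
    for name, fn in candidates(vocab):
        if all(fn(s) for s in pos) and not any(fn(s) for s in negs):
            return name
    return None

if __name__ == "__main__":
    positives = [{"red", "circle"}, {"red", "circle", "big"}]
    negatives = [{"red", "square"}, {"blue", "circle"}]
    print(induce(positives, negatives))
```

The returned rule is a readable expression rather than a label, which is the interpretability benefit the episode highlights: a wrong rule can be inspected and traced back to either a perception error or a search error.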

About the Guest -
Antonia Wüst is a PhD student at Technische Universität Darmstadt in the AI and Machine Learning Lab, supervised by Kristian Kersting. Her research focuses on abstract visual reasoning, visual concept induction, and neurosymbolic AI, with an emphasis on combining perception and symbolic reasoning.
• https://www.ml.informatik.tu-darmstadt.de/people/awuest/index.html
• https://x.com/toniwuest

Credits -

• Host & Music: Bryan Landers, Technical Staff, Ndea
• Editor: Alejandro Ramirez
• https://x.com/ndea
• https://x.com/bryanlanders
• https://ndea.com
