[slides] JSALT 2025 - Plenary Talk - B. Schuppler: Cross-layer Models for Low-Res Conversational ASR

📍 Live from FIT, Brno University of Technology (Czech Republic), room E112
🕘 July 1st, 2025 — 11:00 CEST
🎙️ Barbara Schuppler (TU Graz, Austria)

In recent years, conversational speech has become a major focus in speech science and technology. As dialogue systems evolve from transactional tools into socially interactive agents, they demand increasingly accurate automatic speech recognition (ASR). At the same time, conversational data offers unique insights into human speech processing. Drawing on the cross-layer optimization principle from communications engineering, I adopt a similar view of how meaning is accessed across multiple levels of speech information. In this talk, I present findings from my group’s work on integrating pronunciation and prosodic variation into ASR for conversational speech. Our hybrid approach—combining data-driven and knowledge-based methods—proves especially effective in low-resource settings. While transformer-based models often outperform classical systems, the latter still excel with short, fragmented utterances when paired with linguistic knowledge. Beyond ASR, our methods inform fields like pathological speech analysis, dementia prediction, and assistive speech technologies.

Bio: Barbara Schuppler studied Physics and Spanish Philology at the University of Graz and the Universidad Autónoma de Madrid, completing a diploma thesis in experimental physics in 2007. She conducted her dissertation within the Marie-Curie RTN "Sound-to-Sense" at Radboud University Nijmegen, with research visits at NTNU Trondheim. After working as a teacher at the Graz International Bilingual School, she was awarded an FWF Hertha-Firnberg Grant in 2012 and joined the Signal Processing and Speech Communication Laboratory at TU Graz. Now Associate Professor at TU Graz, she works on methods for the quantitative analysis of prosody and pronunciation variation in conversational speech and on integrating the resulting phonetic and linguistic knowledge into speech technology, with a specific focus on applications in the education and healthcare sectors.

Video information

July 2, 2025, 3:33:23

01:16:55

Center for Language & Speech Processing(CLSP), JHU
