Ashmal Vayani - Seeing the World as It Speaks: Multilingual, Culturally Aware Multimodal AI

For all the promise of “global” multimodal AI, most systems still speak to a narrow slice of the world. This talk traces a path to genuinely multilingual, culturally grounded models in three steps. First, ALM-Bench reframes evaluation by testing 100 languages across culturally situated tasks, revealing where today’s large multimodal models (LMMs) falter, especially on low-resource scripts. Next, ViMUL moves beyond images to video, pairing a diverse 14-language, 15-domain benchmark with a balanced baseline to show how training and evaluation can be aligned for robust multilingual video understanding. Finally, we examine language itself as a causal factor: a cross-lingual text-to-image (T2I) study in which grammatical gender shifts visual outputs, surfacing a new axis of bias. Together, these pieces offer both a story and a blueprint for inclusive, reliable multimodal systems.

I am an MSc student in the College of Engineering and Computer Science at the University of Central Florida and a member of the Center for Research in Computer Vision (CRCV) Lab, advised by Prof. Mubarak Shah.

Previously, I was a Research Engineer in the Computer Vision Department, affiliated with the IVAL-Lab at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate. We'd like to extend a special thank you to Ahmad Anis and Kanwal Mehreen, Leads of our Geo Regional Asia group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).

Video: Ashmal Vayani - Seeing the World as It Speaks: Multilingual, Culturally Aware Multimodal AI (Cohere channel)