Zirui Colin Wang - VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Frontier models such as Gemini-3-Pro and GPT-5 achieve or exceed human performance on elite competitive benchmarks in mathematics, programming, and scientific reasoning. Yet the same models fail more than 90% of the time at solving a simple 3×3 visual jigsaw puzzle through interaction. This gap exposes a fundamental weakness in visual interaction and exploration, a capability essential for autonomous agents and robotics.
In this talk, Colin introduces VisGym, a suite of 17 diverse, customizable, and scalable interactive environments for evaluating and training visual interaction and exploration. Our results show that competitive-level reasoning alone is insufficient for robust visual agents, and that progress in multimodal intelligence requires rethinking how models explore, act, and learn from visual feedback.
Zirui (Colin) Wang is a first-year Ph.D. student in Computer Science at UC Berkeley, advised by Prof. Joseph E. Gonzalez, Prof. Trevor Darrell, and Prof. Ion Stoica. His research focuses on multimodal interactive intelligence, including visual agents, robotics, and volumetric and temporal generation and understanding. He is a Siebel Scholar in Computer Science, and his prior work has been adopted by several frontier vision-language models, including OpenAI’s GPT series, Google Gemini, and Qwen-VL.
This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Surya Guthikonda and Manasvi Dawane, Leads of our Multimodal group, for their dedication in organizing this event.
If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.
Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).
Video "Zirui Colin Wang - VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents" from the Cohere channel
Video information
February 19, 2026, 3:00:42
00:49:03