World’s first large multimodal model (LMM) with audio reasoning on a Windows PC

Generative AI and large language models (LLMs) have taken the world by storm, but until recently LLMs have been mostly limited to text inputs. In this MWC 2024 technology demo, we showcase the world’s first large multimodal model (LMM) with audio reasoning on a Windows PC. LLMs can now hear, understand audio, and reason about it.

On a Windows PC, Qualcomm AI Research is showcasing an on-device demonstration of a 7+ billion parameter LMM that can accept text and audio inputs (e.g., music or the sound of traffic) and then generate multi-turn conversations about the audio at a responsive token rate. With our full-stack AI optimization, we achieve high performance at low power. By running the LMM on device, we gain benefits in privacy, reliability, personalization, and cost.

Visit the Qualcomm AI Research website
https://www.qualcomm.com/research/artificial-intelligence/ai-research

Develop with the Qualcomm AI Stack
https://www.qualcomm.com/products/technology/artificial-intelligence/ai-stack

Sign up for our newsletter
https://assets.qualcomm.com/mobile-computing-newsletter-sign-up.html

Video: World’s first large multimodal model (LMM) with audio reasoning on a Windows PC, from the Qualcomm Research channel