EPISODE 15: How PAWA Is Building The First African Language Foundation Model, with Michael Mollel

Swahili is no longer a niche edge case in artificial intelligence; it is becoming a stress test for whether AI systems can truly serve Africa at scale. In this episode of Cause Effect 4.0, host Celina Lee explores how language, identity and technology intersect as global models continue to expand while many African languages remain underrepresented, despite being spoken by hundreds of millions of people. But what does it really mean for a language to be “underrepresented” in AI systems, and who gets to decide what counts as enough data?

She is joined by Dr. Michael Mollel, a Tanzanian AI researcher and entrepreneur whose journey began on Zindi, Africa’s largest data science competition platform. He recalls starting out as a master’s student experimenting with early machine learning and later joining Zindi, where he first realised how competitive and fast-moving applied AI in Africa had become. If competitions are where talent is tested, how much do they actually shape the builders who go on to create real-world systems? What happens when leaderboard success turns into production responsibility?

That competitive environment also became the foundation for collaboration. Through Zindi, Mollel connected with fellow Tanzanian data scientists, including his eventual co-founder, and began moving from competitions to production systems. That collaboration evolved into Sartify and PAWA (Pan-African Wide Alignment Language Model), an initiative focused on building African-language AI systems grounded in local data and use cases rather than adapting imported models with limited cultural awareness. But can locally built models truly compete with global systems, or is the goal something entirely different?

Mollel explains that the core motivation behind PAWA is the persistent failure of mainstream models to handle African linguistic context, even in widely spoken languages like Swahili. Why do models that perform well in English still struggle with humour, idioms, and cultural nuance in African languages? He points to issues like dialect variation and underrepresentation in training data, raising a deeper question: if a language is missing from the training corpus, is it effectively invisible to the model?

To address this, PAWA has built a large-scale dataset spanning tens of billions of tokens across roughly 20 African languages, with Swahili accounting for a significant share. The data has been collected over several years from sources such as newspapers, web scraping, and speech-to-text conversion of YouTube content. But how do you ensure quality and cultural accuracy when your data comes from such diverse and uneven sources? With support from programmes like Mozilla Builders and access to GPU infrastructure, the team has trained models under the Pan-African Wide Alignment Language Model framework. But is infrastructure still the biggest bottleneck?

Despite progress, Mollel is clear that infrastructure and data scarcity remain the biggest constraints. He argues that African AI development is less a purely scientific challenge and more an economic and resource problem, requiring collaboration across institutions and sectors. So if the future is not one universal model but many specialised systems, how do we decide which problems to prioritise first: agriculture, fintech, education, or something else entirely?

WATCH.

Video information

23 May 2026, 14:02:28

00:54:52

The Cause Effect 4.0 Podcast

Other videos from this channel

EPISODE 12: What Can Small Island Nations Teach the World About AI? with Dr Craig Ramlal

EPISODE 3: Can Tanzania’s AI Community Turn Ambition Into Impact? with Essa Mohamedali

EPISODE 10: Where Does Africa Truly Stand in AI Governance? with Amb. Philip Thigo

EPISODE 4: Who Owns Africa’s Digital Future? with Qhala CEO Shikoh Gitau

EPISODE 14: UNIDO Envisions A Neural Network of Nations for Global AI Inclusion, with Jason Slater

EPISODE 9: Will Africa’s AI Future Be Defined by Locals or Foreigners? with Kate Kallot

EPISODE 2: What Does It Take To Build Climate Resilience With AI In Africa? with Prof Rendani Mbuvha

EPISODE 6: What Role Do Small Language Models Play in Africa's AI Future? with Dr Bayo Adekanmbi

EPISODE 11: Africa Could Teach Silicon Valley a Lesson About AI Efficiency, with Dr Keita Broadwater

EPISODE 8: How Community-Driven AI Could Shape Africa; Interview with Rose Delilah Gesicho

EPISODE 7: Forget AGI, the Real Revolution is in Applied AI. Interview with Juan Lavista Ferres

EPISODE 13: Google’s WAXAL Project Wants AI to Understand Africa, with Perry Nelson

EPISODE 1: Can Africa Seize The AI Moment? with Futurist Alex Tsado

EPISODE 5: Can AI Truly Be Democratized Across the MENA Region? with Christophe Zoghbi