Beyond English-Centric AI: Lessons on Community Participation from Non-English NLP Groups

By: CDT Summer Fellow Evani Radiya-Dixit and AI Gov Lab Director Miranda Bogen

Many leading language models are trained on nearly a thousand times more English text than text in other languages. These disparities have real-world impacts, especially for racialized and marginalized communities. For example, large language models have given inaccurate medical advice in Hindi, led to a wrongful arrest because of mistranslations in Arabic, and been accused of fueling ethnic cleansing in Ethiopia through poor moderation of speech that incites violence.

These harms reflect the English-centric nature of natural language processing (NLP) tools, which prominent tech companies often develop without centering or even involving non-English-speaking communities. In response, region- and language-specific research groups, such as Masakhane and AmericasNLP, have emerged to counter English-centric NLP by empowering their communities to both contribute to and benefit from NLP tools developed in their languages. Based on our research and conversations with these collectives, we outline promising practices that companies and research groups can adopt to broaden community participation in multilingual AI development.

Read the full brief.

Rick Gillespie

Wittgenstein's Language Games Framework🕊️🕊️🕊️


You should review my methodology as a foundational requirement. It uses Aristotle, Socrates, and Wittgenstein (not Greek, German), none of whom were English speakers. The Rosetta Stone had Greek, so I would think it is universal. Language games are the key.
