TELUS International AI Data Solutions’ Post

There are over 7,000 languages spoken in the world, yet most AI chatbots are trained on only around 100 of them. And English, despite being spoken by less than 20% of the world’s population, accounts for almost two-thirds of websites and is the main driver of LLMs. Despite the gradual emergence of Multilingual Language Models (MLMs), they “are still usually trained disproportionately on English language text and thus end up transferring values and assumptions encoded in English into other language contexts where they may not belong”. One example cited is the word “dove”, which an MLM might interpret across languages as being associated with peace, even though the Basque equivalent (“uso”) is in fact an insult. What’s needed, say experts, is the development of non-English Natural Language Processing (NLP) applications to help reduce the language bias in generative AI and “preserve cultural heritage”. Governments, the tech community and even individuals are taking steps to resolve the AI language issue. For example, the Indian government is building Bhashini, an AI translation system trained on local languages. In New Zealand, local broadcaster Te Hiku Media is harnessing AI to aid the “preservation, promotion and revitalization of te reo Māori”. In a similar endeavour, grassroots organization Masakhane is working to “strengthen and spur NLP research in African languages”. With languages “disappearing” at a rate of one every fortnight, according to UNESCO, generative AI could prove to be the death knell, or the saviour, of many of them. 🗣️ Source: WeForum.org #AI #GenAI #multilanguageAI
