There are over 7,000 languages spoken in the world, yet most AI chatbots are trained on only around 100 of them. English, despite being spoken by less than 20% of the world’s population, accounts for almost two-thirds of websites and is the main driver of LLMs.

Multilingual Language Models (MLMs) are gradually emerging, but researchers note they “are still usually trained disproportionately on English language text and thus end up transferring values and assumptions encoded in English into other language contexts where they may not belong”. They give the example of the word “dove”, which an MLM might associate with peace across languages, even though the Basque equivalent (“uso”) is in fact an insult.

What’s needed, say experts, is the development of non-English Natural Language Processing (NLP) applications, to help reduce the language bias in generative AI and “preserve cultural heritage”.

Governments, the tech community and even individuals are taking steps to resolve the AI language issue. For example, the Indian government is building Bhashini, an AI translation system trained on local languages. In New Zealand, local broadcaster Te Hiku Media is harnessing AI to aid the “preservation, promotion and revitalization of te reo Māori.” In a similar endeavour, grassroots organization Masakhane is working to “strengthen and spur NLP research in African languages”.

With a language “disappearing” at a rate of one every fortnight, according to UNESCO, generative AI could prove to be the death knell, or the saviour, of many of them.

🗣️ Source: WeForum.org

#AI #GenAI #multilanguageAI
TELUS International AI Data Solutions’ Post
More Relevant Posts
-
🔥 6x LinkedIn Top Voice | Sr AWS AI ML Solution Architect at IBM | Generative AI Expert | Author - Hands-on Time Series Analytics with Python | IBM Quantum ML Certified | 12+ Years in AI | MLOps | IIMA | 100k+ Followers
𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗶𝗼𝗻 𝗣𝗿𝗼𝗺𝗽𝘁𝗶𝗻𝗴 𝗷𝘂𝘀𝘁 𝘀𝗼𝗹𝘃𝗲𝗱 𝘁𝗵𝗲 #𝟭 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 𝗳𝗼𝗿 𝗡𝗟𝗣 𝗶𝗻 𝗹𝗼𝘄-𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀 (𝘀𝗲𝗲 𝗵𝗼𝘄 𝗶𝘁 𝗯𝗼𝗼𝘀𝘁𝗲𝗱 𝗵𝗮𝘁𝗲 𝘀𝗽𝗲𝗲𝗰𝗵 𝗱𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻)

🧠 Working with low-resource languages like Marathi has always been a hurdle in Natural Language Processing (NLP). These languages often lack the data and resources that traditional models need to perform effectively, creating a significant barrier to inclusive AI. But what if we could unlock the power of large language models (LLMs) for even the most underrepresented languages?

🚀 𝗪𝗵𝘆 𝗖𝗼𝗧𝗥?
Enter 𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗶𝗼𝗻 𝗣𝗿𝗼𝗺𝗽𝘁𝗶𝗻𝗴 (𝗖𝗼𝗧𝗥), a strategy that changes how language models handle low-resource languages. We’ve found that translation-based prompting isn’t just a workaround; it’s a solution. Especially in tasks like 𝗵𝗮𝘁𝗲 𝘀𝗽𝗲𝗲𝗰𝗵 𝗱𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻, 𝗖𝗼𝗧𝗥 delivers substantial improvements.

🔍 𝗛𝗼𝘄 𝗜𝘁 𝗪𝗼𝗿𝗸𝘀: 𝗖𝗼𝗧𝗥 restructures a prompt in 3 key steps:
1. 𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗶𝗼𝗻: The input context from a low-resource language (e.g., Marathi) is translated into a higher-resource language, like English.
2. 𝗧𝗮𝘀𝗸 𝗘𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻: The task (such as classification or text generation) is performed on the translated text.
3. 𝗢𝘂𝘁𝗽𝘂𝘁 𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗶𝗼𝗻: If needed, the output is translated back into the original language.
All this happens in one single prompt. It’s not just translation; it’s optimizing the entire process.

📊 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀: In a comparative study, 𝗖𝗼𝗧𝗥 outperformed regular prompting methods, particularly in 𝗵𝗮𝘁𝗲 𝘀𝗽𝗲𝗲𝗰𝗵 𝗱𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻, where it delivered the highest accuracy boost. The benefits extend beyond that, enhancing tasks like sentiment analysis and subject classification too.

𝗣.𝗦. This breakthrough doesn’t just improve accuracy; it opens the door to better synthetic data generation for underrepresented languages, offering a glimpse into the future of truly multilingual AI.

#LLMs #DataScience #Language
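The three CoTR steps above can be sketched as a single combined prompt. A minimal sketch in Python; the function name, default task wording, and step phrasing are illustrative assumptions, not the paper's exact template:

```python
# Minimal sketch of a Chain-of-Translation (CoTR) prompt builder.
# All three steps are composed into ONE prompt string sent to the LLM,
# as the post describes; the exact wording here is an assumption.

def build_cotr_prompt(text: str, source_lang: str = "Marathi",
                      pivot_lang: str = "English",
                      task: str = "classify the text as hateful or not hateful") -> str:
    """Compose translation, task execution, and output translation steps."""
    return (
        f"1. Translate the following {source_lang} text to {pivot_lang}:\n"
        f"{text}\n"
        f"2. Using the {pivot_lang} translation, {task}.\n"
        f"3. Report the final label."
    )

prompt = build_cotr_prompt("<Marathi social-media post here>")
print(prompt)
```

The single-prompt design matters: the model performs its own translation in-context, so no external MT system is required at inference time.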
-
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. To mitigate potential discrimination and enhance overall usability and accessibility for diverse language user groups, the development of language-fair technology is important. Despite the breakthroughs of LLMs, investigation into the multilingual scenario remains insufficient, and a comprehensive survey summarizing recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the utilization of LLMs in the multilingual scenario. We first rethink the transitions between previous and current research on pre-trained language models. Then we introduce several perspectives on the multilingualism of LLMs, including training and inference methods, model security, multi-domain with language culture, and usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. Besides, we highlight future research directions aimed at further enhancing LLMs with multilingualism. The survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.

#LLMs #Multilingual #NLP #LanguageTechnology #ResearchCommunity
-
AI’s Blind Spot: Arabic in the Digital Age
By: Sabriya El Mengad, a specialist in business intelligence

There has been a rapid advancement in Artificial Intelligence (AI) in the past year. The release of the pioneering AI language model ChatGPT has ushered in a revolution. AI has boosted efficiency and transformed user experience across many industries. However, these changes have mostly benefited countries where the native language is adequately supported by the AI. English is the language of choice for most AI tools, stranding speakers of languages such as Arabic on the periphery of this latest tech revolution. This has deepened the already severe digital divide between English and Arabic speakers, and further entrenched disparities in access to information.

The potential of AI to revolutionise communication, content creation, research, and information retrieval has been widely touted. Nevertheless, Arabic speakers find themselves at a disadvantage due to AI tools’ subpar performance in their language.

To produce human-like text, AI tools use Natural Language Processing (NLP), a method used to train chatbots to generate text responses. NLP involves breaking text into words or phrases (tokenization), analysing grammar (parsing), and processing the meaning of text through semantic analysis using machine learning techniques. English is a well-studied language and developers can draw upon a sizeable pool of material, making NLP simpler for English than for other languages. The disparity is especially clear in the case of Arabic, which has received far less attention from developers.

For more: https://n9.cl/ay6o5

Tamooda
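The tokenization step described above is one place the English/Arabic gap shows up concretely. A toy illustration, assuming a naive whitespace tokenizer (real systems use trained subword tokenizers): Arabic's rich morphology packs several morphemes into one orthographic word, so naive segmentation recovers far less structure than it does for English.

```python
# Toy illustration of the tokenization step mentioned in the post.
# A naive whitespace split works tolerably for English but hides the
# internal structure of Arabic words, whose clitics (and, with, the)
# attach directly to the stem.

def naive_tokenize(text: str) -> list[str]:
    """Split on whitespace only -- a deliberately simplistic tokenizer."""
    return text.split()

english = "and with the book"
arabic = "وبالكتاب"  # a single orthographic word meaning "and with the book"

print(len(naive_tokenize(english)))  # four whitespace-delimited tokens
print(len(naive_tokenize(arabic)))   # one token concealing four morphemes
```

This is why Arabic NLP needs dedicated morphological analysis and well-trained tokenizers, and why the smaller pool of Arabic tooling and data compounds the disparity the article describes.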
-
𝐒𝐨𝐥𝐯𝐢𝐧𝐠 𝐌𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐓𝐚𝐬𝐤𝐬 𝐰𝐢𝐭𝐡 𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬

𝐄𝐧𝐠𝐥𝐢𝐬𝐡-𝐅𝐨𝐜𝐮𝐬𝐞𝐝 𝐋𝐋𝐌𝐬
Large Language Models (LLMs) show impressive multilingual abilities, but they remain predominantly focused on English because of the way they are trained.

𝐓𝐫𝐚𝐧𝐬𝐥𝐚𝐭𝐢𝐨𝐧 𝐟𝐨𝐫 𝐁𝐞𝐭𝐭𝐞𝐫 𝐍𝐋𝐏
For a variety of natural language processing (NLP) tasks, translating text into English first can improve performance with English-centric LLMs.

𝐋𝐢𝐦𝐢𝐭𝐚𝐭𝐢𝐨𝐧𝐬 𝐟𝐨𝐫 𝐂𝐮𝐥𝐭𝐮𝐫𝐞-𝐒𝐞𝐧𝐬𝐢𝐭𝐢𝐯𝐞 𝐓𝐚𝐬𝐤𝐬
While translation-based improvement works for some NLP applications, it's not ideal for tasks requiring a deep understanding of cultural nuances. In those cases, prompting in the native language is more effective.

𝐓𝐡𝐞 𝐍𝐞𝐞𝐝 𝐟𝐨𝐫 𝐓𝐫𝐮𝐥𝐲 𝐌𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐋𝐋𝐌𝐬
The authors argue that we need to push for the development of language models that are genuinely multilingual, not simply English models with translation workarounds.

Paper details in the comments.

#llms #generativeai #datascience #multilingual #nlproc #deeplearning
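The translate-for-generic-tasks versus native-prompt-for-cultural-tasks trade-off above can be expressed as a simple routing rule. A minimal sketch, where the task names, the `translate()` stub, and the prompt wording are all illustrative assumptions rather than anything from the paper:

```python
# Sketch of the routing idea: translate into English for generic NLP
# tasks, but keep the native language for culture-sensitive ones.
# Task names and translate() are illustrative stand-ins.

CULTURE_SENSITIVE = {"idiom_interpretation", "humor_detection", "cultural_qa"}

def translate(text: str, target: str = "en") -> str:
    """Stand-in for a real machine-translation call."""
    return f"[{target}] {text}"

def build_prompt(text: str, task: str) -> str:
    """Route to a native-language or translated prompt by task type."""
    if task in CULTURE_SENSITIVE:
        return f"Task: {task}\nInput (native language): {text}"
    return f"Task: {task}\nInput (translated): {translate(text)}"

print(build_prompt("نص عربي", "sentiment_analysis"))
print(build_prompt("نص عربي", "humor_detection"))
```

The routing set would in practice be learned or curated per deployment; the point is only that translation is a per-task decision, not a universal fix.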
-
Professor at UiT The Arctic University of Norway, AI Book Author, Kauffman Global Scholar, NTU, IIT Dhanbad, Top 2% Scientist Stanford Univ. List, MedTech and AI startup co-founder, Advisor & Mentor for AI startups
🚀 Exciting research alert! Towards a more inclusive AI.

The number of Sámi speakers is declining year by year, even though the Nordic countries and other organizations are trying to protect and promote the language with great zeal, effort and money. This groundwork for a Sámi-language LLM is the outcome of hard work by Ronny Paul and Himanshu Buckchash, with low-resource-language expert Shantipriya Parida (Silo AI).

Presenting Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language: https://lnkd.in/g344W7VP

We're shining a spotlight on Ultra Low Resource (ULR) languages, with a particular focus on Sámi, an indigenous language group facing digital marginalization. 🌍 Our research demonstrates the importance of inclusive technology, especially for languages like Sámi, which are underrepresented in large-scale AI language models.

Here's what we did:
📚 Compiled available Sámi language resources to create a clean dataset.
🛠️ Experimented with advanced language models (~7B parameters) to analyze their behavior with ULR languages.
🌐 Explored multilingual training scenarios to uncover the best strategies for training ULR language models.

Key Takeaway: Our findings reveal that sequential multilingual training for decoder-only models performs best for these languages. The study opens a new chapter for ULR languages by bringing their unique needs into the limelight, showcasing how inclusion and advancement in NLP can work together.

Let's work together toward a more inclusive AI where no language is left behind.

AI at Meta Visual Intelligence NorwAI Integreat - Norwegian Centre for Knowledge-driven Machine Learning NORA – The Norwegian Artificial Intelligence Research Consortium Norwegian Open AI Lab Centre for Digital Life Norway Klas Pettersen Kristian Kersting Kerstin Bach Anis Yazidi Pedro G. Lind Samuel Kaski Alex Moltzau
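The "sequential multilingual training" takeaway above means training stages run one language after another, with the ultra-low-resource target last, rather than mixing all corpora at once. A hedged sketch of that scheduling idea; the language list, ordering heuristic, and function name are assumptions, not the paper's actual recipe:

```python
# Hedged sketch of a sequential multilingual training schedule:
# higher-resource languages are consumed in earlier stages, and the
# ULR target (here Sámi) comes last so its limited data fine-tunes a
# model that already has broad linguistic competence.

def sequential_schedule(corpora: dict[str, list[str]], target: str) -> list[str]:
    """Return the language training order: other languages first, target last."""
    others = [lang for lang in corpora if lang != target]
    return others + [target]

corpora = {
    "english": ["..."],    # large general corpus
    "norwegian": ["..."],  # geographically/typologically closer data
    "finnish": ["..."],    # related Uralic-family data
    "sami": ["..."],       # small, carefully cleaned target corpus
}
print(sequential_schedule(corpora, target="sami"))
```

A mixed-training baseline would instead interleave batches from all four corpora in every stage; the post reports that, for decoder-only models on ULR languages, the sequential ordering won.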
-
Software Engineer |💡Top AI Voice | Python, Django, Flask & FastAPI | AI Explorer with LLMs, Vector DB, LangChain | SQL & MongoDB
Unlocking Language Diversity with LangChain: Revolutionizing AI Communication

🚀 Introducing LangChain: In the ever-evolving landscape of artificial intelligence (AI), one of the most significant challenges has been facilitating seamless communication across diverse languages. Enter LangChain, a framework poised to change how AI applications interact and communicate across linguistic barriers.

🌐 Breaking Down Language Barriers: At its core, LangChain chains large language models together with natural language processing (NLP) components. Whether it's translating text, interpreting speech, or generating content, applications built on LangChain can bridge the gap between languages, opening up a world of possibilities for global communication.

💡 Key Features and Benefits:
🔹 Multilingual Communication: LangChain enables AI systems to communicate fluently in multiple languages, catering to diverse audiences and markets without extensive manual translation.
🔹 Contextual Understanding: By leveraging contextual cues and semantic analysis, LangChain-based pipelines go beyond literal translations to capture the nuances and subtleties of language, producing more accurate and natural communication.
🔹 Continuous Learning: Through feedback loops and data-driven insights, such pipelines can be refined over time, evolving alongside linguistic patterns and trends.
🔹 Scalability and Efficiency: Designed with scalability in mind, LangChain suits a wide range of applications, from chatbots and virtual assistants to content generation and sentiment analysis.

🌍 Empowering Global Innovation: In an increasingly interconnected world, effective communication is paramount for driving innovation, fostering collaboration, and transcending cultural boundaries. With LangChain, organizations can unlock new opportunities, reach untapped markets, and engage with audiences on a deeper level, regardless of language differences.

#LangChain #AI #NLP #LanguageTechnology #Innovation #GlobalCommunication 🌐🔗🚀
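The chaining pattern frameworks like LangChain support can be illustrated in plain Python. This is a sketch of the pattern only, not the real LangChain API; every step function here is a hypothetical stand-in for a detector, an MT call, or an LLM call:

```python
# Plain-Python sketch of a multilingual pipeline in the chaining style
# frameworks such as LangChain support (the real LangChain API differs;
# every step below is a stand-in): detect the language, translate into
# English, run the task, translate the answer back.

from typing import Callable

def chain(*steps: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Compose steps left-to-right, each passing a state dict onward."""
    def run(state: dict) -> dict:
        for step in steps:
            state = step(state)
        return state
    return run

def detect(s: dict) -> dict:            # stand-in language detector
    return {**s, "lang": "es"}

def to_en(s: dict) -> dict:             # stand-in MT into English
    return {**s, "en": f"EN({s['text']})"}

def answer(s: dict) -> dict:            # stand-in LLM task step
    return {**s, "reply_en": f"ANS({s['en']})"}

def back(s: dict) -> dict:              # stand-in MT back to source language
    return {**s, "reply": f"{s['lang']}({s['reply_en']})"}

pipeline = chain(detect, to_en, answer, back)
print(pipeline({"text": "hola"})["reply"])  # → es(ANS(EN(hola)))
```

Passing a single state dict through each step is what lets intermediate artifacts (detected language, English pivot text) stay available to later steps, which is the practical appeal of the chained design.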