Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
Version 42 is focusing on:
- Additional Coverage
- Unicode 15.0 additions: new emoji, script names, collation data (Chinese & Japanese), …
- New Languages: Adding Haryanvi, Bhojpuri, Rajasthani at a Basic level.
- Up-leveling: Xhosa, Hinglish (Hindi-Latin), Nigerian Pidgin, Hausa, Igbo, Yoruba, and Norwegian Nynorsk.
- Person Name Formatting: for handling the wide variety in the way that people’s names work in different languages.
- People may have a different number of names, depending on their culture: they might have only one name (“Zendaya”), two (“Albert Einstein”), or three or more.
- People may have multiple words in a particular name field, e.g., “Mary Beth” as a given name, or “van Berg” as a surname.
- In some languages, such as Spanish, people have two surnames (each of which can be composed of multiple words).
- The ordering of name fields can be different across languages, as well as the spacing (or lack thereof) and punctuation.
- Name formatting needs to be adapted to different circumstances: shorter or longer presentation; formal or informal context; talking about someone versus talking to someone; or use as a monogram (JFK).
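The considerations above can be illustrated with a minimal Python sketch. It is not CLDR's data format or API: the `PersonName` class, the `PATTERNS` table, the option names (`given_first`, `surname_first`, `long`, `short`, `monogram`), and the `format_name` function are all hypothetical, and the patterns are invented for illustration rather than taken from any locale. The sketch shows only the general idea of choosing a pattern by field order and length, and of cleaning up the result when a person has no value for some field.

```python
import re
from dataclasses import dataclass


@dataclass
class PersonName:
    """Hypothetical name record; a field may be empty or hold several words."""
    given: str = ""
    surname: str = ""
    surname2: str = ""  # second surname, e.g. for Spanish-style names


# Hypothetical patterns keyed by (order, length). Real locale data would be
# far richer (formality, usage, more fields); these are illustrative only.
PATTERNS = {
    ("given_first", "long"): "{given} {surname} {surname2}",
    ("surname_first", "long"): "{surname} {surname2}, {given}",
    ("given_first", "short"): "{given}",
    ("given_first", "monogram"): "{given_initial}{surname_initial}",
}


def format_name(name: PersonName, order: str = "given_first",
                length: str = "long") -> str:
    """Fill the selected pattern, then tidy gaps left by empty fields."""
    fields = {
        "given": name.given,
        "surname": name.surname,
        "surname2": name.surname2,
        "given_initial": name.given[:1].upper(),
        "surname_initial": name.surname[:1].upper(),
    }
    text = PATTERNS[(order, length)].format(**fields)
    text = re.sub(r"\s+", " ", text)   # collapse runs of whitespace
    text = re.sub(r"\s+,", ",", text)  # drop whitespace before a comma
    return text.strip(" ,")            # drop dangling spaces and commas


if __name__ == "__main__":
    print(format_name(PersonName(given="Zendaya")))
    print(format_name(PersonName("Albert", "Einstein"), order="surname_first"))
    print(format_name(PersonName("Mary Beth", "van Berg")))
```

Keeping the patterns in a data table rather than in code mirrors the motivation described above: the ordering, spacing, and punctuation vary by language, so they are the part that would be supplied per locale.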
Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle. In version 41, the following levels were reached:
Level | Languages | Locales* | Notes |
Modern | 89 | 361 | Suitable for full UI internationalization |
Afrikaans, … Čeština, … Dansk, … Eesti, … Filipino, … Gaeilge, … Hrvatski, Indonesia, … Jawa, Kiswahili, Latviešu, … Magyar, … Nederlands, … O‘zbek, Polski, … Română, Slovenčina, … Tiếng Việt, … Ελληνικά, Беларуская, … ᏣᎳᎩ, Ქართული, Հայերեն, עברית, اردو, … አማርኛ, नेपाली, … অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, ଓଡ଼ିଆ, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ລາວ, မြန်မာ, ខ្មែរ, 한국어, … 日本語, … | |||
Moderate | 13 | 32 | Suitable for full “document content” internationalization, such as formats in a spreadsheet. |
Binisaya, … Èdè Yorùbá, Føroyskt, Igbo, IsiZulu, Kanhgág, Nheẽgatu, Runasimi, Sardu, Shqip, سنڌي, … | |||
Basic | 22 | 21 | Suitable for locale selection, such as choice of language in mobile phone settings. |
Asturianu, Basa Sunda, Interlingua, Kabuverdianu, Lea Fakatonga, Rumantsch, Te reo Māori, Wolof, Босански (Ћирилица), Татар, Тоҷикӣ, Ўзбекча (Кирил), کٲشُر, कॉशुर (देवनागरी), …, মৈতৈলোন্, ᱥᱟᱱᱛᱟᱲᱤ, 粤语 (简体) | |||
If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.