Since 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The new Version 21 broadens the scope of properties usable in regular expressions (regex) to include properties of strings, such as emoji sequences. For example, the following matches all emoji flags except the French flag:
/[\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}]/
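To make the semantics of that expression concrete, here is a minimal sketch in Python. Python's standard `re` module does not support this syntax, so the sketch emulates it: the string property is modeled as a plain set of strings, `--` becomes set difference, and the result is compiled into an alternation. The names `SAMPLE_FLAGS` and `compile_string_set` are hypothetical, and the set is a tiny hand-picked sample, not the real value of `RGI_Emoji_Flag_Sequence`.

```python
import re

# Hypothetical stand-in for \p{RGI_Emoji_Flag_Sequence}: a tiny hand-picked
# sample, NOT the full property value from the Unicode emoji data files.
SAMPLE_FLAGS = {"🇫🇷", "🇩🇪", "🇯🇵", "🇺🇸"}


def compile_string_set(strings):
    """Compile a set of strings into a regex alternation, longest first."""
    # Sort by descending length (then alphabetically for determinism) so a
    # longer string is always tried before any shorter string it starts with.
    ordered = sorted(strings, key=lambda s: (-len(s), s))
    return re.compile("|".join(re.escape(s) for s in ordered))


# Set difference mirrors [\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}].
flags_except_french = compile_string_set(SAMPLE_FLAGS - {"🇫🇷"})

text = "🇩🇪 🇫🇷 🇯🇵"
matches = flags_except_french.findall(text)
print(matches)
```

Each flag is a sequence of two regional indicator code points, which is why a property of single characters cannot describe it. The longest-first ordering makes no difference for this sample, but it matters once a set contains strings that are prefixes of other members.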
Among the improvements are:
- Provides a new Annex D: Resolving Character Classes with Strings for handling negations of sets of strings.
- Updates the full property list to include the latest UCD properties, plus Emoji properties and UTS #39 properties.
- Removes obsolete text passages, and makes editorial changes for clarity.