Regular Expression String Searching
A regular expression is a string pattern composed of normal characters and special characters. Special characters are used to denote an arrangement of the other characters in the regular expression pattern. A regular expression can be used to search for, and perhaps replace, occurrences of the regular expression pattern in strings.
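The search-and-replace idea can be sketched in standard C++. This sketch does not use RWURegularExpression, whose interface is not shown in this section. It assumes std::regex with the std::regex::extended grammar as a stand-in for POSIX ERE, and the helper names containsMatch and replaceAll are hypothetical.

```cpp
#include <regex>
#include <string>

// Illustrative stand-in only: std::regex with the POSIX ERE grammar,
// not the RWURegularExpression API.

// Returns true if `text` contains at least one match for the ERE `pattern`.
bool containsMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}

// Returns a copy of `text` in which every match of the ERE `pattern`
// has been replaced by `replacement`.
std::string replaceAll(const std::string& text,
                       const std::string& pattern,
                       const std::string& replacement) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_replace(text, re, replacement);
}
```

In the pattern colou?r, the special character ? makes the preceding u optional, so containsMatch finds a match in both "colour" and "color", and replaceAll rewrites both spellings in a single call.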
Regular expression syntax describes how to arrange normal characters and special characters to form a valid regular expression pattern. The regular expression syntax for RWURegularExpression is similar to that of the POSIX 2 extended regular expression (ERE) specification, with the addition of Unicode extensions. For more information on the POSIX ERE standard, see POSIX Extended Regular Expression Syntax.
The Internationalization Module extends the POSIX 2 ERE syntax to provide support for Unicode basic and tailored regular expressions through the class RWURegularExpression.
Basic Unicode regular expression support corresponds to Level 1 support, as described in the Unicode Regular Expression Guidelines (Unicode Technical Report #18 (UTR-18) Version 5.1 at http://www.unicode.org/reports/tr18/tr18-5.1.html). Basic Unicode regular expressions are useful for the majority of Unicode strings. They add the following Unicode extensions to the POSIX ERE standard:
Hexadecimal notation
Character categories
Subtraction
Simple word boundaries
Simple loose matches
Line breaks
For more information on basic regular expressions, see Basic Unicode Regular Expression Extensions.
Tailored regular expressions extend the basic regular expression functionality, corresponding to Level 2 and Level 3 support, also described in UTR-18 Version 5.1. In addition to some minor extensions, the tailored extensions include support for:
Treating surrogate pairs as single characters
Using the script property
Matching canonically equivalent character representations
Specifying grapheme clusters
As always, added power comes at a cost in processing time and space. For this reason, RWURegularExpression uses the basic regular expression engine by default, so you pay for tailored regular expressions only when you need their additional power.
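Two of the tailored features listed above, surrogate pairs and canonical equivalence, address problems that show up as soon as UTF-16 text is handled one code unit at a time. The following sketch uses only standard C++ and illustrates the underlying issue, not the RWURegularExpression implementation. The names countCodePoints, kComposed, and kDecomposed are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string, treating each well-formed
// surrogate pair (a high unit in 0xD800-0xDBFF followed by a low unit in
// 0xDC00-0xDFFF) as a single character.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool lowFollows = i + 1 < s.size() &&
                                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && lowFollows) {
            ++i;  // skip the low half of the pair
        }
        ++count;
    }
    return count;
}

// Two canonically equivalent spellings of the same letter, e with acute.
const std::u16string kComposed = u"\u00E9";     // single code point U+00E9
const std::u16string kDecomposed = u"e\u0301";  // U+0065 plus combining U+0301
```

The musical symbol U+1D11E occupies two UTF-16 code units but is a single character, so countCodePoints(u"\U0001D11E") returns 1 while the string's size() is 2. Likewise, kComposed and kDecomposed represent the same text yet compare unequal code unit by code unit, which is the gap that matching canonically equivalent representations is meant to close.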