Regular Expression String Searching
A regular expression is a string pattern composed of normal characters and special characters. Special characters specify how the other characters in the pattern are arranged, for example by making a character optional or allowing it to repeat. A regular expression can be used to search strings for occurrences of the pattern and, optionally, to replace those occurrences.
Regular expression syntax describes how to arrange normal characters and special characters to form a valid regular expression pattern. The syntax for RWURegularExpression is similar to that of the POSIX.2 extended regular expression (ERE) specification, with added Unicode extensions. For more information on the POSIX ERE standard, see POSIX Extended Regular Expression Syntax.
The Internationalization Module extends the POSIX.2 ERE syntax to provide support for Unicode basic and tailored regular expressions through the class RWURegularExpression.
Basic Unicode regular expression support corresponds to Level 1 support, as described in the Unicode Regular Expression Guidelines (Unicode Technical Report #18 (UTR-18) Version 5.1 at http://www.unicode.org/reports/tr18/tr18-5.1.html). Basic Unicode regular expressions are sufficient for most Unicode strings. They add the following Unicode extensions to the POSIX ERE standard:
*Hexadecimal notation
*Character categories
*Subtraction
*Simple word boundaries
*Simple loose matches
*Line breaks
For more information on basic regular expressions, see Basic Unicode Regular Expression Extensions.
Tailored regular expressions extend the basic regular expression functionality, corresponding to Level 2 and Level 3 support, also described in UTR-18 Version 5.1. In addition to some minor extensions, the tailored extensions include support for:
*Treating surrogate pairs as single characters
*Using the script property
*Matching canonically equivalent character representations
*Specifying grapheme clusters
The added power of tailored regular expressions costs extra processing time and memory. For this reason, RWURegularExpression uses the basic regular expression engine by default. Use tailored regular expressions only when you need their features.
For more information on tailored regular expressions, see Tailored Unicode Regular Expression Extensions and How to Use Tailored Regular Expressions.