POSIX Extended Regular Expression Syntax
Although UTR-18 Version 6 suggests use of a Perl-like pattern syntax, the regular expression support in the Internationalization Module uses the POSIX 2 extended regular expression (ERE) pattern syntax, with Unicode extensions, suggested by UTR-18 Version 5.1. That syntax is described in Table 161.
The special characters used by RWURegularExpression are as follows:
Character | Meaning
+ | Matches one or more occurrences of the preceding item, except in a bracket expression.
* | Matches zero or more occurrences of the preceding item, except in a bracket expression.
? | Matches zero or one occurrence of the preceding item, except in a bracket expression.
{ and } | Specify a cardinality range for the preceding item.
[ and ] | Create a bracket expression. A bracket expression defines a set of items, any of which may be matched. Within a bracket expression, regular expression special characters are treated as normal, non-special characters, with a few exceptions.
( and ) | Group regular expression items into subexpressions, which are treated as a single unit.
\ | Escapes a regular expression special character, causing it to be treated as a regular character.
^ | Indicates that a regular expression or subexpression is anchored at the beginning of the input string.
$ | Indicates that a regular expression or subexpression is anchored at the end of the input string.
| | Denotes alternation: the creation of a set of equally valid, alternate expressions or subexpressions, any one of which can be matched.
. | Matches any code unit, except for those which indicate the logical end of a line, as outlined in Unicode Technical Report #18.
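The behavior of the special characters in the table can be illustrated with a short sketch. It does not use RWURegularExpression. As a stand-in, it assumes the C++ standard library's std::regex with the std::regex::extended flag, which also accepts POSIX ERE syntax but works on char rather than Unicode code units and does not provide the Unicode extensions. The helper names ereMatches, ereFound, and checkSpecialCharacters are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`,
// interpreted with the standard library's POSIX ERE grammar.
bool ereMatches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool ereFound(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}

// Walks through the special characters in table order.
void checkSpecialCharacters() {
    // + : one or more of the preceding item
    assert(ereMatches("ab+c", "abbc"));
    assert(!ereMatches("ab+c", "ac"));

    // * : zero or more of the preceding item
    assert(ereMatches("ab*c", "ac"));

    // ? : zero or one of the preceding item
    assert(ereMatches("ab?c", "abc"));
    assert(!ereMatches("ab?c", "abbc"));

    // { and } : cardinality range, here two to three occurrences
    assert(ereMatches("a{2,3}", "aaa"));
    assert(!ereMatches("a{2,3}", "a"));
    assert(!ereMatches("a{2,3}", "aaaa"));

    // [ and ] : bracket expression; + * ? lose their special meaning inside
    assert(ereMatches("[abc]", "b"));
    assert(!ereMatches("[abc]", "d"));
    assert(ereMatches("[+*?]", "*"));

    // ( and ) : grouping makes + apply to "ab" rather than to "b" alone
    assert(ereMatches("(ab)+", "abab"));
    assert(!ereMatches("ab+", "abab"));
    assert(ereMatches("ab+", "abbb"));

    // \ : escapes a special character ("a\\+" is the pattern a\+)
    assert(ereMatches("a\\+", "a+"));
    assert(!ereMatches("a\\+", "aa"));

    // ^ and $ : anchors at the beginning and end of the input
    assert(ereFound("^abc", "abcdef"));
    assert(!ereFound("^abc", "xabc"));
    assert(ereFound("abc$", "xabc"));
    assert(!ereFound("abc$", "abcx"));

    // | : alternation
    assert(ereMatches("cat|dog", "dog"));
    assert(!ereMatches("cat|dog", "cow"));

    // . : any single character (line-end handling is not checked here,
    // because it may differ from the UTR-18 behavior described above)
    assert(ereMatches("a.c", "abc"));
    assert(!ereMatches("a.c", "ac"));
}
```

Calling checkSpecialCharacters() from a test or from main exercises every row of the table. Whole-string matching is used for most checks so that a quantifier's effect is visible, while the anchor checks use searching, since ^ and $ only make a difference when the pattern could otherwise match in the middle of the input.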