Character | Meaning |
---|---|
+ | Matches one or more occurrences of the preceding item, except in a bracket expression. For example, a+ matches a, aa, aaa, and so on. |
* | Matches zero or more occurrences of the preceding item, except in a bracket expression. For example, a* matches the empty string, a, aa, and so on. |
? | Matches zero or one occurrence of the preceding item, except in a bracket expression. For example, a? matches the empty string and a. |
{ and } | Specify a cardinality range, formed as follows: {m,n}. This construct matches between m and n occurrences of the preceding item. For example, a{2,3} matches aa and aaa. This construct can also be formed using {m,} and {m}. The first matches m or more occurrences of the preceding item. For example, a{2,} matches aa, aaa, aaaa, and so on. The second matches exactly m occurrences of the preceding item. For example, a{2} matches aa. Note: { is treated differently in a bracket expression. In this context, { denotes the beginning of a Unicode character category, as described in Unicode Regular Expressions. |
[ and ] | Create a bracket expression. A bracket expression defines a set of items, any one of which may be matched. For example, [abc] matches a, or b, or c. Within a bracket expression, all regular expression special characters are treated as normal, non-special characters, with the following exceptions. First, - specifies a range of character values, based on their bit pattern. For example, [A-Za-z] matches all uppercase and lowercase English letters. To include - as a literal character in the bracket expression, place it first or last in the set; for example, [-a-z] or [A-Z-]. Second, ^ is special only when it is the first character in the bracket set, where it complements the set of items to be matched. For example, [^a-z] matches all characters except lowercase English letters. Third, { denotes the beginning of a Unicode character category (see Unicode Regular Expressions). To use { as a literal character in a bracket expression, escape it with the \ character as follows: [\{]. Finally, to include ] as a literal character in the bracket set, place it first in the set, as in []abc] or [^]abc]. |
( and ) | Group regular expression items into subexpressions, which are treated as a single unit. For example, whereas ab* matches a, ab, abb, and so on, (ab)* matches the empty string, ab, abab, and so on. ( and ) are not treated as special characters inside a bracket expression. |
\ | Escapes a regular expression character, causing it to be treated as a regular character. For example, whereas (ab) indicates a subexpression consisting of ab, \(ab\) denotes the sequence of characters (, a, b, and ). Note: To specify the \ character in C++ source code, you must specify \\, as the C++ compiler treats the \ character as special, denoting the beginning of an escape sequence embedded in the C++ source code. In data files, or text controls in dialog boxes, however, the double backslash is not necessary. |
^ | Indicates that a regular expression or subexpression is anchored at the beginning of the input string. For example, ^ab matches ab and abc, but not cab. Recall that ^ is treated differently in bracket expressions. |
$ | Indicates that a regular expression or subexpression is anchored at the end of the input string. For example, ab$ matches ab and cab, but not abc. |
\| | Denotes alternation, or the creation of a set of equally valid, alternate expressions or subexpressions, each of which can be matched. For example, ab\|cd matches ab or cd. |
. | Matches any code unit, except for those which indicate the logical end of a line, as outlined in Unicode Technical Report #18: \u2028, \u2029, \u000A, \u000B, \u000C, \u000D, \u0085. |
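
Several of the rows above can be tried out with a small sketch. It uses the C++ standard library's `std::regex` with its default ECMAScript grammar as a stand-in, which is an assumption: it is not the regular expression class this table documents, and the two differ in places (for example, in how `{` and `]` behave inside bracket expressions and in which characters `.` excludes). The helper names `full_match` and `partial_match` are hypothetical and exist only for illustration. The sketch is limited to constructs that behave the same way in both: quantifiers, grouping, alternation, escaping, anchors, and a complemented bracket set.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the ENTIRE input matches the pattern.
// Uses std::regex (ECMAScript grammar) as a stand-in for the documented class.
bool full_match(const std::string& pattern, const std::string& input) {
    return std::regex_match(input, std::regex(pattern));
}

// Hypothetical helper: true when ANY part of the input matches the pattern.
// Anchors (^ and $) are only interesting with this kind of search.
bool partial_match(const std::string& pattern, const std::string& input) {
    return std::regex_search(input, std::regex(pattern));
}
```

With these helpers, `full_match("a{2,3}", "aaa")` is true and `full_match("a{2,3}", "a")` is false, `full_match("(ab)*", "abab")` is true, and `partial_match("^ab", "cab")` is false while `partial_match("ab$", "cab")` is true. Because the pattern is written in C++ source, the escaped parentheses from the `\` row become `"\\(ab\\)"`, which matches the literal text `(ab)`.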