How to Create an RWURegularExpression


RWURegularExpression objects are constructed from pattern strings. The pattern string can be a string literal, an RWCString, or an RWUString. For example, this code creates an RWURegularExpression that could be used to search for a bold item encoded in the ASCII range of characters in an HTML document:

RWUConversionContext context("ascii");

RWUString pattern("<b>([\\u0020-\\u007f]*)</b>");

RWURegularExpression r(pattern);
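For readers who want to try the shape of this pattern without the Internationalization Module at hand, the following is a minimal sketch that uses the standard C++ <regex> library as a stand-in. It is not the SourcePro API: std::regex here works on bytes rather than Unicode code points, the function name firstBoldItem is hypothetical, and the range is written with \x escapes instead of \u escapes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, standard library only (not part of SourcePro).
// Returns the text captured between <b> and </b> when that text stays
// within the byte range 0x20..0x7f, or an empty string if nothing matches.
std::string firstBoldItem(const std::string& html)
{
    // Same structure as the RWUString pattern above, with the character
    // range expressed as hexadecimal byte escapes.
    static const std::regex bold("<b>([\\x20-\\x7f]*)</b>");

    std::smatch match;
    if (std::regex_search(html, match, bold)) {
        return match[1].str();   // contents of the first capture group
    }
    return std::string();
}
```

Note that the * quantifier is greedy in both libraries, so on input containing several bold items the capture can run from the first <b> to the last </b>.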

If an RWURegularExpression is constructed from a string literal or an RWCString, the pattern data is expected to be null-terminated and is converted to Unicode using the given converter. (See Chapter 4 for more information on converting between encodings.) If no converter is supplied, the converter managed by the current to-Unicode conversion context is used. Any escape sequences in the pattern are unescaped.

Other optional arguments to the constructors include:

Options for pattern matching; currently only caseless matches are supported

The level of Unicode regular expression conformance, either basic or tailored (See Unicode Regular Expressions for more information on supported levels.)

The converter to use for character conversion

The locale to use (See Chapter 10 for more information on locales.)

The regular expression instance uses the locale to determine locale-specific behavior in a tailored regular expression. (Locales have little effect on basic regular expressions.) Grapheme clusters, character sets, and the break locations for words, sentences, and lines may change depending on the locale. For example, the Spanish character "ch" is found in the character set "[b-d]" in Spanish locales, but not in English locales.

NOTE: You may also set the locale using the setLocale() method.

For example, the following code creates an RWURegularExpression that could be used to search for the characters abc at the end of a line, without regard to case:

RWUConversionContext context("ascii");

RWUString pattern("abc$");

RWURegularExpression r(pattern,
                       RWURegularExpression::Basic,
                       RWURegularExpression::IgnoreCase);
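As a rough point of comparison, the same caseless end-of-line search can be sketched with the standard C++ <regex> library. This is an analogy only, not the SourcePro API: std::regex::icase plays the role that RWURegularExpression::IgnoreCase plays above, the function name endsWithAbcIgnoringCase is hypothetical, and the matching is byte-oriented rather than Unicode-aware.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, standard library only (not part of SourcePro).
// Reports whether the input ends with "abc" in any mix of upper and
// lower case.
bool endsWithAbcIgnoringCase(const std::string& line)
{
    // icase requests caseless matching; $ anchors the match at the end
    // of the input.
    static const std::regex re("abc$",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(line, re);
}
```

A call such as endsWithAbcIgnoringCase("xyzABC") succeeds, while "abcdef" does not match because abc is not at the end.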

Similarly, this pattern uses character categories to search for line breaks in accordance with the conventions of the zh_TW locale:

RWUString pattern("^[{L}{Zs}]+[{BOL}][{L}{Zs}]+[{EOL}][{L}{Zs}]+$");

RWURegularExpression r(pattern,
                       RWURegularExpression::Basic,
                       RWURegularExpression::Normal,
                       RWULocale("zh_TW"));