General Character Categories
Every Unicode character is also assigned to a general character category in the Unicode Character Database.
RWUCharTraits provides a GeneralCategory enum whose values identify the various categories, such as UppercaseLetter, LowercaseLetter, DecimalDigitNumber, LineSeparator, and ConnectorPunctuation. These values correspond to the General_Category property values (for example, Lu, Ll, Nd, Zl, and Pc) defined in the Unicode Character Database, as described in Unicode Standard Annex #44:
http://www.unicode.org/reports/tr44/
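To make the correspondence between enumerators and UCD property values concrete, here is a minimal, self-contained C++ sketch. It mirrors only the five enumerators named above and pairs each with its short General_Category alias. The enum, ucdAlias(), and toyCategory() are hypothetical illustrations, not the RWUCharTraits API. toyCategory() covers just a handful of code points to suggest the kind of per-code-point lookup that a full implementation performs against UCD data.

```cpp
#include <optional>
#include <string_view>

// Hypothetical subset of a general-category enum. The real
// RWUCharTraits::GeneralCategory defines many more values, and its
// enumerator order is not assumed here.
enum class GeneralCategory {
    UppercaseLetter,      // UCD alias: Lu
    LowercaseLetter,      // UCD alias: Ll
    DecimalDigitNumber,   // UCD alias: Nd
    LineSeparator,        // UCD alias: Zl
    ConnectorPunctuation  // UCD alias: Pc
};

// Maps each enumerator to the short General_Category value that
// appears in field 2 of UnicodeData.txt.
constexpr std::string_view ucdAlias(GeneralCategory gc) {
    switch (gc) {
        case GeneralCategory::UppercaseLetter:      return "Lu";
        case GeneralCategory::LowercaseLetter:      return "Ll";
        case GeneralCategory::DecimalDigitNumber:   return "Nd";
        case GeneralCategory::LineSeparator:        return "Zl";
        case GeneralCategory::ConnectorPunctuation: return "Pc";
    }
    return "";  // not reached for the enumerators above
}

// Toy lookup covering only a few ASCII ranges plus U+2028. A real
// implementation consults tables generated from the UCD.
constexpr std::optional<GeneralCategory> toyCategory(char32_t cp) {
    if (cp >= U'A' && cp <= U'Z') return GeneralCategory::UppercaseLetter;
    if (cp >= U'a' && cp <= U'z') return GeneralCategory::LowercaseLetter;
    if (cp >= U'0' && cp <= U'9') return GeneralCategory::DecimalDigitNumber;
    if (cp == 0x2028)             return GeneralCategory::LineSeparator;  // LINE SEPARATOR
    if (cp == U'_')               return GeneralCategory::ConnectorPunctuation;  // LOW LINE
    return std::nullopt;          // category not covered by this sketch
}
```

Returning std::optional keeps the sketch honest about its limited coverage: code points it does not know about yield no category rather than a guessed one.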
The static method RWUCharTraits::getGeneralCategory() returns the GeneralCategory value that identifies the general category of a given code point. Various convenience methods are also provided, each of which returns true if a given RWUChar32 represents a code point in a particular character category: RWUCharTraits::isControl(), RWUCharTraits::isError(), RWUCharTraits::isLetter(), RWUCharTraits::isPunctuation(), RWUCharTraits::isSpace(), and RWUCharTraits::isWhitespace().
The static method getWhitespace() returns a null-terminated array of whitespace code points, for convenient use as delimiters (see Tokenizing).
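A null-terminated array of delimiter code points can be consumed with a simple sentinel loop. The following C++ sketch assumes a hypothetical kWhitespace array standing in for the result of getWhitespace(). Its contents are an illustrative subset of Unicode whitespace, not necessarily the set SourcePro returns. isDelimiter() and tokenize() are likewise hypothetical helpers, not SourcePro's tokenizer classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the array returned by getWhitespace():
// an illustrative subset of whitespace code points, terminated by 0.
const char32_t kWhitespace[] = {
    0x0009,  // CHARACTER TABULATION
    0x000A,  // LINE FEED
    0x000D,  // CARRIAGE RETURN
    0x0020,  // SPACE
    0x00A0,  // NO-BREAK SPACE
    0x2028,  // LINE SEPARATOR
    0x3000,  // IDEOGRAPHIC SPACE
    0        // terminator
};

// Returns true if cp appears in the null-terminated array delims.
bool isDelimiter(char32_t cp, const char32_t* delims) {
    assert(delims != nullptr);
    for (const char32_t* p = delims; *p != 0; ++p) {
        if (*p == cp) return true;
    }
    return false;
}

// Splits text on any code point in delims, dropping empty tokens
// produced by runs of consecutive delimiters.
std::vector<std::u32string> tokenize(const std::u32string& text,
                                     const char32_t* delims) {
    std::vector<std::u32string> tokens;
    std::u32string current;
    for (char32_t cp : text) {
        if (isDelimiter(cp, delims)) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(cp);
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

Because the array carries its own terminator, callers need no separate length parameter. The trade-off is that U+0000 itself can never be used as a delimiter, since the loop treats it as the end of the array.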