Character Properties


One of the strengths of the Unicode Standard is that it defines not only a very large character set but also a comprehensive set of properties for each code point in that character set. The set of properties and their values are specified by the Unicode Character Database, which is published as part of the Unicode Standard:

http://www.unicode.org/ucd

The Unicode Character Database consists of a number of data files. The latest versions of all data files are available here:

http://www.unicode.org/Public/UNIDATA/
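To make the idea of a property data file concrete, here is a minimal C++ sketch that parses one line in the layout used by Scripts.txt, one of the UCD files: a single code point or a `first..last` range, a semicolon, the property value, and an optional `#` comment. The struct and function names are hypothetical and are not part of the Internationalization Module. The sketch only illustrates what the files contain; in the module, property queries go through RWUCharTraits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical record for one entry of a UCD file such as Scripts.txt:
// a code point range plus the property value assigned to it.
struct ScriptRange {
    std::uint32_t first;
    std::uint32_t last;
    std::string script;
};

// Remove leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse one "XXXX..YYYY ; Value # comment" or "XXXX ; Value # comment" line.
// Blank and comment-only lines yield std::nullopt. A malformed code point
// field makes std::stoul throw std::invalid_argument.
std::optional<ScriptRange> parseScriptsLine(const std::string& line) {
    // Keep only the part before the comment marker, if there is one.
    const std::string data = trim(line.substr(0, line.find('#')));
    if (data.empty()) return std::nullopt;

    const auto semi = data.find(';');
    if (semi == std::string::npos) return std::nullopt;

    const std::string codes = trim(data.substr(0, semi));
    const std::string value = trim(data.substr(semi + 1));

    ScriptRange r{};
    const auto dots = codes.find("..");
    r.first = static_cast<std::uint32_t>(std::stoul(codes.substr(0, dots), nullptr, 16));
    r.last = (dots == std::string::npos)
                 ? r.first  // single code point
                 : static_cast<std::uint32_t>(std::stoul(codes.substr(dots + 2), nullptr, 16));
    r.script = value;
    return r;
}
```

A full loader would apply this to every line of the file and collect the resulting ranges for lookup.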

Unicode character properties may be either normative or informative, as defined in Chapter 3, "Conformance," of the Unicode Standard:

A normative property is required for conformance with the Unicode Standard. Implementations that claim conformance to the Unicode Standard and that make use of a particular normative property must follow the specifications of the standard for that property to be conformant.

An informative property is strongly recommended, but a conformant implementation is free to use or change such values as it may require, while still remaining conformant to the standard.

In the Internationalization Module, RWUCharTraits provides access to Unicode character properties, both normative and informative. The class defines several public enums that name property values in plain English, along with a series of static methods for querying the properties of a character. For example, the static method RWUCharTraits::getScript() returns an enumerated value that identifies the script property of the character with the given code point, such as Latin, Greek, Hebrew, Arabic, or Han. Most methods on RWUCharTraits take RWUChar32 code points as arguments; a few operate on RWUChar16 code units.
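The distinction between RWUChar32 code points and RWUChar16 code units matters for characters outside the Basic Multilingual Plane, which UTF-16 encodes as a pair of surrogate code units. Assuming RWUChar16 holds a UTF-16 code unit, the sketch below shows the standard UTF-16 arithmetic that recovers the single code point such a pair encodes, which is the value a code-point-based property query needs. The type aliases and function names are hypothetical stand-ins, not the module's own declarations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins: a 16-bit UTF-16 code unit and a 32-bit code point.
using UChar16Sketch = std::uint16_t;
using UChar32Sketch = std::uint32_t;

// High (leading) surrogates occupy U+D800..U+DBFF.
bool isHighSurrogate(UChar16Sketch u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
bool isLowSurrogate(UChar16Sketch u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a surrogate pair into one code point in U+10000..U+10FFFF:
// the high unit supplies the upper 10 bits, the low unit the lower 10 bits.
UChar32Sketch combineSurrogates(UChar16Sketch high, UChar16Sketch low) {
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000u
         + ((static_cast<UChar32Sketch>(high) - 0xD800u) << 10)
         + (static_cast<UChar32Sketch>(low) - 0xDC00u);
}
```

For example, the pair 0xD834, 0xDD1E decodes to U+1D11E MUSICAL SYMBOL G CLEF; neither code unit on its own identifies that character.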

Because all of its methods are static, there is no need to instantiate RWUCharTraits.
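The design described in this section, with plain-English enums plus static query methods that take a 32-bit code point, can be sketched as follows. This is a hypothetical, heavily reduced class, not RWUCharTraits: its name, its enum, and its lookup are assumptions. It checks only a few hand-picked code point ranges rather than the full Script property from the Unicode Character Database, so anything outside those ranges, including characters such as digits that the database assigns to the Common script, reports Unknown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical static-only properties class in the style described above.
// The ranges below are a tiny illustrative subset, not real UCD data.
class CharTraitsSketch {
public:
    // Property values named in plain English.
    enum Script { Unknown, Latin, Greek, Hebrew, Arabic, Han };

    // Static-only: deleting the constructor prevents instantiation.
    CharTraitsSketch() = delete;

    // Coarse script lookup: each range below contains only characters
    // whose Script property is the returned value.
    static Script getScript(std::uint32_t cp) {
        if ((cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A)) return Latin;  // A-Z, a-z
        if (cp >= 0x3B1 && cp <= 0x3C9) return Greek;   // small alpha..small omega
        if (cp >= 0x5D0 && cp <= 0x5EA) return Hebrew;  // alef..tav
        if (cp >= 0x621 && cp <= 0x63A) return Arabic;  // hamza..ghain
        if (cp >= 0x4E00 && cp <= 0x9FA5) return Han;   // core CJK Unified Ideographs
        return Unknown;                                 // everything else in this sketch
    }
};
```

Callers simply write `CharTraitsSketch::getScript(0x05D0)`, with no object involved, which mirrors how the static methods of RWUCharTraits are called.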