Character and String Processing
Overview
As described in The Unicode Standard, the Internationalization Module uses the UTF-16 character encoding form for the internal representation and manipulation of multilingual text. In UTF-16, each 21-bit Unicode code point is represented using one or two 16-bit code units: code points up to U+FFFF take a single code unit, and code points above U+FFFF take a pair of surrogate code units.
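To make the one-or-two code unit rule concrete, the following minimal sketch encodes a single code point as UTF-16 using only standard C++. It does not use the Internationalization Module; the function name encodeUtf16 is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, NOT part of the Internationalization Module:
// encodes one Unicode scalar value as one or two UTF-16 code units.
std::vector<char16_t> encodeUtf16(char32_t cp)
{
    // Valid scalar values are U+0000..U+10FFFF, excluding the
    // surrogate range U+D800..U+DFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit holds the value directly.
        return { static_cast<char16_t>(cp) };
    }

    // Supplementary planes: subtract 0x10000 to get a 20-bit offset,
    // then split it into two 10-bit halves.
    const char32_t offset = cp - 0x10000;
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
    return { high, low };
}
```

Under this sketch, U+0041 (LATIN CAPITAL LETTER A) yields the single code unit 0x0041, while U+1F600 yields the surrogate pair 0xD83D 0xDE00.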
The character and string processing classes of the Internationalization Module provide the ability to create and manipulate UTF-16 strings. This chapter describes how to:
* represent individual UTF-16 code units with RWUChar16 and Unicode code points with RWUChar32
* examine the character traits of an individual code point with RWUCharTraits; for example, its case, its direction of display, or whether it is a whitespace character
* represent and manipulate UTF-16 strings with RWUString, and substrings with RWUSubString and RWUConstSubString
* iterate over the code points in a string with RWUStringIterator and RWUConstStringIterator
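The distinction between code units and code points matters most when iterating over a string. The sketch below illustrates the idea in standard C++ only: it walks a UTF-16 string and combines surrogate pairs into single code points. It is not the implementation of RWUStringIterator; the function name decodeUtf16 and the choice to pass unpaired surrogates through unchanged are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of the Internationalization Module:
// decodes a UTF-16 string into a sequence of code points.
// Assumption: an unpaired surrogate is passed through as-is.
std::vector<char32_t> decodeUtf16(const std::u16string& s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // High + low surrogate: rebuild the supplementary code point.
                out.push_back(0x10000 + ((unit - 0xD800) << 10)
                                      + (next - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        // BMP code unit (or unpaired surrogate): one unit, one value.
        out.push_back(unit);
    }
    return out;
}
```

For example, the string u"A\U0001F600" holds three code units but decodes to two code points, U+0041 and U+1F600, which is why indexing by code unit and iterating by code point can give different counts.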