Character and String Processing

Overview

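The Internationalization Module uses the UTF-16 character encoding form, as described in The Unicode Standard, for the internal representation and manipulation of multilingual text. In UTF-16, each 21-bit Unicode code point is represented using one or two 16-bit code units: code points up to U+FFFF occupy a single code unit, and code points from U+10000 to U+10FFFF occupy a pair of code units known as a surrogate pair.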
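The mapping from a code point to its code units can be sketched in standard C++. This is an illustration of the UTF-16 encoding form only, not the module's implementation: the function name encodeUtf16 is hypothetical, and the built-in types char16_t and char32_t stand in for a 16-bit code unit and a 32-bit code point.

```cpp
#include <vector>

// Hypothetical helper (not part of the Internationalization Module):
// encodes one Unicode code point as one or two UTF-16 code units.
// Returns an empty vector if cp is above U+10FFFF or is itself a
// surrogate value, since neither can be encoded.
std::vector<char16_t> encodeUtf16(char32_t cp) {
    std::vector<char16_t> units;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return units;
    }
    if (cp < 0x10000) {
        // Basic Multilingual Plane: the code unit equals the code point.
        units.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into two halves.
        const char32_t offset = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
    }
    return units;
}
```

Under these assumptions, encodeUtf16(0x0041) yields the single code unit 0x0041, while encodeUtf16(0x1F600) yields the surrogate pair 0xD83D, 0xDE00.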

The character and string processing classes of the Internationalization Module provide the ability to create and manipulate UTF-16 strings. This chapter describes how to:

represent individual UTF-16 code units with RWUChar16 and Unicode code points with RWUChar32

examine the character traits of an individual code point with RWUCharTraits; for example, its case, its direction of display, or whether it is a whitespace character

represent and manipulate UTF-16 strings with RWUString, and substrings with RWUSubString and RWUConstSubString

iterate over the code points in a string with RWUStringIterator and RWUConstStringIterator
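The distinction between code units and code points matters most when iterating, because a string's length in code units can exceed its length in code points. The following sketch shows the idea using only the C++ standard library. It does not use RWUString or the iterator classes listed above, whose interfaces are described later in this guide; the function name codePoints is hypothetical, and passing an unpaired surrogate through unchanged is a simplifying assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the Internationalization Module):
// walks a UTF-16 string and collects its code points, combining each
// surrogate pair into one code point. An unpaired surrogate is passed
// through unchanged, which is a simplification made for this sketch.
std::vector<char32_t> codePoints(const std::u16string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t unit = s[i];
        const bool isHigh = (unit >= 0xD800 && unit <= 0xDBFF);
        if (isHigh && i + 1 < s.size()) {
            const char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Recombine the two 10-bit halves and restore the offset.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i; // the low surrogate has been consumed
                continue;
            }
        }
        out.push_back(unit);
    }
    return out;
}
```

For the string u"A\U0001F600", size() reports three code units, while codePoints returns two code points, U+0041 and U+1F600.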