RWBasicUString and RWCString

SourcePro Core : Internationalization Module User’s Guide : Character and String Processing : Representing Strings : RWBasicUString and RWCString

RWBasicUString is similar to RWCString. For example:

Both classes have methods append(), prepend(), insert(), remove(), and replace() for modifying a string.

Both classes also have methods first(), last(), index(), rindex(), and contains() that search for characters or substrings within a string.

Both classes have a compareTo() method for lexically ordering strings.

RWBasicUString differs from RWCString in that an RWBasicUString instance contains a series of Unicode characters encoded in UTF-16, while an RWCString instance contains bytes encoded in an arbitrary encoding. RWBasicUString also performs conversion between UTF-16 and UTF-8. Because RWBasicUString contains UTF-16, its API has some methods that RWCString does not. For example:

Method requiresSurrogatePair() indicates whether a 21-bit Unicode code point requires a surrogate pair of UTF-16 code units, and methods isHighSurrogate() and isLowSurrogate() indicate whether a given RWUChar16 code unit is the first (high) or second (low) half of such a pair. Most characters can be represented in the UTF-16 encoding form with a single 16-bit code unit; only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair of two UTF-16 code units.
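The range checks behind these methods are simple enough to sketch in standard C++. The function names below are hypothetical stand-ins, not the RWBasicUString signatures, and char32_t and char16_t stand in for RWUChar32 and RWUChar16.

```cpp
// Hypothetical standalone analogues of requiresSurrogatePair(),
// isHighSurrogate(), and isLowSurrogate(); not the SourcePro API.

constexpr bool requires_surrogate_pair(char32_t cp) {
    // Only supplementary code points (U+10000..U+10FFFF) need two code units.
    return cp >= 0x10000 && cp <= 0x10FFFF;
}

constexpr bool is_high_surrogate(char16_t cu) {
    // High (leading) surrogates occupy 0xD800..0xDBFF.
    return cu >= 0xD800 && cu <= 0xDBFF;
}

constexpr bool is_low_surrogate(char16_t cu) {
    // Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    return cu >= 0xDC00 && cu <= 0xDFFF;
}
```

For example, U+1F600 is encoded as the pair 0xD83D 0xDE00, so the first check is true for the code point, and the second and third are true for its two code units respectively.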

Method computeCodePointValue() returns the appropriate RWUChar32 code point given a surrogate pair of RWUChar16 code units.

Methods highSurrogate() and lowSurrogate() return the first and second surrogate RWUChar16 code units for a given RWUChar32 code point.
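The arithmetic that splits a supplementary code point into two code units, and recombines them, can be sketched as follows. The function names are hypothetical analogues of highSurrogate(), lowSurrogate(), and computeCodePointValue(), not the SourcePro signatures, and they assume the input is already known to be in the supplementary range or a valid pair.

```cpp
// Hypothetical analogues of highSurrogate(), lowSurrogate(), and
// computeCodePointValue(). Precondition (not checked here): cp is in
// 0x10000..0x10FFFF, and high/low form a valid surrogate pair.

constexpr char16_t high_surrogate(char32_t cp) {
    // Subtract 0x10000 to get a 20-bit value; its top 10 bits go in the high unit.
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

constexpr char16_t low_surrogate(char32_t cp) {
    // The bottom 10 bits of the 20-bit value go in the low unit.
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

constexpr char32_t code_point_from_pair(char16_t high, char16_t low) {
    // Reverse the split: rejoin the two 10-bit halves and add 0x10000 back.
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For U+1F600, the 20-bit value is 0xF600, which splits into 0x3D and 0x200, giving the pair 0xD83D 0xDE00.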

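Methods compareCodeUnits() and compareCodePoints() perform code unit and code point ordering of strings, respectively. Code unit ordering of two strings may differ from code point ordering if either string contains surrogate pairs: surrogate code units (0xD800 to 0xDFFF) sort below code units in the range 0xE000 to 0xFFFF, even though every surrogate pair encodes a code point of 0x10000 or above.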
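The difference between the two orderings can be reproduced with standard C++ strings. In this sketch, std::u16string stands in for the UTF-16 storage, and the helper names are hypothetical analogues of compareCodeUnits() and compareCodePoints(), not the SourcePro API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 into code points.
// Unpaired surrogates are passed through as their own values.
std::u32string decode_utf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cu = s[i];
        if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            out.push_back(0x10000 + ((cu - 0xD800) << 10) + (s[i + 1] - 0xDC00));
            ++i;  // consume the low surrogate as well
        } else {
            out.push_back(cu);
        }
    }
    return out;
}

// Code unit order: std::u16string compares the raw char16_t values.
int compare_code_units(const std::u16string& a, const std::u16string& b) {
    return a.compare(b);
}

// Code point order: compare the decoded code point values instead.
int compare_code_points(const std::u16string& a, const std::u16string& b) {
    return decode_utf16(a).compare(decode_utf16(b));
}
```

With U+FF21 (a single code unit, 0xFF21) and U+1F600 (the pair 0xD83D 0xDE00), code unit order places U+FF21 last because 0xFF21 > 0xD83D, while code point order places it first because 0xFF21 < 0x1F600.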

Methods codeUnitLength() and codePointLength() return the number of code units or code points in a string. The standard length() method is equivalent to codeUnitLength().
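The two counts differ by the number of surrogate pairs in the string. The sketch below uses std::u16string as a stand-in for RWBasicUString; the function names are hypothetical analogues of codeUnitLength() and codePointLength().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical analogue of codeUnitLength(): one entry per 16-bit code unit.
std::size_t code_unit_length(const std::u16string& s) {
    return s.size();
}

// Hypothetical analogue of codePointLength(): a high surrogate followed by
// a low surrogate counts once; every other code unit counts once on its own.
std::size_t code_point_length(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low half of the pair
        }
        ++count;
    }
    return count;
}
```

For the string "a", U+1F600, "b", the code unit length is 4 and the code point length is 3.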

Method toUtf8() returns an RWCString containing a UTF-8 representation of the string.

Method toUtf32() returns a std::basic_string templatized on RWUChar32 containing a UTF-32 representation of the string.
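To illustrate what these conversions involve, the sketch below decodes UTF-16 into UTF-32 and then encodes each code point as one to four UTF-8 bytes. It uses std::u16string, std::u32string, and std::string in place of RWBasicUString, std::basic_string<RWUChar32>, and RWCString; the function names are hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not documented SourcePro behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical analogue of toUtf32(). Unpaired surrogates become U+FFFD
// (a choice made for this sketch only).
std::u32string utf16_to_utf32(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cu = s[i];
        if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            out.push_back(0x10000 + ((cu - 0xD800) << 10) + (s[i + 1] - 0xDC00));
            ++i;
        } else if (cu >= 0xD800 && cu <= 0xDFFF) {
            out.push_back(0xFFFD);  // lone surrogate
        } else {
            out.push_back(cu);
        }
    }
    return out;
}

// Hypothetical analogue of toUtf8(): encode each code point in 1 to 4 bytes.
std::string utf16_to_utf8(const std::u16string& s) {
    std::string out;
    for (char32_t cp : utf16_to_utf32(s)) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, "A", U+00E9, and U+1F600 occupy 4 UTF-16 code units, 3 UTF-32 code units, and 7 UTF-8 bytes (1 + 2 + 4).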

Method toWide() returns an RWWString containing a UTF-16 or UTF-32 representation of the contents of the string. The representation depends on the size of wchar_t. If sizeof(wchar_t) is 2, the RWWString is encoded in UTF-16. If sizeof(wchar_t) is 4, the RWWString is encoded in UTF-32.

Method validateCodePoint() returns a given RWUChar32 code point if it is a valid Unicode character, and throws an RWConversionErr otherwise. Because it returns the code point it was given, it can be used to validate a code point value inline, anywhere one is passed to a method.
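The "validate and pass through" pattern can be sketched in standard C++. Two assumptions apply: validity is taken to mean a value no greater than 0x10FFFF that is not a surrogate (0xD800 to 0xDFFF), and std::invalid_argument stands in for RWConversionErr. The function name is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical analogue of validateCodePoint(). Assumed rule: reject values
// above 0x10FFFF and surrogate code points; std::invalid_argument stands in
// for RWConversionErr.
char32_t validate_code_point(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("invalid Unicode code point");
    }
    return cp;  // unchanged, so the call can wrap an argument expression
}
```

Because the valid value is returned, a caller can write something like `f(validate_code_point(cp))`, where `f` is any function taking a code point, instead of validating on a separate line.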