RWBasicUString and RWCString
RWBasicUString is similar to RWCString. For example:
Both classes have methods append(), prepend(), insert(), remove(), and replace() for modifying a string.
Both classes also have methods first(), last(), index(), rindex(), and contains() that search for characters or strings of characters contained within a string.
Both classes have the method compareTo() for lexically ordering strings.
RWBasicUString differs from RWCString in that an RWBasicUString instance contains a series of Unicode characters encoded in UTF-16, while an RWCString instance contains bytes encoded in an arbitrary encoding. RWBasicUString also performs conversion between UTF-16 and UTF-8. Because RWBasicUString contains UTF-16, its API has some methods that RWCString does not. For example:
Methods requiresSurrogatePair(), isHighSurrogate(), and isLowSurrogate() deal with surrogate pairs of UTF-16 code units. requiresSurrogatePair() indicates whether a 21-bit Unicode code point requires a surrogate pair, while isHighSurrogate() and isLowSurrogate() indicate whether a code unit is the first or second half of such a pair. Most characters can be represented in the UTF-16 encoding form with a single 16-bit code unit. Only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair of two UTF-16 code units.
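The range checks behind these three queries can be sketched in standalone C++. This is only an illustration of the Unicode arithmetic, not the library's implementation: the function names are hypothetical, and char16_t and char32_t stand in for RWUChar16 and RWUChar32.

```cpp
// Standalone sketch; names are hypothetical and not part of the RWBasicUString API.

// True for code units in the high (first) surrogate range 0xD800..0xDBFF.
constexpr bool is_high_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// True for code units in the low (second) surrogate range 0xDC00..0xDFFF.
constexpr bool is_low_surrogate(char16_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// True for code points in 0x10000..0x10FFFF, which need two UTF-16 code units.
constexpr bool requires_surrogate_pair(char32_t code_point) {
    return code_point >= 0x10000 && code_point <= 0x10FFFF;
}
```

For instance, U+00E9 fits in one code unit, while U+1F600 does not and is stored as a high surrogate followed by a low surrogate.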
Method computeCodePointValue() returns the appropriate RWUChar32 code point given a surrogate pair of RWUChar16 code units.
Methods highSurrogate() and lowSurrogate() return the first and second surrogate RWUChar16 code units, respectively, for a given RWUChar32 code point.
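The conversion between a supplementary code point and its surrogate pair follows the standard UTF-16 formula. The sketch below shows that arithmetic in both directions. It assumes valid input (a code point in 0x10000..0x10FFFF, or a well-formed pair) and uses hypothetical function names with char16_t and char32_t in place of the library's types.

```cpp
// Standalone sketch of the UTF-16 surrogate arithmetic; names are hypothetical.

// First (high) surrogate for a code point in 0x10000..0x10FFFF.
constexpr char16_t high_surrogate_of(char32_t code_point) {
    return static_cast<char16_t>(0xD800 + ((code_point - 0x10000) >> 10));
}

// Second (low) surrogate for a code point in 0x10000..0x10FFFF.
constexpr char16_t low_surrogate_of(char32_t code_point) {
    return static_cast<char16_t>(0xDC00 + ((code_point - 0x10000) & 0x3FF));
}

// Recombine a well-formed surrogate pair into the code point it encodes.
constexpr char32_t code_point_from_pair(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}
```

With these definitions, U+1F600 splits into the pair 0xD83D, 0xDE00, and recombining that pair yields 0x1F600 again.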
Methods compareCodeUnits() and compareCodePoints() perform code unit and code point ordering of strings, respectively. Code unit ordering of two strings may differ from code point ordering if either string contains surrogate pairs.
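To see why the two orderings can disagree, consider U+FF5E (a single code unit, 0xFF5E) and U+10000 (the pair 0xD800, 0xDC00). By code unit, 0xD800 sorts before 0xFF5E; by code point, 0xFF5E sorts before 0x10000. The sketch below reproduces this with std::u16string. The helper names are hypothetical, and passing unpaired surrogates through unchanged is an assumption of this sketch rather than documented library behavior.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-16 into code points. Unpaired surrogates are passed through
// unchanged (an assumption made for this sketch).
std::u32string decode_utf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        const bool lead = cp >= 0xD800 && cp <= 0xDBFF;
        if (lead && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        }
        out.push_back(cp);
    }
    return out;
}

// Negative, zero, or positive, comparing raw 16-bit code units.
int compare_by_code_unit(const std::u16string& a, const std::u16string& b) {
    return a.compare(b);
}

// Negative, zero, or positive, comparing decoded code points.
int compare_by_code_point(const std::u16string& a, const std::u16string& b) {
    return decode_utf16(a).compare(decode_utf16(b));
}
```

Comparing u"\uFF5E" with u"\U00010000" gives a positive result from compare_by_code_unit() and a negative result from compare_by_code_point().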
Methods codeUnitLength() and codePointLength() return the number of code units and code points in a string, respectively. The standard length() method is equivalent to codeUnitLength().
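The difference between the two lengths shows up only when a string contains surrogate pairs. A minimal standalone sketch, using std::u16string and hypothetical function names rather than the library's API:

```cpp
#include <cstddef>
#include <string>

// Number of 16-bit code units, which is simply the storage length.
std::size_t count_code_units(const std::u16string& s) {
    return s.size();
}

// Number of code points: a well-formed surrogate pair counts once.
// An unpaired surrogate is counted as one code point (sketch assumption).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool lead = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (lead && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate of the pair
        }
        ++count;
    }
    return count;
}
```

For the string u"a\U0001F600b", count_code_units() returns 4 and count_code_points() returns 3.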
Method toUtf8() returns an RWCString containing a UTF-8 representation of the string.
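As an illustration of what a UTF-16 to UTF-8 conversion involves, here is a standalone sketch that uses std::u16string and std::string in place of the library's classes. The function name is hypothetical, and replacing an unpaired surrogate with U+FFFD is an assumption of this sketch; the library may handle malformed input differently.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 as UTF-8. Unpaired surrogates become U+FFFD (sketch assumption).
std::string utf16_to_utf8(const std::u16string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        const bool lead = cp >= 0xD800 && cp <= 0xDBFF;
        if (lead && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // replacement character for an unpaired surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A single code unit can expand to as many as three bytes, and a surrogate pair always becomes four, so the UTF-8 result is generally not the same length as the UTF-16 source.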
Method toUtf32() returns a std::basic_string templatized on RWUChar32 containing a UTF-32 representation of the string.
Method toWide() returns an RWWString containing a UTF-16 or UTF-32 representation of the contents of the string. The representation depends on the size of wchar_t. If sizeof(wchar_t) is 2, the RWWString is encoded in UTF-16. If sizeof(wchar_t) is 4, the RWWString is encoded in UTF-32.
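The dependence on sizeof(wchar_t) can be sketched with standard types, using std::wstring in place of RWWString. The function name is hypothetical and the code only illustrates the idea; it is not the library's implementation.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 to a wide string whose encoding depends on sizeof(wchar_t).
std::wstring to_wide_sketch(const std::u16string& s) {
    if (sizeof(wchar_t) == 2) {
        // 16-bit wchar_t: copy the code units unchanged (UTF-16).
        return std::wstring(s.begin(), s.end());
    }
    // 32-bit wchar_t: combine surrogate pairs into single code points (UTF-32).
    std::wstring out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        const bool lead = cp >= 0xD800 && cp <= 0xDBFF;
        if (lead && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

The practical consequence is that the length of the wide string for the same input can differ between platforms: a supplementary character occupies two elements where wchar_t is 16 bits wide and one element where it is 32 bits wide.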
Method validateCodePoint() throws an RWConversionErr if a given RWUChar32 code point is not a valid Unicode character, or returns the code point if it is valid. This method can be used to validate a code point value anywhere one is passed to a method.
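The "throw or return the argument" pattern makes it possible to validate inline, as part of an expression. The sketch below mimics that shape with standard C++ only. The names are hypothetical, std::invalid_argument stands in for RWConversionErr, and the validity rule used here (at most 0x10FFFF and not a surrogate) is an assumption; the library's exact criteria may differ.

```cpp
#include <stdexcept>

// Assumed validity rule for this sketch: within the Unicode range and
// not in the surrogate block 0xD800..0xDFFF.
constexpr bool is_valid_code_point(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Returns cp unchanged when valid, otherwise throws. std::invalid_argument
// is a stand-in for the library's RWConversionErr.
inline char32_t validate_code_point_sketch(char32_t cp) {
    if (!is_valid_code_point(cp)) {
        throw std::invalid_argument("invalid Unicode code point");
    }
    return cp;
}
```

Because the function returns its argument, a call can wrap a value at the point of use, for example `consume(validate_code_point_sketch(cp))`, where consume() is any hypothetical function taking a code point.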