String Length
Because UTF-16 encodes each code point above U+FFFF as a surrogate pair of two 16-bit code units, the number of code units in an RWUString may differ from the number of code points. Furthermore, because a single character as perceived by the end user can be composed of several code points (for example, a base letter followed by a combining accent), the number of code points may differ from the number of characters. Several methods are provided to determine the length of a string:
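The following C++ fragment is a minimal sketch of how code points and user-perceived characters can diverge. It uses the standard std::u16string rather than RWUString, and the names kPrecomposed and kDecomposed are illustrative only. Both strings display as "é", but the first holds one code point and the second holds two.

```cpp
#include <string>

// Two spellings of the same user-perceived character "é".
// 1) One precomposed code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
const std::u16string kPrecomposed = u"\u00E9";

// 2) A base letter followed by U+0301 COMBINING ACUTE ACCENT: two code points
//    that a reader nevertheless sees as a single character.
const std::u16string kDecomposed = u"e\u0301";
```

Because both code points lie in the Basic Multilingual Plane, each occupies exactly one code unit, so kPrecomposed.size() is 1 and kDecomposed.size() is 2. The difference from what a reader sees comes entirely from the combining mark.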
The inherited length() and codeUnitLength() methods return the number of UTF-16 code units in an RWUString.
The inherited codePointLength() method returns the number of code points in an RWUString.
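To make the code unit versus code point distinction concrete, here is a minimal C++ sketch of the kind of traversal a code point count requires. It works on std::u16string, not RWUString, and countCodePoints, isHighSurrogate and isLowSurrogate are hypothetical helpers. Counting an unpaired surrogate as one code point is an assumption of this sketch, not documented RWUString behavior.

```cpp
#include <cstddef>
#include <string>

// True if u is a UTF-16 high (leading) surrogate, 0xD800..0xDBFF.
constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if u is a UTF-16 low (trailing) surrogate, 0xDC00..0xDFFF.
constexpr bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper (not part of RWUString): counts the code points in a
// UTF-16 sequence. A high surrogate immediately followed by a low surrogate
// counts as one code point; every other code unit, including an unpaired
// surrogate (an assumption of this sketch), counts as one code point.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            ++i;  // skip the trailing half of the surrogate pair
        }
        ++count;
    }
    return count;
}
```

For example, u"A\U0001D11E" (LATIN CAPITAL LETTER A followed by U+1D11E MUSICAL SYMBOL G CLEF) has a size() of 3 code units, while countCodePoints returns 2, because U+1D11E is stored as a surrogate pair.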
Note that codePointLength() may be slower than length() or codeUnitLength(), because codePointLength() must traverse the string to find surrogate pairs, each of which encodes a single code point in two code units. Because the characters that occur in most text lie in the Basic Multilingual Plane (U+0000 through U+FFFF) and therefore need no surrogate representation, many applications can rely on length() or codeUnitLength() to determine or estimate the number of code points.
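When an estimate is acceptable, the code unit count alone bounds the code point count, since every code point occupies either one or two UTF-16 code units. The C++ sketch below states that bound without traversing the string; codePointCountRange is a hypothetical name, and it assumes that an unpaired surrogate counts as one code point.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: given only a UTF-16 code unit count (the value that
// length() or codeUnitLength() reports), return the inclusive range
// [lower, upper] in which the code point count must lie.
// - upper: every code unit is its own code point (no surrogate pairs).
// - lower: as many units as possible form two-unit surrogate pairs,
//          i.e. ceil(codeUnits / 2).
std::pair<std::size_t, std::size_t> codePointCountRange(std::size_t codeUnits) {
    return {(codeUnits + 1) / 2, codeUnits};
}
```

For a string of 3 code units, the code point count lies between 2 and 3. The upper bound is exact whenever the string contains no surrogate pairs, which is why the code unit count is often an adequate stand-in.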
An RWUBreakSearch can also be used to iterate over the characters of an RWUString, as perceived by the user, in the context of a particular locale. (See Chapter 7.)