String Length

The characteristics of UTF-16 imply that the number of 16-bit code units in an RWUString may differ from the number of code points. Furthermore, the nature of Unicode implies that the number of code points may differ from the number of characters, as interpreted by the end user. Several methods are provided to determine the length of a string:

The inherited length() and codeUnitLength() methods return the number of UTF-16 code units in an RWUString.

The inherited codePointLength() method returns the number of code points in an RWUString.
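The difference between code units, code points, and user-perceived characters can be illustrated without the library itself. The following is a minimal sketch in standard C++11 that uses std::u16string and std::u32string as stand-ins for UTF-16 and UTF-32 storage. It does not use RWUString, and the variable names are hypothetical.

```cpp
#include <string>

// Illustration only: standard C++ strings, not the RWUString API.

// "e" followed by U+0301 COMBINING ACUTE ACCENT.
// One user-perceived character, two code points, two UTF-16 code units.
const std::u16string accentedUtf16 = u"e\u0301";
const std::u32string accentedUtf32 = U"e\u0301";

// U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the Basic Multilingual Plane.
// One user-perceived character, one code point, two UTF-16 code units
// (a surrogate pair).
const std::u16string clefUtf16 = u"\U0001D11E";
const std::u32string clefUtf32 = U"\U0001D11E";
```

In this sketch, size() on each std::u16string value counts UTF-16 code units (2 in both cases), while size() on each std::u32string value counts code points (2 for the accented letter, 1 for the clef). Neither count matches the single character an end user would see in both cases.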

Note that codePointLength() may be slower than length() or codeUnitLength() because it must traverse the string to find code points that are represented by surrogate pairs of code units. Since most code points in common use lie in the Basic Multilingual Plane and do not require a surrogate representation, many applications can rely on length() or codeUnitLength() to determine or estimate the number of code points.
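To show why counting code points requires a traversal, here is a minimal sketch in standard C++11 that counts code points in a std::u16string by treating each well-formed surrogate pair as one code point. The function countCodePoints() is a hypothetical helper written for this illustration. It is not part of SourcePro and is not necessarily how codePointLength() is implemented. Counting an unpaired surrogate as one code point is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of the SourcePro API.
// Counts code points in UTF-16 text by visiting every code unit once.
std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        // High (lead) surrogates occupy the range U+D800..U+DBFF.
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char16_t next = s[i + 1];
            // Low (trail) surrogates occupy the range U+DC00..U+DFFF.
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++i; // skip the trail unit: the pair forms one code point
            }
        }
        // Assumption: an unpaired surrogate is counted as one code point.
        ++count;
    }
    return count;
}
```

The cost is linear in the number of code units, whereas the code unit count is typically available without inspecting the contents. For a string with no surrogates, both counts are equal, which is why the code unit count can serve as an estimate.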

An RWUBreakSearch can also be used to iterate over the characters of an RWUString, in the context of a particular locale. (See Chapter 7.)