Explicitly Converting to Unicode
Class RWUToUnicodeConverter converts text from any recognized encoding to UTF-16. An instance of this class can be used to convert byte sequences that represent characters in a specific character encoding into the code unit sequences that represent those characters in the UTF-16 character encoding form.
RWUString provides constructors that accept text together with an RWUToUnicodeConverter instance, which is used to convert the text to UTF-16:
RWUToUnicodeConverter fromAscii("US-ASCII");
RWUString str = RWUString("hello", fromAscii);
Similarly, some RWURegularExpression constructors accept an RWUToUnicodeConverter instance used to convert the pattern data to UTF-16. (See Regular Expression String Searching for more information on regular expressions.)
RWUToUnicodeConverter also provides explicit convert() methods that accept a byte sequence in the associated encoding and a reference to an RWUString to hold the result of the conversion to UTF-16. For example, assuming source holds text encoded in ASCII, this code converts the byte sequence to UTF-16:
RWUToUnicodeConverter fromAscii("US-ASCII");
RWUString target;
fromAscii.convert(source, target);
The convert() method appends the results of a conversion to a target buffer. The convert() method also accepts a Boolean flush argument, with a default value of true. When flush is true, convert() flushes its internal buffers to the target buffer and clears its internal state. For modal encodings such as ISO-2022, clearing the internal state ensures that the next call to convert() can expect the source text to begin in the source encoding’s default, unshifted state.
Calling convert() once with flush set to true is useful when converting a piece of text in its entirety from a source encoding to UTF-16. Alternatively, convert() can fill a target buffer piecemeal: call convert() repeatedly with flush set to false, then once with flush set to true. In that case convert() flushes its buffers and clears its internal state only at the end of the multipart conversion.
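The effect of the flush argument on a multipart conversion can be illustrated with a small stand-in written in standard C++. This is only a sketch under stated assumptions: ToyUtf8Converter, its members, and demoPiecemeal() are hypothetical names invented for illustration and are not part of RWUToUnicodeConverter or RWUString. The toy handles only UTF-8 input and skips most validation. It mimics the contract described above: convert() appends to the target, holds an incomplete trailing sequence between calls when flush is false, and clears its state when flush is true.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in, NOT the RWUToUnicodeConverter API: a tiny stateful
// UTF-8 to UTF-16 converter that mimics the append-and-flush contract.
// Simplified on purpose: it does not reject overlong or surrogate encodings.
class ToyUtf8Converter {
public:
    // Appends converted UTF-16 code units to target. When flush is true, an
    // incomplete trailing sequence becomes U+FFFD and the state is cleared.
    void convert(const std::string& source, std::u16string& target,
                 bool flush = true) {
        for (unsigned char byte : source) {
            if (needed_ == 0) {
                start(byte, target);
            } else if ((byte & 0xC0) == 0x80) {
                // Continuation byte: accumulate six more bits.
                codePoint_ = (codePoint_ << 6) | (byte & 0x3F);
                if (--needed_ == 0) emit(codePoint_, target);
            } else {
                // Sequence was cut short: replace it, then restart here.
                target.push_back(0xFFFD);
                needed_ = 0;
                start(byte, target);
            }
        }
        if (flush) {
            if (needed_ != 0) target.push_back(0xFFFD);
            needed_ = 0;
            codePoint_ = 0;
        }
    }

private:
    // Begins a new character from its lead byte.
    void start(unsigned char byte, std::u16string& target) {
        if (byte < 0x80) { target.push_back(byte); }
        else if ((byte & 0xE0) == 0xC0) { codePoint_ = byte & 0x1F; needed_ = 1; }
        else if ((byte & 0xF0) == 0xE0) { codePoint_ = byte & 0x0F; needed_ = 2; }
        else if ((byte & 0xF8) == 0xF0) { codePoint_ = byte & 0x07; needed_ = 3; }
        else { target.push_back(0xFFFD); }  // stray continuation or invalid lead
    }

    // Writes one code point as one or two UTF-16 code units.
    static void emit(char32_t cp, std::u16string& target) {
        if (cp >= 0x10000) {
            cp -= 0x10000;
            target.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            target.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            target.push_back(static_cast<char16_t>(cp));
        }
    }

    char32_t codePoint_ = 0;  // bits collected for the pending character
    int needed_ = 0;          // continuation bytes still expected
};

// Converts "h", U+00E9, "!" delivered in two chunks that split the two-byte
// sequence C3 A9 down the middle.
std::u16string demoPiecemeal() {
    ToyUtf8Converter converter;
    std::u16string target;
    converter.convert("h\xC3", target, false);  // keep the pending lead byte
    converter.convert("\xA9!", target, true);   // final chunk: flush and reset
    assert(target.size() == 3);
    return target;
}
```

In this sketch, passing true for flush on the first chunk would discard the pending lead byte, so the split character would come out as two U+FFFD replacement characters instead of U+00E9. That mirrors why flush is reserved for the last call of a multipart conversion.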