Explicitly Converting to Unicode

Class RWUToUnicodeConverter converts text from any recognized encoding to UTF-16. Each instance is associated with one source encoding and converts byte sequences that represent characters in that encoding into the code unit sequences that represent the same characters in the UTF-16 encoding form.

RWUString provides constructors that accept text together with an RWUToUnicodeConverter instance, which is used to convert the text to UTF-16:

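// Converter that interprets source bytes as US-ASCII.
RWUToUnicodeConverter fromAscii("US-ASCII");

// Construct a UTF-16 string from the ASCII bytes "hello".
RWUString str("hello", fromAscii);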

Similarly, some RWURegularExpression constructors accept an RWUToUnicodeConverter instance used to convert the pattern data to UTF-16. (See Regular Expression String Searching for more information on regular expressions.)

RWUToUnicodeConverter also provides explicit convert() methods that accept a byte sequence in the associated encoding and a reference to an RWUString that receives the result of the conversion to UTF-16. For example, assuming source holds text encoded in ASCII, this code converts the byte sequence to UTF-16 and stores the result in target:

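// Converter that interprets source bytes as US-ASCII.
RWUToUnicodeConverter fromAscii("US-ASCII");

// Initially empty string that receives the UTF-16 result.
RWUString target;

// Convert the ASCII bytes in source and append them to target.
fromAscii.convert(source, target);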

The convert() method appends the results of a conversion to the target buffer rather than replacing the buffer's contents. It also accepts a Boolean flush argument, which defaults to true. When flush is true, convert() flushes its internal buffers to the target buffer and clears its internal state. For modal encodings such as ISO-2022, clearing the internal state ensures that the next call to convert() can expect the source text to begin in the source encoding’s default, unshifted state.
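The effect of flush on a modal encoding can be sketched with a small stand-in class. This is an illustration only: ToyModalConverter, its toy shift encoding (byte 0x0E enters an "upper" mode, byte 0x0F leaves it), and the helper convertTwoCalls() are hypothetical and are not part of the RWUToUnicodeConverter API. The sketch only mimics the documented behavior that convert() appends to the target and that flush resets the converter to its default, unshifted state.

```cpp
#include <string>

// Hypothetical stand-in, NOT the RWUToUnicodeConverter API.
// Toy modal encoding: 0x0E shifts into "upper" mode, 0x0F shifts out.
// While shifted, the bytes 'a'..'z' decode to 'A'..'Z'.
class ToyModalConverter {
public:
    // Appends decoded UTF-16 code units to target; never clears target.
    void convert(const std::string& source, std::u16string& target,
                 bool flush = true) {
        for (unsigned char byte : source) {
            if (byte == 0x0E) { shifted_ = true;  continue; }
            if (byte == 0x0F) { shifted_ = false; continue; }
            char16_t unit = byte;
            if (shifted_ && byte >= 'a' && byte <= 'z') {
                unit = static_cast<char16_t>(byte - 'a' + 'A');
            }
            target.push_back(unit);
        }
        // Flushing returns the converter to the default, unshifted state.
        if (flush) shifted_ = false;
    }

private:
    bool shifted_ = false;  // Shift state carried between calls.
};

// Converts two pieces of text into one target. The first piece ends while
// the toy encoding is still shifted; flushFirst controls whether that
// state survives into the second call.
std::u16string convertTwoCalls(bool flushFirst) {
    ToyModalConverter converter;
    std::u16string target;
    converter.convert("\x0E" "ab", target, flushFirst);
    converter.convert("cd", target, true);
    return target;
}
```

In this sketch, convertTwoCalls(true) treats the second piece as fresh text that starts unshifted, while convertTwoCalls(false) treats it as a continuation of the first piece, so the shift state carries over.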

Calling convert() once with flush set to true is useful when converting a piece of text in its entirety from a source encoding to UTF-16. Alternatively, convert() can fill a target buffer piecemeal: call convert() repeatedly with flush set to false, then once with flush set to true for the final piece. In that case convert() flushes its buffers and clears its internal state only at the end of the multipart conversion.
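The piecemeal pattern matters most when a chunk boundary falls in the middle of a multibyte character. The following sketch shows this with a simplified UTF-8 decoder. ToyUtf8Converter and convertInChunks() are hypothetical names introduced for illustration, not the RWUToUnicodeConverter API, and the decoder skips most validation (for example, it does not reject overlong forms). It assumes only the behavior described above: results are appended, and a flush emits or discards whatever is pending and resets the state.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in, NOT the RWUToUnicodeConverter API.
// Simplified streaming UTF-8 to UTF-16 decoder with minimal validation.
class ToyUtf8Converter {
public:
    // Appends decoded UTF-16 code units to target; never clears target.
    void convert(const std::string& source, std::u16string& target,
                 bool flush = true) {
        for (unsigned char byte : source) {
            if (remaining_ == 0) {
                if (byte < 0x80)                { emit(byte, target); }
                else if ((byte & 0xE0) == 0xC0) { codePoint_ = byte & 0x1F; remaining_ = 1; }
                else if ((byte & 0xF0) == 0xE0) { codePoint_ = byte & 0x0F; remaining_ = 2; }
                else if ((byte & 0xF8) == 0xF0) { codePoint_ = byte & 0x07; remaining_ = 3; }
                else                            { emit(0xFFFD, target); }  // Stray byte.
            } else if ((byte & 0xC0) == 0x80) {
                codePoint_ = (codePoint_ << 6) | (byte & 0x3F);
                if (--remaining_ == 0) emit(codePoint_, target);
            } else {
                // Simplification: a broken sequence yields U+FFFD and the
                // offending byte is dropped.
                emit(0xFFFD, target);
                remaining_ = 0;
            }
        }
        if (flush) {
            // An incomplete character at a flush cannot be completed later.
            if (remaining_ != 0) emit(0xFFFD, target);
            remaining_ = 0;
            codePoint_ = 0;
        }
    }

private:
    // Appends one code point as one or two UTF-16 code units.
    static void emit(char32_t cp, std::u16string& target) {
        if (cp < 0x10000) {
            target.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            target.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            target.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }

    char32_t codePoint_ = 0;  // Partially decoded code point.
    int remaining_ = 0;       // Continuation bytes still expected.
};

// Feeds bytes to the converter chunkSize bytes at a time. When
// flushEveryChunk is false, only the final chunk is converted with flush.
std::u16string convertInChunks(const std::string& bytes, std::size_t chunkSize,
                               bool flushEveryChunk) {
    if (chunkSize == 0) chunkSize = 1;  // Guard against an endless loop.
    ToyUtf8Converter converter;
    std::u16string target;
    for (std::size_t pos = 0; pos < bytes.size(); pos += chunkSize) {
        const bool last = pos + chunkSize >= bytes.size();
        converter.convert(bytes.substr(pos, chunkSize), target,
                          flushEveryChunk || last);
    }
    return target;
}
```

With the input "h\xC3\xA9llo" split into two-byte chunks, the two bytes of é land in different chunks. Flushing only on the last chunk lets the decoder carry the partial character across the boundary; flushing after every chunk turns each half into a replacement character in this sketch.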