Converting to and from UTF-8 and UTF-16

You may convert your RWCStrings, RWBasicUStrings, and RWWStrings to and from UTF-8 and UTF-16 as needed.

Converting RWCStrings and RWBasicUStrings

RWBasicUString holds UTF-16 data and provides conversions to and from an RWCString containing UTF-8.

From UTF-8 to UTF-16

To convert from UTF-8 to UTF-16, simply construct an RWBasicUString with your RWCString as a parameter.

 

RWCString myUtf8String = "hello world";

RWBasicUString myUtf16String(myUtf8String);
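To make concrete what this conversion does to the underlying data, the following is a minimal sketch that uses only the C++ standard library. The function utf8ToUtf16() is a hypothetical helper written for illustration; it is not part of SourcePro and is not how RWBasicUString is implemented. It decodes each UTF-8 sequence to a code point and emits one UTF-16 code unit for code points below U+10000, or a surrogate pair otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a SourcePro API): decodes UTF-8 bytes into
// UTF-16 code units, throwing std::invalid_argument on malformed input.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogate code points, and values past U+10FFFF.
        static const char32_t minForLength[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < minForLength[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // single code unit
        } else {
            cp -= 0x10000;                             // encode as surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Usage: the two UTF-8 bytes of U+00E9 become one UTF-16 code unit, and the
// four UTF-8 bytes of U+1F600 become a surrogate pair.
void utf8ToUtf16Example()
{
    assert(utf8ToUtf16("\xC3\xA9").size() == 1);
    assert(utf8ToUtf16("\xF0\x9F\x98\x80").size() == 2);
}
```

Plain ASCII input such as "hello world" maps one byte to one code unit, which is why the difference between the two encodings only becomes visible with non-ASCII text.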

From UTF-16 to UTF-8

To convert from UTF-16 to UTF-8, use the RWBasicUString::toUtf8() function.

 

RWCString myUtf8String(myUtf16String.toUtf8());

See the SourcePro API Reference for more information.
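For readers who want to see the reverse mapping at the code unit level, here is a minimal standard-library sketch. The function utf16ToUtf8() is a hypothetical helper for illustration only; it is not a SourcePro API and makes no claim about how toUtf8() works internally. It combines surrogate pairs into a single code point and then writes one to four UTF-8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a SourcePro API): encodes UTF-16 code units as
// UTF-8 bytes, throwing std::invalid_argument on unpaired surrogates.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {                       // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {               // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                               // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Usage: one code unit for U+00E9 becomes two bytes, and the surrogate pair
// for U+1F600 becomes four bytes.
void utf16ToUtf8Example()
{
    const std::u16string eAcute(1, static_cast<char16_t>(0x00E9));
    assert(utf16ToUtf8(eAcute) == "\xC3\xA9");

    std::u16string emoji;
    emoji.push_back(static_cast<char16_t>(0xD83D));
    emoji.push_back(static_cast<char16_t>(0xDE00));
    assert(utf16ToUtf8(emoji).size() == 4);
}
```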

Converting RWWStrings to and from UTF-8

RWBasicUString does not provide a complete RWWString or wchar_t interface. Its toWide() method converts from UTF-16 to an RWWString, but there is no counterpart that takes wchar_t data and returns a UTF-16 string.

To perform these wide character conversions, you may use the UTF-8 converter classes provided in the Advanced Tools Module’s Stream package.

From UTF-8 to a UTF-16 Wide Character

To convert from a UTF-8 encoded RWCString to a UTF-16 encoded RWWString, use RWFromUTF8Converter as shown in this example.

 

RWWString destination; // 1

RWCString source("My UTF8 characters");

 

RWFromUTF8Converter converter; // 2

converter.convert(source,destination); // 3

// 1  Start with an empty RWWString and an RWCString containing UTF-8 encoded characters.

// 2   Create the RWFromUTF8Converter.

// 3   Call the convert() function on RWFromUTF8Converter with the source and destination strings as parameters. After this call, the destination local variable will contain the source string encoded in a UTF-16 RWWString.

From a UTF-16 Wide Character to UTF-8

To convert from a UTF-16 encoded RWWString to a UTF-8 encoded RWCString, use RWToUTF8Converter as shown in this example.

 

RWWString newSource(L"My UTF16 characters");

RWCString newDestination; // 1

 

RWToUTF8Converter converter; // 2

converter.convert(newSource,newDestination); // 3

// 1  Create a new empty RWCString and a new RWWString source.

// 2   Create the RWToUTF8Converter.

// 3   Call the convert() function on RWToUTF8Converter with the source and destination strings as parameters. After this call, the newDestination local variable will contain the source string encoded in UTF-8.

Note that these wide character conversions are suitable only on systems where wchar_t’s native encoding is UTF-16.
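One way to guard against using the converters on an unsuitable platform is to check the width of wchar_t. The sketch below uses only the C++ standard library; wcharIs16Bit() is a hypothetical helper name, not a SourcePro function. A 16-bit wchar_t is a necessary condition for a UTF-16 wide encoding, but the check cannot prove that the platform actually interprets wchar_t data as UTF-16, so treat it as a sanity check rather than a guarantee.

```cpp
#include <cassert>
#include <climits>
#include <cwchar>

// Hypothetical helper (not a SourcePro API): true when wchar_t is exactly
// 16 bits wide, as it typically is on Windows. Many Unix-like systems use
// a 32-bit wchar_t instead, in which case this returns false.
constexpr bool wcharIs16Bit()
{
    return sizeof(wchar_t) * CHAR_BIT == 16;
}

// Usage: decide at run time whether the wide character conversions
// described above are appropriate on this platform.
bool wideUtf16ConversionsLookSafe()
{
    return wcharIs16Bit();
}
```

Because wcharIs16Bit() is constexpr, a project that depends on these conversions could also use it in a static_assert so that a build on a platform with a wider wchar_t fails early instead of producing incorrectly encoded strings.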