Escape Sequences

RWUString provides the unescape() method, which replaces escape sequences with the Unicode characters they denote. Table 1 lists the recognized escape sequences. For any other escape sequence, the result is the character that follows the backslash.

Table 1 – Recognized Escape Sequences
Escape Sequence	Unicode
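\uhhhh	Code point given by exactly 4 hexadecimal digits in the range [0-9A-Fa-f]
\Uhhhhhhhh	Code point given by exactly 8 hexadecimal digits in the range [0-9A-Fa-f]
\xhh	Code point given by 1 or 2 hexadecimal digits in the range [0-9A-Fa-f]
\ooo	Code point given by 1, 2, or 3 octal digits in the range [0-7]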
\a	U+0007: alert (BEL)
\b	U+0008: backspace (BS)
\t	U+0009: horizontal tab (HT)
\n	U+000A: newline/line feed (LF)
\v	U+000B: vertical tab (VT)
\f	U+000C: form feed (FF)
\r	U+000D: carriage return (CR)
\"	U+0022: double quote
\'	U+0027: single quote
\?	U+003F: question mark
\\	U+005C: backslash

Note that when you create an RWUString from a string literal that contains an escape sequence, you must double the backslash. The C++ compiler itself treats the \ character as the start of an escape sequence in source code, so a single backslash would be interpreted by the compiler before RWUString ever sees it. For example:

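// The doubled backslash puts the six characters \u00e9 into the string.
RWUToUnicodeConverter fromAscii("US-ASCII");
RWUString str("clich\\u00e9", fromAscii);

RWUFromUnicodeConverter toAscii("US-ASCII");

// Before unescape(): the escape sequence is still plain text.
std::cout << str.toBytes(toAscii) << std::endl;
// After unescape(): the escape sequence has been replaced by its character.
std::cout << str.unescape().toBytes(toAscii) << std::endl;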

Results:
========
clich\u00e9
cliché
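The backslash-doubling rule can be checked without the library by looking only at what the compiler produces from the literal. The sketch below is an illustration rather than part of SourcePro: the function names are hypothetical, and the raw string literal is a standard C++11 alternative that this guide does not itself mention.

```cpp
#include <string>

// Two spellings of the same 11 characters: c l i c h \ u 0 0 e 9
// Neither involves RWUString; they only show what the compiler would hand
// to a constructor that takes a narrow string.

// Doubled backslash: the compiler turns "\\" into one literal backslash.
inline std::string doubledBackslash() { return "clich\\u00e9"; }

// Raw string literal (C++11): the compiler leaves backslashes untouched.
inline std::string rawLiteral() { return R"(clich\u00e9)"; }

// Hypothetical helper: true when a literal backslash survived compilation,
// meaning there is still an escape sequence for a later unescape step.
inline bool hasPendingEscape(const std::string& s) {
    return s.find('\\') != std::string::npos;
}
```

Both functions return the same text, and hasPendingEscape() reports true for it, so either spelling leaves the escape sequence in place for unescape() to process.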

If an escape sequence is ill-formed, unescape() throws an RWConversionErr. (See the RWConversionErr entry in the SourcePro C++ API Reference Guide.)
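To make the rules of Table 1 concrete, here is a minimal standalone sketch of a decoder that follows them. It is not RWUString's implementation: unescapeSketch(), readDigits(), and hexValue() are hypothetical names, the input is assumed to be ASCII, std::invalid_argument stands in for RWConversionErr, and the definition of "ill-formed" used here (too few digits, or a trailing backslash) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of a hexadecimal digit, or -1.
inline int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reads between minDigits and maxDigits digits of the given base starting
// at s[i], advances i past them, and returns the accumulated value.
// Throws if fewer than minDigits digits are present (assumed "ill-formed").
inline char32_t readDigits(const std::string& s, std::size_t& i,
                           int base, int minDigits, int maxDigits) {
    std::uint32_t value = 0;
    int count = 0;
    while (count < maxDigits && i < s.size()) {
        int d = hexValue(s[i]);
        if (d < 0 || d >= base) break;
        value = value * static_cast<std::uint32_t>(base)
              + static_cast<std::uint32_t>(d);
        ++i;
        ++count;
    }
    if (count < minDigits)
        throw std::invalid_argument("ill-formed escape sequence");
    return static_cast<char32_t>(value);
}

// Hypothetical stand-in for unescape(): decodes the escapes of Table 1 and
// returns the resulting code points. No range check is made on the values.
inline std::u32string unescapeSketch(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        char c = s[i++];
        if (c != '\\') {
            out.push_back(static_cast<unsigned char>(c));
            continue;
        }
        if (i == s.size())  // assumption: a trailing backslash is ill-formed
            throw std::invalid_argument("ill-formed escape sequence");
        char e = s[i++];
        switch (e) {
            case 'u': out.push_back(readDigits(s, i, 16, 4, 4)); break;
            case 'U': out.push_back(readDigits(s, i, 16, 8, 8)); break;
            case 'x': out.push_back(readDigits(s, i, 16, 1, 2)); break;
            case 'a': out.push_back(0x07); break;
            case 'b': out.push_back(0x08); break;
            case 't': out.push_back(0x09); break;
            case 'n': out.push_back(0x0A); break;
            case 'v': out.push_back(0x0B); break;
            case 'f': out.push_back(0x0C); break;
            case 'r': out.push_back(0x0D); break;
            default:
                if (e >= '0' && e <= '7') {
                    --i;  // the first octal digit is part of the number
                    out.push_back(readDigits(s, i, 8, 1, 3));
                } else {
                    // \" \' \? \\ and any unrecognized escape: the value is
                    // the character that follows the backslash.
                    out.push_back(static_cast<unsigned char>(e));
                }
        }
    }
    return out;
}
```

Under these assumptions, unescapeSketch("clich\\u00e9") yields the six code points of "cliché", while unescapeSketch("\\u12") throws because \u requires four hexadecimal digits. The quote, question mark, and backslash escapes need no dedicated cases, since the fallback rule already gives them the values listed in Table 1.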