Multilingual Text in C++
The C++ Standard defines two character sets, the basic source character set and the execution character set (sometimes called the machine character set):
the basic source character set is the set of characters used to compose a C++ source program
the execution character set is the character set used when a C++ application is executing
In a typical desktop computing environment the two character sets are the same, but in a localized application they may differ.
The basic source character set consists of the space character and control characters representing the horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ?
* + - / ^ & | ~ ! = , \ " '
All compilers must support this minimum set, but may extend the set. Compilers may use any character encoding form to represent the basic source character set.
The execution character set includes the basic source character set, plus control characters for alert, backspace, carriage return, and null, plus additional implementation-defined characters.
The C++ Standard does not dictate an encoding form for the execution character set. In particular, that encoding form need not match the one used for the basic source character set.
The choice of execution character encoding form may be governed by:
the implementation
a compiler command-line option
a #pragma instruction
an environment variable (LANG, LC_CTYPE)
the platform’s current locale
To provide a way to represent characters outside the basic source character set, the C++ Standard defines the universal character name construct. An escape sequence of the form \uXXXX or \UXXXXXXXX (lowercase u for four hexadecimal digits, uppercase U for eight), where XXXX or XXXXXXXX is a hexadecimal value, specifies a code point in the ISO/IEC 10646 and Unicode coded character sets.
The compiler maps each basic source character, universal character name, and escape sequence that appears in a character or string literal into an equivalent execution character set representation. The literals created by this transformation are incorporated into the object code of the C++ translation unit.