Unicode Coded Character Set

Unicode is a coded character set. It assigns each abstract character a numeric value, called a code point, in the range 0 to 0x10FFFF.
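As a minimal sketch of the range described above, the following C++ fragment checks whether an integer falls within the Unicode codespace. The names kMaxCodePoint and isCodePoint are hypothetical and chosen for illustration; they are not part of the Internationalization Module API.

```cpp
#include <cstdint>

// Highest value in the Unicode codespace (hypothetical constant name).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Returns true if value lies in the range 0 to 0x10FFFF.
// Hypothetical helper for illustration only.
constexpr bool isCodePoint(std::uint32_t value) {
    return value <= kMaxCodePoint;
}
```

An unsigned 32-bit type is used here because the largest code point, 0x10FFFF, does not fit in 16 bits.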

The Unicode Standard provides the capacity to encode nearly every character used in all of the writing systems of the world. It provides a unique integer to represent every character, no matter what the platform, no matter what the program, no matter what the language.

No escape sequences or control codes are required to specify any characters. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols equivalently. Characters from different scripts may be mixed and processed together as required.

In text, Unicode code points are usually expressed as U+n, where n is four to six hexadecimal digits, using the digits 0-9 and A-F (for 10-15). Leading zeros are used only when the code point would otherwise have fewer than four hexadecimal digits. For example, U+00E9 represents the Unicode code point for é. This is the convention followed in the documentation for the Internationalization Module, including this manual.
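The U+n notation can be produced with standard C++ formatting. The sketch below assumes the input is a valid code point (0 to 0x10FFFF) and pads with zeros to a minimum of four uppercase hexadecimal digits. The function name formatCodePoint is hypothetical and is not part of the Internationalization Module API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a code point in U+n notation: at least four uppercase
// hexadecimal digits, zero-padded only when fewer than four are needed.
// Hypothetical helper; assumes cp is in the range 0 to 0x10FFFF.
std::string formatCodePoint(std::uint32_t cp) {
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "U+%04lX",
                  static_cast<unsigned long>(cp));
    return std::string(buffer);
}
```

With this sketch, the value 0xE9 is rendered as U+00E9, while a value such as 0x1F600 needs no padding and is rendered as U+1F600.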