Unicode Character Encoding Forms
Every character in the Unicode character set has a code point in the range 0x0 to 0x10FFFF, so any character can be expressed in 21 bits. The Unicode Standard defines three character encoding forms for representing each 21-bit code point in memory:
UTF-8: Each 21-bit code point is represented using one to four 8-bit code units.
UTF-16: Each 21-bit code point is represented using one or two 16-bit code units.
UTF-32: Each 21-bit code point is represented using a single 32-bit code unit.
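To make the three encoding forms concrete, the following sketch counts how many code units each form needs for a few sample code points. It is written in Python (an assumption; the source names no language) and relies only on Python's built-in codecs. The `-le` codec variants are used so that no byte-order mark is included in the count. The helper name `code_units` is hypothetical.

```python
def code_units(cp: int) -> dict[str, int]:
    """Return the number of code units each encoding form uses for one code point."""
    ch = chr(cp)
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units are bytes
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 2 bytes per 16-bit unit
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 4 bytes per 32-bit unit
    }

# The largest code point, 0x10FFFF, needs exactly 21 bits.
assert (0x10FFFF).bit_length() == 21

# U+0041 'A', U+00E9 'é', U+20AC '€', U+1F600 (an emoji outside the 16-bit range)
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", code_units(cp))
```

Only U+1F600 needs two UTF-16 code units; UTF-8 grows from one to four bytes across these samples, while UTF-32 always uses one unit.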
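The UTF-16 encoding form strikes a balance between ease of use and efficient use of memory. Every character in the range 0x0 to 0xFFFF (the Basic Multilingual Plane, which covers most characters in common use) is represented with a single 16-bit code unit. Only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair: two 16-bit code units, a high surrogate in the reserved range 0xD800 to 0xDBFF followed by a low surrogate in the reserved range 0xDC00 to 0xDFFF.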
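The surrogate-pair mechanism is simple arithmetic: subtract 0x10000 to get a 20-bit value, then place its top 10 bits in the high surrogate and its bottom 10 bits in the low surrogate. The Python sketch below shows the conversion in both directions; the function names are hypothetical and not part of the Internationalization Module.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    v = cp - 0x10000             # 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high and low surrogate into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

Because 10 bits go into each surrogate, the pair can address exactly 2^20 code points, which is the size of the range 0x10000 to 0x10FFFF.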
The Internationalization Module uses UTF-16 for the internal representation and manipulation of multilingual text.
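The source does not describe the module's string API, so the following is a generic Python sketch rather than the module's own interface. It illustrates one practical consequence of storing text as UTF-16: the number of 16-bit code units can exceed the number of code points, because each character above 0xFFFF occupies two units. The helpers `utf16_code_units` and `count_code_points` are hypothetical.

```python
def utf16_code_units(text: str) -> list[int]:
    """Encode text as a list of 16-bit UTF-16 code units (no byte-order mark)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def count_code_points(units: list[int]) -> int:
    """Count code points, treating a high surrogate followed by a low surrogate as one."""
    count = 0
    i = 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2  # surrogate pair encodes a single code point
        else:
            i += 1  # single code unit (or an unpaired surrogate)
        count += 1
    return count

units = utf16_code_units("a\u20ac\U0001F600")  # 'a', '€', U+1F600
print(len(units), count_code_points(units))    # 4 3
```

Code that indexes or slices UTF-16 text by code unit should therefore avoid splitting a surrogate pair.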