Character Encoding Forms
In order to represent characters in a computer, each code point in a coded character set must be mapped to a sequence of bits. This mapping is called a character encoding form.
A code unit is the fundamental binary width used in a computer architecture for representing character data, such as 7 bits, 8 bits, 16 bits, or 32 bits. Depending on the character encoding form used, each code point in a coded character set may be represented internally by one or more such code units.
A character encoding form whose code unit sequences are all of the same length is known as a fixed width encoding. For example, single-byte character sets (SBCS) are fixed width. If a double-byte character set (DBCS) always uses two code units to represent a code point, then it is also fixed width.
A character encoding form whose sequences are not all of the same length is known as a variable width encoding. If a double-byte character set uses one or two code units to represent a code point, then it is a variable width encoding. Multibyte character sets (MBCS) are variable width.
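The difference between fixed and variable width can be seen by counting the code units each character occupies. The following is a minimal sketch in Python, not part of any library described here. The helper name `code_unit_lengths` is hypothetical, and the example assumes Python's built-in `latin-1` and `shift_jis` codecs as stand-ins for a single-byte and a multibyte character set.

```python
def code_unit_lengths(text, encoding, unit_bytes=1):
    """Return the number of code units used to encode each character.

    unit_bytes is the size of one code unit in bytes (an assumption the
    caller supplies; Python codecs do not report it).
    """
    return [len(ch.encode(encoding)) // unit_bytes for ch in text]


# ISO 8859-1 (SBCS): every character takes exactly one 8-bit code unit.
print(code_unit_lengths("A\u00e9", "latin-1"))

# Shift-JIS (MBCS): "A" takes one byte, HIRAGANA LETTER A takes two.
print(code_unit_lengths("A\u3042", "shift_jis"))
```

A sample string can show that an encoding is variable width, but it cannot prove that an encoding is fixed width. That property follows from the definition of the encoding, not from any one test.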
Examples of character encoding forms include:
*US ASCII, a 7-bit fixed width encoding form
*ISO 8859-1, an 8-bit fixed width encoding form
*CP 037 and CP 500, 8-bit fixed width EBCDIC encoding forms
*Windows CP 1252, an 8-bit fixed width encoding form
*Shift-JIS, an 8-bit variable width encoding form for JIS X 0208 that uses one or two code units per character
*UTF-8, a variable width 8-bit encoding form for Unicode 3.0
*UTF-16, a variable width 16-bit encoding form for Unicode 3.0
*UTF-32, a fixed width 32-bit encoding form for Unicode 3.0
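To make the three Unicode encoding forms in the list concrete, the sketch below counts how many code units each form needs for a few code points. It is illustrative only. The names `FORMS` and `count_code_units` are hypothetical, and the little-endian codec variants (`utf-16-le`, `utf-32-le`) are used so that Python does not prepend a byte order mark that would skew the counts.

```python
# (codec name, code unit size in bytes) for each Unicode encoding form.
FORMS = [("utf-8", 1), ("utf-16-le", 2), ("utf-32-le", 4)]


def count_code_units(ch, encoding, unit_bytes):
    """Number of code units needed to represent one code point."""
    return len(ch.encode(encoding)) // unit_bytes


# One ASCII letter, two other BMP characters, one supplementary code point.
for ch in ["A", "\u00e9", "\u20ac", "\U00010400"]:
    counts = {name: count_code_units(ch, name, size) for name, size in FORMS}
    print(f"U+{ord(ch):04X}", counts)
```

In this sample, UTF-8 uses from one to four code units, UTF-16 uses one or two (two for the supplementary code point, which is encoded as a surrogate pair), and UTF-32 always uses exactly one.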