Provides common functionality used to encode and decode UTF-8 sequences.
#include <rw/stream/RWUTF8Helper.h>
Return type | Static member function
---|---
EncodingCategory | decodeFirstByte (RWByte b)
EncodingCategory | decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)
EncodingCategory | decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)
EncodingCategory | decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)
EncodingCategory | encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)
The class RWUTF8Helper provides common functionality used to encode and decode UTF-8 sequences.
◆ EncodingCategory
Enumerator | Description
---|---
oneByte | One-byte encoding form of UTF-8
twoBytes | Two-byte encoding form of UTF-8
threeBytes | Three-byte encoding form of UTF-8
fourBytes | Four-byte encoding form of UTF-8
highSurrogate | The character to be encoded is a high surrogate
missingLowSurrogate | A high surrogate is not followed by a low surrogate
lowSurrogateWithoutHighSurrogate | A low surrogate was not preceded by a high surrogate
invalidUTF8Encoding | The encoding is not recognized as UTF-8
◆ decodeFirstByte()
Takes the first byte of a UTF-8 byte sequence encoding a single character and returns the encoding category to which the sequence belongs. Throws no exceptions.
- Parameters

Parameter | Description
---|---
b | The first byte of a UTF-8 byte sequence encoding a single UTF-16 character.
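To illustrate how a lead byte determines the category, here is a minimal standalone sketch; it does not call SourcePro. `Byte`, `Category`, and `classifyFirstByte` are hypothetical stand-ins for `RWByte`, `EncodingCategory`, and `decodeFirstByte`, and the byte ranges follow the table of valid lead bytes in RFC 3629. Whether `decodeFirstByte` rejects the lead bytes 0xC0, 0xC1, and 0xF5 to 0xFF in exactly the same way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for RWByte (assumed to be an unsigned 8-bit type).
using Byte = std::uint8_t;

// Trimmed local mirror of EncodingCategory; not SourcePro's declaration.
enum class Category { oneByte, twoBytes, threeBytes, fourBytes, invalidUTF8Encoding };

// Classifies a UTF-8 lead byte using the valid ranges from RFC 3629.
Category classifyFirstByte(Byte b) {
    if (b <= 0x7F) return Category::oneByte;                  // 0xxxxxxx: ASCII
    if (b >= 0xC2 && b <= 0xDF) return Category::twoBytes;    // 110xxxxx, minus overlong C0/C1
    if (b >= 0xE0 && b <= 0xEF) return Category::threeBytes;  // 1110xxxx
    if (b >= 0xF0 && b <= 0xF4) return Category::fourBytes;   // 11110xxx, up to U+10FFFF
    // 80..BF are continuation bytes; C0, C1, and F5..FF never start a valid sequence.
    return Category::invalidUTF8Encoding;
}

// Usage check with a few representative lead bytes.
void checkClassifyFirstByte() {
    assert(classifyFirstByte(0x41) == Category::oneByte);     // 'A'
    assert(classifyFirstByte(0xC3) == Category::twoBytes);    // starts U+00E9
    assert(classifyFirstByte(0xE2) == Category::threeBytes);  // starts U+20AC
    assert(classifyFirstByte(0xF0) == Category::fourBytes);   // starts U+1F600
    assert(classifyFirstByte(0x80) == Category::invalidUTF8Encoding);  // continuation byte
}
```

The sketch checks value ranges rather than bit patterns alone, so lead bytes that could only start overlong or out-of-range sequences are rejected before any continuation bytes are read.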
◆ decodeFourBytesEncoding()
Decodes a four-byte UTF-8 sequence into a UTF-16 surrogate pair. Returns invalidUTF8Encoding if the four bytes do not form a valid UTF-8 sequence. Throws no exceptions.
- Parameters

Parameter | Description
---|---
firstByte | The first byte of a four-byte UTF-8 sequence encoding a single supplementary character (a UTF-16 surrogate pair).
secondByte | The second byte of a four-byte UTF-8 sequence encoding a single supplementary character (a UTF-16 surrogate pair).
thirdByte | The third byte of a four-byte UTF-8 sequence encoding a single supplementary character (a UTF-16 surrogate pair).
fourthByte | The fourth byte of a four-byte UTF-8 sequence encoding a single supplementary character (a UTF-16 surrogate pair).
highSurrogateValue | Receives the UTF-16 high surrogate resulting from decoding the four-byte UTF-8 sequence.
lowSurrogateValue | Receives the UTF-16 low surrogate resulting from decoding the four-byte UTF-8 sequence.
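The arithmetic behind a four-byte decode can be sketched as follows, assuming RFC 3629 validation rules: gather the 21-bit code point from the payload bits, reject values outside U+10000 to U+10FFFF, then split the result into a surrogate pair. `decodeFour`, `Byte`, `UChar`, and the trimmed `Category` enum are hypothetical stand-ins rather than SourcePro declarations, and the exact checks `decodeFourBytesEncoding` performs may differ.

```cpp
#include <cassert>
#include <cstdint>

using Byte = std::uint8_t;    // hypothetical stand-in for RWByte
using UChar = std::uint16_t;  // hypothetical stand-in for RWUChar (one UTF-16 code unit)

// Trimmed local mirror of EncodingCategory; not SourcePro's declaration.
enum class Category { fourBytes, invalidUTF8Encoding };

// True for continuation bytes of the form 10xxxxxx.
static bool isContinuation(Byte b) { return (b & 0xC0) == 0x80; }

// Decodes a four-byte sequence into a UTF-16 surrogate pair.
Category decodeFour(Byte b1, Byte b2, Byte b3, Byte b4, UChar& high, UChar& low) {
    if ((b1 & 0xF8) != 0xF0 || !isContinuation(b2) || !isContinuation(b3) ||
        !isContinuation(b4))
        return Category::invalidUTF8Encoding;
    // 3 payload bits from the lead byte, 6 from each continuation byte.
    std::uint32_t cp = (std::uint32_t(b1 & 0x07) << 18) | (std::uint32_t(b2 & 0x3F) << 12) |
                       (std::uint32_t(b3 & 0x3F) << 6) | std::uint32_t(b4 & 0x3F);
    // Values below U+10000 are overlong; values above U+10FFFF are not Unicode.
    if (cp < 0x10000 || cp > 0x10FFFF) return Category::invalidUTF8Encoding;
    cp -= 0x10000;                       // leaves a 20-bit value
    high = UChar(0xD800 + (cp >> 10));   // top 10 bits
    low = UChar(0xDC00 + (cp & 0x3FF));  // bottom 10 bits
    return Category::fourBytes;
}

// Usage check: U+1F600 is F0 9F 98 80 in UTF-8 and D83D DE00 in UTF-16.
void checkDecodeFour() {
    UChar hi = 0, lo = 0;
    assert(decodeFour(0xF0, 0x9F, 0x98, 0x80, hi, lo) == Category::fourBytes);
    assert(hi == 0xD83D && lo == 0xDE00);
    // An overlong four-byte encoding of U+0000 is rejected.
    assert(decodeFour(0xF0, 0x80, 0x80, 0x80, hi, lo) == Category::invalidUTF8Encoding);
}
```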
◆ decodeThreeBytesEncoding()
Decodes a three-byte UTF-8 sequence. Returns invalidUTF8Encoding if the three bytes do not form a valid UTF-8 sequence. Throws no exceptions.
- Parameters

Parameter | Description
---|---
firstByte | The first byte of a three-byte UTF-8 sequence encoding a single UTF-16 character.
secondByte | The second byte of a three-byte UTF-8 sequence encoding a single UTF-16 character.
thirdByte | The third byte of a three-byte UTF-8 sequence encoding a single UTF-16 character.
res | Receives the UTF-16 character resulting from decoding the three-byte UTF-8 sequence.
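A minimal sketch of a three-byte decode, assuming RFC 3629 rules: the payload bits are combined into one 16-bit value, and overlong forms (below U+0800) and surrogate code points (U+D800 to U+DFFF) are rejected. `decodeThree` and the local types are hypothetical; whether `decodeThreeBytesEncoding` also rejects encoded surrogates is not stated in this reference and is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

using Byte = std::uint8_t;    // hypothetical stand-in for RWByte
using UChar = std::uint16_t;  // hypothetical stand-in for RWUChar

// Trimmed local mirror of EncodingCategory; not SourcePro's declaration.
enum class Category { threeBytes, invalidUTF8Encoding };

// True for continuation bytes of the form 10xxxxxx.
static bool isContinuation(Byte b) { return (b & 0xC0) == 0x80; }

// Decodes a three-byte sequence into one UTF-16 code unit.
Category decodeThree(Byte b1, Byte b2, Byte b3, UChar& res) {
    if ((b1 & 0xF0) != 0xE0 || !isContinuation(b2) || !isContinuation(b3))
        return Category::invalidUTF8Encoding;
    // 4 payload bits from the lead byte, 6 from each continuation byte.
    std::uint32_t cp = (std::uint32_t(b1 & 0x0F) << 12) | (std::uint32_t(b2 & 0x3F) << 6) |
                       std::uint32_t(b3 & 0x3F);
    if (cp < 0x800) return Category::invalidUTF8Encoding;  // overlong form
    // UTF-8 must not encode surrogate code points directly (RFC 3629).
    if (cp >= 0xD800 && cp <= 0xDFFF) return Category::invalidUTF8Encoding;
    res = UChar(cp);
    return Category::threeBytes;
}

// Usage check: U+20AC EURO SIGN is E2 82 AC in UTF-8.
void checkDecodeThree() {
    UChar c = 0;
    assert(decodeThree(0xE2, 0x82, 0xAC, c) == Category::threeBytes);
    assert(c == 0x20AC);
    assert(decodeThree(0xED, 0xA0, 0x80, c) == Category::invalidUTF8Encoding);  // U+D800
}
```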
◆ decodeTwoBytesEncoding()
Decodes a two-byte UTF-8 sequence. Returns invalidUTF8Encoding if the two bytes do not form a valid UTF-8 sequence. Throws no exceptions.
- Parameters

Parameter | Description
---|---
firstByte | The first byte of a two-byte UTF-8 sequence encoding a single UTF-16 character.
secondByte | The second byte of a two-byte UTF-8 sequence encoding a single UTF-16 character.
res | Receives the UTF-16 character resulting from decoding the two-byte UTF-8 sequence.
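The two-byte case is the simplest decode: 5 payload bits from the lead byte and 6 from the continuation byte. This standalone sketch assumes RFC 3629 rules, so it rejects the overlong forms produced by lead bytes 0xC0 and 0xC1; `decodeTwo` and the local types are hypothetical names, not the SourcePro implementation.

```cpp
#include <cassert>
#include <cstdint>

using Byte = std::uint8_t;    // hypothetical stand-in for RWByte
using UChar = std::uint16_t;  // hypothetical stand-in for RWUChar

// Trimmed local mirror of EncodingCategory; not SourcePro's declaration.
enum class Category { twoBytes, invalidUTF8Encoding };

// Decodes a two-byte sequence into one UTF-16 code unit.
Category decodeTwo(Byte b1, Byte b2, UChar& res) {
    if ((b1 & 0xE0) != 0xC0 || (b2 & 0xC0) != 0x80)
        return Category::invalidUTF8Encoding;
    // 5 payload bits from the lead byte, 6 from the continuation byte.
    UChar cp = UChar(((b1 & 0x1F) << 6) | (b2 & 0x3F));
    if (cp < 0x80) return Category::invalidUTF8Encoding;  // overlong (lead byte C0 or C1)
    res = cp;
    return Category::twoBytes;
}

// Usage check: U+00E9 is C3 A9 in UTF-8.
void checkDecodeTwo() {
    UChar c = 0;
    assert(decodeTwo(0xC3, 0xA9, c) == Category::twoBytes);
    assert(c == 0x00E9);
    assert(decodeTwo(0xC1, 0xBF, c) == Category::invalidUTF8Encoding);  // overlong U+007F
}
```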
◆ encodeOneUChar()
Encodes the UTF-16 character uc as UTF-8. Returns the UTF-8 encoding category used for the conversion, or an error category if the UTF-16 character could not be transformed. Throws no exceptions.
- Parameters

Parameter | Description
---|---
uc | The UTF-16 character to be transformed.
res | A pointer to a byte array of at least four bytes that receives the UTF-8 result.
highSurrogateValue | The high surrogate encountered before uc, if any. It is only used when a high surrogate was previously encountered; defaults to 0.
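The following standalone sketch shows one plausible way the highSurrogateValue protocol could work: a high surrogate is reported as `highSurrogate` without writing output, and the caller passes it back on the next call together with the low surrogate, producing a four-byte sequence. This calling convention is inferred from the enumerator descriptions and is an assumption; `encodeUnit`, `Byte`, `UChar`, and `Category` are hypothetical names, not SourcePro declarations.

```cpp
#include <cassert>
#include <cstdint>

using Byte = std::uint8_t;    // hypothetical stand-in for RWByte
using UChar = std::uint16_t;  // hypothetical stand-in for RWUChar

// Local mirror of the EncodingCategory values this sketch can return.
enum class Category {
    oneByte, twoBytes, threeBytes, fourBytes,
    highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate
};

static bool isHigh(UChar u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool isLow(UChar u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical encoder. out must have room for four bytes.
// pendingHigh is the high surrogate from the previous call, or 0 if none.
Category encodeUnit(UChar uc, Byte* out, UChar pendingHigh = 0) {
    assert(out != nullptr);
    if (pendingHigh != 0) {
        if (!isLow(uc)) return Category::missingLowSurrogate;
        // Recombine the surrogate pair into a code point in U+10000..U+10FFFF.
        std::uint32_t cp = 0x10000 + ((std::uint32_t(pendingHigh) - 0xD800) << 10) +
                           (std::uint32_t(uc) - 0xDC00);
        out[0] = Byte(0xF0 | (cp >> 18));
        out[1] = Byte(0x80 | ((cp >> 12) & 0x3F));
        out[2] = Byte(0x80 | ((cp >> 6) & 0x3F));
        out[3] = Byte(0x80 | (cp & 0x3F));
        return Category::fourBytes;
    }
    if (isHigh(uc)) return Category::highSurrogate;  // caller must supply the low half next
    if (isLow(uc)) return Category::lowSurrogateWithoutHighSurrogate;
    if (uc < 0x80) {
        out[0] = Byte(uc);
        return Category::oneByte;
    }
    if (uc < 0x800) {
        out[0] = Byte(0xC0 | (uc >> 6));
        out[1] = Byte(0x80 | (uc & 0x3F));
        return Category::twoBytes;
    }
    out[0] = Byte(0xE0 | (uc >> 12));
    out[1] = Byte(0x80 | ((uc >> 6) & 0x3F));
    out[2] = Byte(0x80 | (uc & 0x3F));
    return Category::threeBytes;
}

// Usage check: one BMP character, then a surrogate pair over two calls.
void checkEncodeUnit() {
    Byte buf[4] = {};
    assert(encodeUnit(0x20AC, buf) == Category::threeBytes);  // EURO SIGN
    assert(buf[0] == 0xE2 && buf[1] == 0x82 && buf[2] == 0xAC);
    assert(encodeUnit(0xD83D, buf) == Category::highSurrogate);
    assert(encodeUnit(0xDE00, buf, 0xD83D) == Category::fourBytes);  // U+1F600
    assert(buf[0] == 0xF0 && buf[1] == 0x9F && buf[2] == 0x98 && buf[3] == 0x80);
}
```

Using 0 as the "no pending surrogate" sentinel mirrors the default argument in the documented signature, and it is unambiguous because 0 can never be a high surrogate.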