Provides common functionality used to encode and decode UTF-8 sequences.

#include <rw/stream/RWUTF8Helper.h>

Public Types
enum	EncodingCategory { oneByte, twoBytes, threeBytes, fourBytes, highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate, invalidUTF8Encoding }

Static Public Member Functions
static EncodingCategory	decodeFirstByte (RWByte b)

static EncodingCategory	decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)

static EncodingCategory	decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)

static EncodingCategory	decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)

static EncodingCategory	encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)

Detailed Description

The class RWUTF8Helper provides common functionality used to encode and decode UTF-8 sequences.

Member Enumeration Documentation

◆ EncodingCategory

enum RWUTF8Helper::EncodingCategory

Enumerator
oneByte	One-byte encoding form of UTF-8
twoBytes	Two-byte encoding form of UTF-8
threeBytes	Three-byte encoding form of UTF-8
fourBytes	Four-byte encoding form of UTF-8
highSurrogate	The character to be encoded is a high surrogate
missingLowSurrogate	No low surrogate after a high surrogate
lowSurrogateWithoutHighSurrogate	A low surrogate was not preceded by a high surrogate
invalidUTF8Encoding	The encoding is not recognized as UTF-8

Member Function Documentation

◆ decodeFirstByte()

static EncodingCategory RWUTF8Helper::decodeFirstByte ( RWByte b )

Takes the first byte of a UTF-8 byte sequence encoding a single UTF-16 character, and returns the encoding category to which it belongs. Throws no exceptions.

Parameters

b	The first byte of a UTF-8 byte sequence encoding a single UTF-16 character
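The classification of a lead byte can be illustrated with standard UTF-8 bit patterns. The following is a minimal standalone sketch, not the library's implementation: `LeadCategory` and `classifyLeadByte` are hypothetical names, and the sketch only inspects the high-order bits, so it does not reject lead bytes such as 0xC0 or 0xF5 that can never start a well-formed sequence.

```cpp
#include <cstdint>

// Hypothetical stand-in for the lead-byte subset of
// RWUTF8Helper::EncodingCategory; not part of the library.
enum class LeadCategory { OneByte, TwoBytes, ThreeBytes, FourBytes, Invalid };

// Classifies a UTF-8 lead byte by its high-order bit pattern.
// Assumption: the bit patterns are those of standard UTF-8; how the
// library treats edge cases is not documented here.
inline LeadCategory classifyLeadByte(std::uint8_t b) {
    if ((b & 0x80) == 0x00) return LeadCategory::OneByte;    // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return LeadCategory::TwoBytes;   // 110xxxxx
    if ((b & 0xF0) == 0xE0) return LeadCategory::ThreeBytes; // 1110xxxx
    if ((b & 0xF8) == 0xF0) return LeadCategory::FourBytes;  // 11110xxx
    // Continuation byte (10xxxxxx) or 11111xxx: cannot start a sequence.
    return LeadCategory::Invalid;
}
```

A caller would typically use the returned category to decide how many further bytes to read before calling the matching decode function.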

◆ decodeFourBytesEncoding()

static EncodingCategory RWUTF8Helper::decodeFourBytesEncoding	(	RWByte	firstByte,
		RWByte	secondByte,
		RWByte	thirdByte,
		RWByte	fourthByte,
		RWUChar &	highSurrogateValue,
		RWUChar &	lowSurrogateValue )

Decodes a four-byte UTF-8 sequence into a UTF-16 surrogate pair. Returns invalidUTF8Encoding if the four bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters

firstByte	The first byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
secondByte	The second byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
thirdByte	The third byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
fourthByte	The fourth byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
highSurrogateValue	The UTF-16 high surrogate resulting from the decoding of the four-byte UTF-8 sequence.
lowSurrogateValue	The UTF-16 low surrogate resulting from the decoding of the four-byte UTF-8 sequence.
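The arithmetic behind a four-byte decode can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's code: `decodeFourBytesSketch` is a hypothetical name, it reports failure with `false` rather than an EncodingCategory, and the overlong and range checks follow the Unicode definition of well-formed UTF-8.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a four-byte UTF-8 sequence into a UTF-16
// surrogate pair. Returns false if the bytes are not well-formed UTF-8.
inline bool decodeFourBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                  std::uint8_t b3, std::uint8_t b4,
                                  std::uint16_t& high, std::uint16_t& low) {
    // Lead byte must be 11110xxx; the rest must be 10xxxxxx.
    if ((b1 & 0xF8) != 0xF0) return false;
    if ((b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80 || (b4 & 0xC0) != 0x80)
        return false;

    // Assemble the 21-bit code point from the payload bits.
    std::uint32_t cp = (static_cast<std::uint32_t>(b1 & 0x07) << 18) |
                       (static_cast<std::uint32_t>(b2 & 0x3F) << 12) |
                       (static_cast<std::uint32_t>(b3 & 0x3F) << 6) |
                       static_cast<std::uint32_t>(b4 & 0x3F);

    // Reject overlong forms and values beyond U+10FFFF.
    if (cp < 0x10000 || cp > 0x10FFFF) return false;

    // Split the 20-bit offset into high and low surrogates.
    cp -= 0x10000;
    high = static_cast<std::uint16_t>(0xD800 + (cp >> 10));
    low = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));
    return true;
}
```

For example, the bytes F0 9F 98 80 (U+1F600) yield the surrogate pair 0xD83D, 0xDE00.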

◆ decodeThreeBytesEncoding()

static EncodingCategory RWUTF8Helper::decodeThreeBytesEncoding	(	RWByte	firstByte,
		RWByte	secondByte,
		RWByte	thirdByte,
		RWUChar &	res )

Decodes a three-byte UTF-8 sequence. Returns invalidUTF8Encoding if the three bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters

firstByte	The first byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
secondByte	The second byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
thirdByte	The third byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
res	The UTF-16 character resulting from the decoding of the three-byte UTF-8 sequence
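A three-byte decode can be sketched along the same lines. The function below is a hypothetical standalone illustration, not the library's implementation. It assumes the Unicode rules for well-formed UTF-8, so it rejects overlong forms and encoded surrogates; whether RWUTF8Helper applies the same checks is not stated in this reference.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a three-byte UTF-8 sequence into one
// UTF-16 code unit. Returns false if the bytes are not well-formed.
inline bool decodeThreeBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                   std::uint8_t b3, std::uint16_t& res) {
    // Lead byte must be 1110xxxx; the rest must be 10xxxxxx.
    if ((b1 & 0xF0) != 0xE0) return false;
    if ((b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) return false;

    std::uint32_t cp = (static_cast<std::uint32_t>(b1 & 0x0F) << 12) |
                       (static_cast<std::uint32_t>(b2 & 0x3F) << 6) |
                       static_cast<std::uint32_t>(b3 & 0x3F);

    if (cp < 0x800) return false;                   // overlong form
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; // encoded surrogate

    res = static_cast<std::uint16_t>(cp);
    return true;
}
```

For example, the bytes E2 82 AC decode to U+20AC.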

◆ decodeTwoBytesEncoding()

static EncodingCategory RWUTF8Helper::decodeTwoBytesEncoding	(	RWByte	firstByte,
		RWByte	secondByte,
		RWUChar &	res )

Decodes a two-byte UTF-8 sequence. Returns invalidUTF8Encoding if the two bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters

firstByte	The first byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
secondByte	The second byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
res	The UTF-16 character resulting from the decoding of the two-byte UTF-8 sequence
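The two-byte case is the simplest of the multi-byte forms. The sketch below is a hypothetical standalone illustration using standard UTF-8 arithmetic, not the library's code; `decodeTwoBytesSketch` and its `bool` return are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a two-byte UTF-8 sequence into one
// UTF-16 code unit. Returns false if the bytes are not well-formed.
inline bool decodeTwoBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                 std::uint16_t& res) {
    // Lead byte must be 110xxxxx; the second must be 10xxxxxx.
    if ((b1 & 0xE0) != 0xC0) return false;
    if ((b2 & 0xC0) != 0x80) return false;

    std::uint32_t cp = (static_cast<std::uint32_t>(b1 & 0x1F) << 6) |
                       static_cast<std::uint32_t>(b2 & 0x3F);

    // Values below U+0080 must use the one-byte form (overlong check).
    if (cp < 0x80) return false;

    res = static_cast<std::uint16_t>(cp);
    return true;
}
```

For example, the bytes C3 A9 decode to U+00E9.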

◆ encodeOneUChar()

static EncodingCategory RWUTF8Helper::encodeOneUChar	(	RWUChar	uc,
		RWByte *	res,
		RWUChar	highSurrogateValue = 0 )

Encodes the UTF-16 character uc as UTF-8. Returns the UTF-8 encoding category used for the conversion, or an error category if the UTF-16 character could not be converted. Throws no exceptions.

Parameters

uc	The UTF-16 character to be transformed.
res	A pointer to a byte array containing at least four bytes. The byte array is used to store the transformation result.
highSurrogateValue	This parameter is only used when a high surrogate was previously encountered.
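One plausible reading of the surrogate handling is that the caller holds a high surrogate and passes it along with the next code unit. The sketch below illustrates that protocol under this assumption. It is standalone and not the library's implementation: `EncodeOutcome` and `encodeUnitSketch` are hypothetical names whose outcomes loosely mirror the EncodingCategory values.

```cpp
#include <cstdint>

// Hypothetical outcome type; loosely mirrors EncodingCategory.
enum class EncodeOutcome {
    OneByte, TwoBytes, ThreeBytes, FourBytes,
    HighSurrogate, MissingLowSurrogate, LowSurrogateWithoutHighSurrogate
};

inline bool isHighSurrogateUnit(std::uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogateUnit(std::uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Encodes one UTF-16 code unit into out (at least four bytes).
// Assumption: pendingHigh is 0 unless the previous unit was a high surrogate.
inline EncodeOutcome encodeUnitSketch(std::uint16_t uc, std::uint8_t* out,
                                      std::uint16_t pendingHigh = 0) {
    if (pendingHigh != 0) {
        if (!isHighSurrogateUnit(pendingHigh) || !isLowSurrogateUnit(uc))
            return EncodeOutcome::MissingLowSurrogate;
        // Combine the pair into a code point, then emit the four-byte form.
        std::uint32_t cp = 0x10000 +
            ((static_cast<std::uint32_t>(pendingHigh) - 0xD800) << 10) +
            (static_cast<std::uint32_t>(uc) - 0xDC00);
        out[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
        out[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
        out[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        return EncodeOutcome::FourBytes;
    }
    // Nothing is written for a lone high surrogate; the caller keeps it.
    if (isHighSurrogateUnit(uc)) return EncodeOutcome::HighSurrogate;
    if (isLowSurrogateUnit(uc)) return EncodeOutcome::LowSurrogateWithoutHighSurrogate;
    if (uc < 0x80) {
        out[0] = static_cast<std::uint8_t>(uc);
        return EncodeOutcome::OneByte;
    }
    if (uc < 0x800) {
        out[0] = static_cast<std::uint8_t>(0xC0 | (uc >> 6));
        out[1] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
        return EncodeOutcome::TwoBytes;
    }
    out[0] = static_cast<std::uint8_t>(0xE0 | (uc >> 12));
    out[1] = static_cast<std::uint8_t>(0x80 | ((uc >> 6) & 0x3F));
    out[2] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
    return EncodeOutcome::ThreeBytes;
}
```

In this sketch, a caller that receives `HighSurrogate` stores uc and supplies it as the third argument on the next call; the pair 0xD83D, 0xDE00 then produces the bytes F0 9F 98 80.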

SourcePro® API Reference Guide
