SourcePro® API Reference Guide

 
RWUTF8Helper Class Reference

Provides common functionality used to encode and decode UTF-8 sequences.

#include <rw/stream/RWUTF8Helper.h>

Public Types

enum  EncodingCategory {
  oneByte, twoBytes, threeBytes, fourBytes,
  highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate, invalidUTF8Encoding
}
 

Static Public Member Functions

static EncodingCategory decodeFirstByte (RWByte b)
 
static EncodingCategory decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)
 
static EncodingCategory decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)
 
static EncodingCategory decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)
 
static EncodingCategory encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)
 

Detailed Description

The class RWUTF8Helper provides common functionality used to encode and decode UTF-8 sequences.

Member Enumeration Documentation

 

Enumerator
oneByte 

One byte encoding form of UTF-8

twoBytes 

Two bytes encoding form of UTF-8

threeBytes 

Three bytes encoding form of UTF-8

fourBytes 

Four bytes encoding form of UTF-8

highSurrogate 

The character to be encoded is a high surrogate

missingLowSurrogate 

No low surrogate after a high surrogate

lowSurrogateWithoutHighSurrogate 

A low surrogate was not preceded by a high surrogate

invalidUTF8Encoding 

The encoding is not recognized as UTF-8

Member Function Documentation

static EncodingCategory RWUTF8Helper::decodeFirstByte (RWByte b)

Takes the first byte of a UTF-8 byte sequence encoding a single UTF-16 character, and returns the encoding category to which it belongs. Throws no exceptions.

Parameters
b    The first byte of a UTF-8 byte sequence encoding a single UTF-16 character.
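
To illustrate what classifying a lead byte involves, here is a minimal, self-contained sketch that does not use SourcePro. FirstByteKind and classifyFirstByte are hypothetical stand-ins for EncodingCategory and decodeFirstByte, and the sketch inspects only the leading bit pattern. Whether decodeFirstByte also rejects lead bytes such as 0xC0, 0xC1, or 0xF5 and above is not stated on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the size-related values of
// RWUTF8Helper::EncodingCategory; this is not the SourcePro type.
enum class FirstByteKind { oneByte, twoBytes, threeBytes, fourBytes, invalid };

// Classifies a UTF-8 lead byte by its high bits only. This is an assumption
// about what a first-byte check involves, not the library's implementation.
inline FirstByteKind classifyFirstByte(std::uint8_t b) {
    if ((b & 0x80) == 0x00) return FirstByteKind::oneByte;     // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return FirstByteKind::twoBytes;    // 110xxxxx
    if ((b & 0xF0) == 0xE0) return FirstByteKind::threeBytes;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return FirstByteKind::fourBytes;   // 11110xxx
    return FirstByteKind::invalid;  // 10xxxxxx (continuation) or 11111xxx
}

// Usage: lead bytes of "A", U+00E9, U+20AC and U+1F600, then a stray
// continuation byte, which cannot start a sequence.
inline void demoClassifyFirstByte() {
    assert(classifyFirstByte(0x41) == FirstByteKind::oneByte);
    assert(classifyFirstByte(0xC3) == FirstByteKind::twoBytes);
    assert(classifyFirstByte(0xE2) == FirstByteKind::threeBytes);
    assert(classifyFirstByte(0xF0) == FirstByteKind::fourBytes);
    assert(classifyFirstByte(0x80) == FirstByteKind::invalid);
}
```

A caller would typically use the returned category to decide how many further bytes to read before calling the matching two-, three-, or four-byte decoding function.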

static EncodingCategory RWUTF8Helper::decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)

Decodes a four-byte UTF-8 sequence into a UTF-16 surrogate pair. The function returns invalidUTF8Encoding if the four bytes do not represent a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte             The first byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
secondByte            The second byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
thirdByte             The third byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
fourthByte            The fourth byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
highSurrogateValue    The UTF-16 high surrogate resulting from the decoding of the four-byte UTF-8 sequence.
lowSurrogateValue     The UTF-16 low surrogate resulting from the decoding of the four-byte UTF-8 sequence.
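
The arithmetic behind a four-byte decode can be sketched without SourcePro as follows. decodeFourBytesSketch and isContinuation are hypothetical names, char16_t stands in for RWUChar, and a bool return stands in for the EncodingCategory result. Rejecting overlong forms and values above U+10FFFF follows the UTF-8 definition; this page does not say which checks the library itself performs.

```cpp
#include <cassert>
#include <cstdint>

// True if b has the continuation-byte pattern 10xxxxxx.
inline bool isContinuation(std::uint8_t b) { return (b & 0xC0) == 0x80; }

// Hypothetical stand-in for decodeFourBytesEncoding. Returns false where the
// documented function would report invalidUTF8Encoding.
inline bool decodeFourBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                  std::uint8_t b3, std::uint8_t b4,
                                  char16_t& high, char16_t& low) {
    if ((b1 & 0xF8) != 0xF0 || !isContinuation(b2) ||
        !isContinuation(b3) || !isContinuation(b4)) {
        return false;
    }
    // Reassemble the 21-bit code point from 3 + 6 + 6 + 6 payload bits.
    const std::uint32_t cp = (std::uint32_t(b1 & 0x07) << 18) |
                             (std::uint32_t(b2 & 0x3F) << 12) |
                             (std::uint32_t(b3 & 0x3F) << 6) |
                             std::uint32_t(b4 & 0x3F);
    // Assumption: overlong forms and values above U+10FFFF are rejected.
    if (cp < 0x10000 || cp > 0x10FFFF) return false;
    const std::uint32_t v = cp - 0x10000;                 // 20 bits remain
    high = static_cast<char16_t>(0xD800 + (v >> 10));     // top 10 bits
    low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));   // bottom 10 bits
    return true;
}

// Usage: U+1F600 is encoded as F0 9F 98 80 and maps to the pair D83D DE00.
inline void demoDecodeFourBytes() {
    char16_t high = 0, low = 0;
    const bool ok = decodeFourBytesSketch(0xF0, 0x9F, 0x98, 0x80, high, low);
    assert(ok && high == 0xD83D && low == 0xDE00);
    (void)ok;
}
```

Two output parameters are needed because every code point that takes four bytes in UTF-8 lies above U+FFFF and therefore occupies two UTF-16 code units.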

static EncodingCategory RWUTF8Helper::decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)

Decodes a three-byte UTF-8 sequence. The function returns invalidUTF8Encoding if the three bytes do not represent a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte     The first byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
secondByte    The second byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
thirdByte     The third byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
res           The UTF-16 character resulting from the decoding of the three-byte UTF-8 sequence.
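
As a rough illustration of a three-byte decode, the sketch below reassembles the 16-bit value from 4 + 6 + 6 payload bits. decodeThreeBytesSketch is a hypothetical name, char16_t stands in for RWUChar, and the bool return stands in for the EncodingCategory result. The overlong and surrogate-range checks follow the UTF-8 definition and are assumptions about, not a description of, the library's behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for decodeThreeBytesEncoding. Returns false where the
// documented function would report invalidUTF8Encoding.
inline bool decodeThreeBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                   std::uint8_t b3, char16_t& res) {
    // Lead byte must be 1110xxxx, both followers must be 10xxxxxx.
    if ((b1 & 0xF0) != 0xE0 || (b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) {
        return false;
    }
    const std::uint32_t cp = (std::uint32_t(b1 & 0x0F) << 12) |
                             (std::uint32_t(b2 & 0x3F) << 6) |
                             std::uint32_t(b3 & 0x3F);
    if (cp < 0x800) return false;                    // overlong (assumption)
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate (assumption)
    res = static_cast<char16_t>(cp);
    return true;
}

// Usage: the euro sign U+20AC is encoded as E2 82 AC.
inline void demoDecodeThreeBytes() {
    char16_t res = 0;
    const bool ok = decodeThreeBytesSketch(0xE2, 0x82, 0xAC, res);
    assert(ok && res == 0x20AC);
    (void)ok;
}
```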

static EncodingCategory RWUTF8Helper::decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)

Decodes a two-byte UTF-8 sequence. The function returns invalidUTF8Encoding if the two bytes do not represent a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte     The first byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
secondByte    The second byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
res           The UTF-16 character resulting from the decoding of the two-byte UTF-8 sequence.
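
For the two-byte form, the payload is 5 + 6 bits. The following sketch, with the hypothetical name decodeTwoBytesSketch and char16_t standing in for RWUChar, shows one way the value could be recovered. Treating results below 0x80 as invalid (overlong) follows the UTF-8 definition and is an assumption about the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for decodeTwoBytesEncoding. Returns false where the
// documented function would report invalidUTF8Encoding.
inline bool decodeTwoBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                 char16_t& res) {
    // Lead byte must be 110xxxxx, follower must be 10xxxxxx.
    if ((b1 & 0xE0) != 0xC0 || (b2 & 0xC0) != 0x80) return false;
    const std::uint32_t cp = (std::uint32_t(b1 & 0x1F) << 6) |
                             std::uint32_t(b2 & 0x3F);
    if (cp < 0x80) return false;  // overlong form of a one-byte value
    res = static_cast<char16_t>(cp);
    return true;
}

// Usage: U+00E9 is encoded as C3 A9, giving (0x03 << 6) | 0x29 = 0xE9.
inline void demoDecodeTwoBytes() {
    char16_t res = 0;
    const bool ok = decodeTwoBytesSketch(0xC3, 0xA9, res);
    assert(ok && res == 0x00E9);
    (void)ok;
}
```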

static EncodingCategory RWUTF8Helper::encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)

Encodes the UTF-16 character uc as UTF-8. The function returns the UTF-8 encoding category that was used to convert the UTF-16 character, or an error category if the UTF-16 character could not be transformed. Throws no exceptions.

Parameters
uc                    The UTF-16 character to be transformed.
res                   A pointer to a byte array containing at least four bytes. The byte array is used to store the transformation result.
highSurrogateValue    This parameter is only used when a high surrogate was previously encountered.
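
The interplay between the return categories and highSurrogateValue can be sketched as below. This is an interpretation inferred from the parameter and enumerator descriptions on this page, not the SourcePro implementation: EncodeResult, encodeOneUnitSketch, and encodeAllSketch are hypothetical names, char16_t stands in for RWUChar, and the sketch assumes that the caller passes back the unit for which the previous call returned highSurrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical mirror of the categories an encoder could return.
enum class EncodeResult {
    oneByte, twoBytes, threeBytes, fourBytes,
    highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate
};

// Encodes one UTF-16 code unit into res (at least four bytes).
// pendingHigh is assumed to be 0 or a previously seen high surrogate.
inline EncodeResult encodeOneUnitSketch(char16_t uc, std::uint8_t* res,
                                        char16_t pendingHigh = 0) {
    assert(res != nullptr);
    const bool isHigh = uc >= 0xD800 && uc <= 0xDBFF;
    const bool isLow  = uc >= 0xDC00 && uc <= 0xDFFF;
    if (pendingHigh != 0) {
        if (!isLow) return EncodeResult::missingLowSurrogate;
        // Combine the pair into a code point in U+10000..U+10FFFF.
        const std::uint32_t cp = 0x10000u +
            ((std::uint32_t(pendingHigh) - 0xD800u) << 10) +
            (std::uint32_t(uc) - 0xDC00u);
        res[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
        res[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
        res[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        res[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        return EncodeResult::fourBytes;
    }
    if (isHigh) return EncodeResult::highSurrogate;  // nothing written yet
    if (isLow) return EncodeResult::lowSurrogateWithoutHighSurrogate;
    if (uc < 0x80) {
        res[0] = static_cast<std::uint8_t>(uc);
        return EncodeResult::oneByte;
    }
    if (uc < 0x800) {
        res[0] = static_cast<std::uint8_t>(0xC0 | (uc >> 6));
        res[1] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
        return EncodeResult::twoBytes;
    }
    res[0] = static_cast<std::uint8_t>(0xE0 | (uc >> 12));
    res[1] = static_cast<std::uint8_t>(0x80 | ((uc >> 6) & 0x3F));
    res[2] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
    return EncodeResult::threeBytes;
}

// Usage: encode a whole UTF-16 string, carrying a high surrogate over to the
// next call. Returns false on any surrogate error.
inline bool encodeAllSketch(const std::u16string& in, std::string& out) {
    char16_t pendingHigh = 0;
    for (char16_t uc : in) {
        std::uint8_t buf[4];
        const EncodeResult r = encodeOneUnitSketch(uc, buf, pendingHigh);
        pendingHigh = 0;
        std::size_t n = 0;
        switch (r) {
            case EncodeResult::oneByte:       n = 1; break;
            case EncodeResult::twoBytes:      n = 2; break;
            case EncodeResult::threeBytes:    n = 3; break;
            case EncodeResult::fourBytes:     n = 4; break;
            case EncodeResult::highSurrogate: pendingHigh = uc; continue;
            default:                          return false;
        }
        out.append(reinterpret_cast<const char*>(buf), n);
    }
    return pendingHigh == 0;  // a trailing high surrogate is an error
}
```

In this sketch the returned category doubles as the number of bytes written, which is why the caller's buffer must hold at least four bytes regardless of the input.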

Copyright © 2023 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.