Converts text from UTF-16 to various byte-oriented standard character encoding schemes.

#include <rw/i18n/RWUFromUnicodeConverter.h>


Classes
class	ErrorResponseState
	Stores the current error response state of an RWUFromUnicodeConverter converter.

Public Types
enum	ErrorResponseType { Stop , Skip , Substitute , EscapeNativeHexadecimal , EscapeJavaHexadecimal , EscapeCHexadecimal , EscapeXmlDecimal , EscapeXmlHexadecimal }

Public Member Functions
	RWUFromUnicodeConverter (const char *encoding)

	RWUFromUnicodeConverter (const RWUConverterBase &original)

	RWUFromUnicodeConverter (const RWUFromUnicodeConverter &original)

	~RWUFromUnicodeConverter ()

void	convert (const RWUChar16 *source, RWCString &target, bool flush=true)

void	convert (const RWUChar16 *source, std::string &target, bool flush=true)

void	convert (const RWUChar16 source[], int32_t size, RWCString &target, bool flush=true)

void	convert (const RWUChar16 source[], int32_t size, std::string &target, bool flush=true)

void	convert (const RWUString &source, RWCString &target, bool flush=true)

void	convert (const RWUString &source, std::string &target, bool flush=true)

RWCString	getSubstitutionSequence () const

RWUFromUnicodeConverter &	operator= (const RWUConverterBase &rhs)

RWUFromUnicodeConverter &	operator= (const RWUFromUnicodeConverter &rhs)

void	reset ()

void	restoreErrorResponseState (const ErrorResponseState &state)

ErrorResponseState	saveErrorResponseState () const

void	setErrorResponse (ErrorResponseType response)

void	setSubstitutionSequence (const char substitutionSequence[], size_t length)

Public Member Functions inherited from RWUConverterBase
	~RWUConverterBase ()

RWCString	getCanonicalName () const

void	getLocalizedName (const RWULocale &locale, RWUString &result) const

size_t	getMaxBytesPerChar () const

size_t	getMinBytesPerChar () const

Additional Inherited Members
Static Public Member Functions inherited from RWUConverterBase
static RWCString	getCurrentLocaleEncodingName ()

static RWCString	getDefaultEncodingName ()

static void	setDefaultEncodingName (const char *encoding)

Protected Member Functions inherited from RWUConverterBase
	RWUConverterBase (const char *encoding)

	RWUConverterBase (const RWUConverterBase &original)

RWUConverterBase &	operator= (const RWUConverterBase &rhs)

Related Symbols inherited from RWUConverterBase
bool	operator!= (const RWUConverterBase &lhs, const RWUConverterBase &rhs)

Detailed Description

RWUFromUnicodeConverter provides a unidirectional text conversion facility for translating from UTF-16 to various byte-oriented standard character encoding schemes.

The convert() method appends the results of a conversion to a target buffer. If its flush argument is true, convert() flushes its internal buffers to the target buffer and clears its internal state. For modal encodings such as ISO-2022, clearing the internal state ensures that the next call to convert() produces target text that begins in the target encoding's default, unshifted state.

Calling convert() once with a value of true for flush is useful when converting a piece of text in its entirety from UTF-16 to a target encoding. In contrast, convert() may be used to fill a target buffer in a piecemeal fashion. Repeatedly calling convert() with a value of false for flush, then calling it once with a value of true, causes convert() to flush its buffers and clear its internal state only at the end of a multi-invocation conversion process.
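The effect of the flush argument can be modeled without SourcePro. The sketch below is a hypothetical toy converter: the name ToyUtf8Converter, the UTF-8 target, and the '?' substitution byte are all assumptions made for illustration, not the RWUFromUnicodeConverter implementation. It keeps a trailing high surrogate in its internal state when flush is false, so a surrogate pair split across two calls still converts correctly. When flush is true, any dangling half is replaced and the state is cleared, as reset() would do.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model of a "from Unicode" converter (UTF-16 -> UTF-8).
// It is NOT the SourcePro implementation; it only illustrates why a
// converter keeps internal state between calls and what flushing does.
class ToyUtf8Converter {
public:
    // Appends the conversion of source to target. A trailing high surrogate
    // is held in pending_ until the next call, unless flush is true.
    void convert(const std::u16string& source, std::string& target, bool flush = true) {
        for (char16_t cu : source) {
            if (pending_ != 0) {
                if (cu >= 0xDC00 && cu <= 0xDFFF) {
                    // Complete surrogate pair: combine into one code point.
                    char32_t cp = 0x10000 + ((char32_t(pending_) - 0xD800) << 10) + (cu - 0xDC00);
                    pending_ = 0;
                    appendUtf8(cp, target);
                    continue;
                }
                target += '?';  // high surrogate not followed by a low surrogate
                pending_ = 0;
            }
            if (cu >= 0xD800 && cu <= 0xDBFF) {
                pending_ = cu;  // wait for the low surrogate in this or a later call
            } else if (cu >= 0xDC00 && cu <= 0xDFFF) {
                target += '?';  // low surrogate without a preceding high surrogate
            } else {
                appendUtf8(cu, target);
            }
        }
        if (flush) {
            if (pending_ != 0) target += '?';  // dangling high surrogate is substituted
            reset();
        }
    }

    // Clears the internal state, discarding any pending code unit.
    void reset() { pending_ = 0; }

private:
    static void appendUtf8(char32_t cp, std::string& out) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    char16_t pending_ = 0;  // saved high surrogate, or 0 if none
};
```

Converting U+1F600 (surrogate pair D83D DE00) split across two calls, with flush set to false for the first chunk and true for the second, yields the four bytes F0 9F 98 80. Flushing after the first chunk instead produces two substitutions, because the pending high surrogate cannot be completed.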

At the conclusion of a successful call to convert() with flush set to true, the converter is reset automatically to a default, initial state, ready to start a new conversion process. Sometimes, however, it may be necessary to reset a converter explicitly using the reset() method:

if convert() has thrown an exception in response to an error, and you want to be sure the converter is in the default state before using it again.
if you are using the converter to fill a target buffer in a piecemeal fashion, and you wish to abandon that conversion process to begin another.
if you are copying a converter, and want to be sure the copy is in the default state.

Example:

#include <rw/i18n/RWUFromUnicodeConverter.h>
#include <rw/i18n/RWUString.h>
#include <rw/i18n/RWUToUnicodeConverter.h>
#include <iostream>

using std::cout;
using std::endl;

int main() {
    // Convert from ISO-8859-1 to UTF-16. The literal below must be
    // ISO-8859-1 encoded, so this source file is assumed to be
    // saved in ISO-8859-1.
    RWUToUnicodeConverter fromIso_8859_1("ISO-8859-1");
    RWCString cstr("She sat in the café, sipping coffee.");
    RWUString ustr;
    fromIso_8859_1.convert(cstr, ustr);

    // Convert from UTF-16 to US-ASCII. Note that '?' is
    // substituted for 'é', which cannot be represented
    // in US-ASCII.
    RWUFromUnicodeConverter toUsAscii("US-ASCII");
    toUsAscii.setSubstitutionSequence("?", 1);
    cout << ustr.toBytes(toUsAscii) << endl;

    // Save the error response state.
    RWUFromUnicodeConverter::ErrorResponseState state =
        toUsAscii.saveErrorResponseState();

    // Convert from UTF-16 to US-ASCII again, replacing
    // 'é' with an escape sequence suitable for use in
    // an XML or HTML file.
    toUsAscii.setErrorResponse(
        RWUFromUnicodeConverter::EscapeXmlHexadecimal);
    cout << ustr.toBytes(toUsAscii) << endl;

    // Restore the original error response state.
    toUsAscii.restoreErrorResponseState(state);

    return 0;
} // main


Program output:

She sat in the caf?, sipping coffee.

She sat in the caf&#xE9;, sipping coffee.

See also: RWUConverterBase, RWUFromUnicodeConversionContext, RWUToUnicodeConverter

Member Enumeration Documentation

◆ ErrorResponseType

enum RWUFromUnicodeConverter::ErrorResponseType

An ErrorResponseType value indicates what action an RWUFromUnicodeConverter should take when it encounters an error during the conversion process. Potential errors include code points with no mapping in the target encoding, and ill-formed code unit sequences, such as a high surrogate not followed by a low surrogate, or a low surrogate without a preceding high surrogate. The default error response is RWUFromUnicodeConverter::Substitute.

See also: setErrorResponse()
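To make the ill-formed cases concrete, here is a standalone sketch that locates the first unpaired surrogate in a UTF-16 sequence, which is the kind of input that triggers the configured error response. The function name findIllFormedUtf16 is hypothetical and not part of the SourcePro API, and std::u16string stands in for RWUString so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a SourcePro function.
// Returns the index of the first ill-formed code unit in s, or s.size()
// if the sequence is well formed. Ill-formed means a high surrogate
// (0xD800-0xDBFF) not followed by a low surrogate, or a low surrogate
// (0xDC00-0xDFFF) without a preceding high surrogate.
std::size_t findIllFormedUtf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t cu = s[i];
        bool isHigh = cu >= 0xD800 && cu <= 0xDBFF;
        bool isLow = cu >= 0xDC00 && cu <= 0xDFFF;
        if (isHigh) {
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i;       // skip the low half of a valid pair
                continue;
            }
            return i;      // high surrogate not followed by a low surrogate
        }
        if (isLow) {
            return i;      // low surrogate without a preceding high surrogate
        }
    }
    return s.size();
}
```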

Enumerator
Stop	Stops the conversion process, and throws an RWUException.
Skip	Silently skips over any illegal sequences, without writing to the target buffer.
Substitute	Substitutes illegal sequences with the current substitution sequence. The default substitution sequence depends on the target encoding. For US-ASCII-based encodings, the default substitution sequence is `0x1A`. See setSubstitutionSequence().
EscapeNativeHexadecimal	Replaces illegal sequences with a `%UX` escaped hexadecimal representation of the code units that comprise the illegal sequence, for example `%UFFFE%U00AC`. Note that a code point represented by a surrogate pair is escaped as two hexadecimal values, for example `%UD84D%UDC56`. If the target encoding does not support the characters `{U,%}[A-F][0-9]`, an illegal sequence is replaced by the substitution sequence.
EscapeJavaHexadecimal	Replaces illegal sequences with a `\uX` escaped hexadecimal representation of the code units that comprise the illegal sequence, for example `\uFFFE\u00AC`. Note that a code point represented by a surrogate pair is escaped as two hexadecimal values, for example `\uD84D\uDC56`. If the target encoding does not support the characters `{u,\}[A-F][0-9]`, an illegal sequence is replaced by the substitution sequence.
EscapeCHexadecimal	Replaces illegal sequences with a `\uX` escaped hexadecimal representation of the code units that comprise the illegal sequence, for example `\uFFFE\u00AC`. Note that a code point represented by a surrogate pair is escaped as a single eight-digit hexadecimal value with an uppercase `\U` prefix, for example `\U00023456`. If the target encoding does not support the characters `{u,U,\}[A-F][0-9]`, an illegal sequence is replaced by the substitution sequence.
EscapeXmlDecimal	Replaces illegal sequences with a `&#DDDD;` escaped decimal representation of the code units that comprise the illegal sequence, for example `&#172;`. Note that a code point represented by a surrogate pair is escaped as a single decimal value without zero padding, for example `&#144470;`. If the target encoding does not support the characters `{&,#,;}[0-9]`, an illegal sequence is replaced by the substitution sequence.
EscapeXmlHexadecimal	Replaces illegal sequences with a `&#xXXXX;` escaped hexadecimal representation of the code units that comprise the illegal sequence, for example `&#xFFFE;&#xAC;`. Note that a code point represented by a surrogate pair is escaped as a single hexadecimal value without zero padding, for example `&#x12345;`. If the target encoding does not support the characters `{&,#,x,;}[A-F][0-9]`, an illegal sequence is replaced by the substitution sequence.
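The following standalone sketch reproduces the escape formats in the table for a single code point, so the styles can be compared side by side. The function names are hypothetical, and the code models only the documented output formats; it is not the converter's implementation. Each function takes a code point; the per-code-unit styles (native and Java) split a supplementary code point into its surrogate pair first.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helpers that mimic the documented escape formats.

// Formats v as uppercase hexadecimal, zero-padded to at least width digits.
static std::string hexDigits(unsigned long v, int width) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%0*lX", width, v);
    return buf;
}

// Escapes each UTF-16 code unit of cp with the given prefix, four digits each.
static std::string perCodeUnit(char32_t cp, const char* prefix) {
    if (cp < 0x10000) return prefix + hexDigits(cp, 4);
    unsigned long v = cp - 0x10000;
    unsigned long high = 0xD800 + (v >> 10);   // high (leading) surrogate
    unsigned long low = 0xDC00 + (v & 0x3FF);  // low (trailing) surrogate
    return prefix + hexDigits(high, 4) + prefix + hexDigits(low, 4);
}

std::string escapeNative(char32_t cp) { return perCodeUnit(cp, "%U"); }
std::string escapeJava(char32_t cp) { return perCodeUnit(cp, "\\u"); }

// C style: \uXXXX for the BMP, \UXXXXXXXX for supplementary code points.
std::string escapeC(char32_t cp) {
    if (cp < 0x10000) return "\\u" + hexDigits(cp, 4);
    return "\\U" + hexDigits(cp, 8);
}

// XML styles: one reference per code point, no zero padding.
std::string escapeXmlDecimal(char32_t cp) {
    return "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
}
std::string escapeXmlHex(char32_t cp) { return "&#x" + hexDigits(cp, 1) + ";"; }
```

For U+23456 (surrogate pair D84D DC56), these produce `%UD84D%UDC56`, `\uD84D\uDC56`, `\U00023456`, `&#144470;`, and `&#x23456;`, matching the examples in the table.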

Constructor & Destructor Documentation

◆ RWUFromUnicodeConverter() [1/3]

RWUFromUnicodeConverter::RWUFromUnicodeConverter ( const char * encoding )

inline, explicit

Constructs an RWUFromUnicodeConverter for the character encoding scheme given by encoding, the US-ASCII name or alias of a character encoding scheme (see RWUAvailableEncodingList and RWUEncodingAliasList).

Exceptions

RWUException Thrown to indicate that the converter could not be constructed. The exception carries one of the following status codes:

RWUMemoryAllocationError	Indicates that the memory required by the converter could not be allocated.
RWUFileAccessError	Indicates that the requested converter could not be found or opened.

◆ RWUFromUnicodeConverter() [2/3]

RWUFromUnicodeConverter::RWUFromUnicodeConverter ( const RWUFromUnicodeConverter & original )

inline

Constructs a converter that is a deep copy of another converter. The new converter uses the same character encoding scheme as the original converter, and possesses the same internal state as the original converter.

Exercise care when copying converters, especially those used for stateful or multibyte encodings. The new converter may be initialized in a state that causes the converter to produce errors if used to convert a new chunk of text. Consider using reset() to restore the converter to a known default state before use.

Exceptions

RWUException Thrown to indicate that the copy could not be completed because memory could not be allocated for the underlying implementation object.

◆ RWUFromUnicodeConverter() [3/3]

RWUFromUnicodeConverter::RWUFromUnicodeConverter ( const RWUConverterBase & original )

inline, explicit

Constructs a converter that is a deep copy of another converter. The new converter uses the same character encoding scheme as the original converter, and possesses the same internal state as the original converter.

Exercise care when copying converters, especially those used for stateful or multibyte encodings. The new converter may be initialized in a state that causes the converter to produce errors if used to convert a new chunk of text. Consider using reset() to restore the converter to a known default state before use.

Exceptions

RWUException Thrown to indicate that the copy could not be completed because memory could not be allocated for the underlying implementation object.

◆ ~RWUFromUnicodeConverter()

RWUFromUnicodeConverter::~RWUFromUnicodeConverter ( )

inline

Destructor.

Member Function Documentation

◆ convert() [1/6]

void RWUFromUnicodeConverter::convert	(	const RWUChar16 *	source,
		RWCString &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the null-terminated source array into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target RWCString.

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.
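When a long source is converted in pieces with flush set to true on every call, each piece has to end on a complete character. The sketch below is a hypothetical helper, shown with std::u16string instead of RWUString so that it compiles on its own; it picks a chunk length that never ends on a high surrogate whose low surrogate lies in the next chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a SourcePro function.
// Returns the length n of the chunk s[begin, begin + n), at most maxUnits,
// shortened by one if it would otherwise split a surrogate pair.
// Preconditions: begin <= s.size(); maxUnits >= 2 guarantees progress
// without splitting (with maxUnits == 1 a pair may still be split).
std::size_t safeChunkLength(const std::u16string& s, std::size_t begin, std::size_t maxUnits) {
    std::size_t n = std::min(maxUnits, s.size() - begin);
    if (n > 1 && begin + n < s.size()) {
        char16_t last = s[begin + n - 1];
        if (last >= 0xD800 && last <= 0xDBFF) {
            --n;  // leave the high surrogate for the next chunk
        }
    }
    return n;
}
```

Alternatively, passing false for flush on every chunk except the last lets the converter carry a split surrogate pair across calls, so chunk boundaries need no adjustment.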

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ convert() [2/6]

void RWUFromUnicodeConverter::convert	(	const RWUChar16 *	source,
		std::string &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the null-terminated source array into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target string.

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ convert() [3/6]

void RWUFromUnicodeConverter::convert	(	const RWUChar16	source[],
		int32_t	size,
		RWCString &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the sized source array into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target RWCString. size specifies the number of code units contained in the array, which may contain embedded nulls.
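The sized overloads matter when the text contains U+0000. A null-terminated overload stops at the first null code unit, just as std::char_traits<char16_t>::length() does in this sketch. Here char16_t stands in for RWUChar16, and nullTerminatedLength is a hypothetical name; both are assumptions made only so the example compiles without SourcePro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the number of code units a null-terminated
// overload would see, that is, everything before the first null.
std::size_t nullTerminatedLength(const char16_t* p) {
    return std::char_traits<char16_t>::length(p);
}
```

For a five-unit array whose third unit is null, a null-terminated overload would convert only the first two units, while a sized overload given a size of 5 converts all five.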

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ convert() [4/6]

void RWUFromUnicodeConverter::convert	(	const RWUChar16	source[],
		int32_t	size,
		std::string &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the sized source array into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target string. size specifies the number of code units contained in the array, which may contain embedded nulls.

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ convert() [5/6]

void RWUFromUnicodeConverter::convert	(	const RWUString &	source,
		RWCString &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the given RWUString into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target RWCString.

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ convert() [6/6]

void RWUFromUnicodeConverter::convert	(	const RWUString &	source,
		std::string &	target,
		bool	flush = true )

Converts the sequence of UTF-16 code units contained in the given RWUString into the sequence of bytes required to represent the source in the target character encoding scheme and appends that sequence of bytes to the target string.

The boolean value flush specifies whether self should be flushed to ensure that any code units stored in the converter's internal state are written to target. The default (true) value explicitly forces a flush and resets the converter to the known default state. This value must be set to true when the current source buffer is the last available chunk of source.

If flush is true, the source must also end on a complete character: any partial character held in the converter's internal state (for example, a trailing high surrogate) is lost when the converter is flushed.

Exceptions

RWUException Thrown if an unhandled conversion error occurs. The target is not modified if an exception is thrown.

◆ getSubstitutionSequence()

RWCString RWUFromUnicodeConverter::getSubstitutionSequence ( ) const

Returns the current sequence of bytes that self inserts into the conversion target when a source character or sequence is encountered that cannot be represented in the target encoding.

◆ operator=() [1/2]

RWUFromUnicodeConverter & RWUFromUnicodeConverter::operator= ( const RWUConverterBase & rhs )

inline

Assignment operator. Makes self a deep copy of rhs. Self uses the same character encoding scheme as rhs, and possesses the same internal state as rhs.

Exercise care when copying converters, especially those used for stateful or multibyte encodings. The new converter may be initialized in a state that causes the converter to produce errors if used to convert a new chunk of text. Consider using reset() to restore the converter to a known state before use.

Exceptions

RWUException Thrown to indicate that the copy could not be completed because memory could not be allocated for the underlying implementation object.

◆ operator=() [2/2]

RWUFromUnicodeConverter & RWUFromUnicodeConverter::operator= ( const RWUFromUnicodeConverter & rhs )

inline

Assignment operator. Makes self a deep copy of rhs. Self uses the same character encoding scheme as rhs, and possesses the same internal state as rhs.

Exercise care when copying converters, especially those used for stateful or multibyte encodings. The new converter may be initialized in a state that causes the converter to produce errors if used to convert a new chunk of text. Consider using reset() to restore the converter to a known state before use.

Exceptions

RWUException Thrown to indicate that the copy could not be completed because memory could not be allocated for the underlying implementation object.

◆ reset()

void RWUFromUnicodeConverter::reset ( )

inline

Resets self by clearing the internal buffers and restoring the state to a known default state.

◆ restoreErrorResponseState()

void RWUFromUnicodeConverter::restoreErrorResponseState ( const ErrorResponseState & state )

inline

Restores the error handling state of the converter from a saved copy. This is the only means of restoring an error response state that existed prior to a call to setErrorResponse(). Use saveErrorResponseState() to save the error response state.

Note: The saved state from one converter may be used to set the state on another converter. However, this operation may not be safe in future versions of the Internationalization Module.

◆ saveErrorResponseState()

RWUFromUnicodeConverter::ErrorResponseState RWUFromUnicodeConverter::saveErrorResponseState ( ) const

inline

Saves the current error handling state of the converter. This is the only means for saving the current error response state prior to calling setErrorResponse(). Use restoreErrorResponseState() to restore the saved state.

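// The class has no default constructor; "US-ASCII" is an illustrative encoding.
RWUFromUnicodeConverter converter("US-ASCII");
// Save the current error response state.
RWUFromUnicodeConverter::ErrorResponseState state =
    converter.saveErrorResponseState();
// Temporarily make conversion errors throw an RWUException.
converter.setErrorResponse(RWUFromUnicodeConverter::Stop);
// Restore the saved error response state.
converter.restoreErrorResponseState(state);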

Note: The saved state from one converter may be used to set the state on another converter. However, this operation may not be safe in future versions of the Internationalization Module.
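Because the saved state has to be restored explicitly, an exception thrown between the save and the restore (for example, by convert() under the Stop response) would leave the temporary error response in place. A scope guard avoids this. The sketch below is a hypothetical RAII template that relies only on the two documented members, saveErrorResponseState() and restoreErrorResponseState(); it also assumes ErrorResponseState is copyable, which its by-value return suggests. MockConverter is a stand-in so the example compiles without SourcePro.

```cpp
#include <cassert>

// Hypothetical RAII guard: saves a converter's error response state on
// construction and restores it on destruction, even if an exception is
// thrown in between.
template <class Converter>
class ErrorResponseGuard {
public:
    explicit ErrorResponseGuard(Converter& c)
        : conv_(c), saved_(c.saveErrorResponseState()) {}
    ~ErrorResponseGuard() { conv_.restoreErrorResponseState(saved_); }
    ErrorResponseGuard(const ErrorResponseGuard&) = delete;
    ErrorResponseGuard& operator=(const ErrorResponseGuard&) = delete;

private:
    Converter& conv_;
    typename Converter::ErrorResponseState saved_;
};

// Stand-in for RWUFromUnicodeConverter that models only the
// error-response members, so the sketch is self-contained.
struct MockConverter {
    enum ErrorResponseType { Stop, Skip, Substitute };
    struct ErrorResponseState { ErrorResponseType response; };

    ErrorResponseType response = Substitute;
    ErrorResponseState saveErrorResponseState() const { return {response}; }
    void restoreErrorResponseState(const ErrorResponseState& s) { response = s.response; }
    void setErrorResponse(ErrorResponseType r) { response = r; }
};
```

With the real class, this would presumably read `ErrorResponseGuard<RWUFromUnicodeConverter> guard(converter);` followed by a call to setErrorResponse(); the original response returns when guard goes out of scope.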

◆ setErrorResponse()

void RWUFromUnicodeConverter::setErrorResponse ( ErrorResponseType response )

Specifies the action self should take when it encounters an error during the conversion process.

◆ setSubstitutionSequence()

void RWUFromUnicodeConverter::setSubstitutionSequence	(	const char	substitutionSequence[],
		size_t	length )

Specifies the sequence of bytes that self should insert into the conversion target when a source character or sequence is encountered that cannot be represented in the target encoding.

Many encodings have predefined substitution sequences. For example, the single byte 0x1A is the substitution sequence for most US-ASCII-based encodings.

A valid substitutionSequence must be an array containing a sequence of 1 to 4 bytes. The length parameter specifies the length of the substitution sequence in bytes. That length must be no less than the value returned by RWUConverterBase::getMinBytesPerChar() and no greater than the value returned by RWUConverterBase::getMaxBytesPerChar().

Exceptions

RWUException Thrown with a value RWUIndexOutOfBoundsError to indicate that the length of the substitution sequence is incompatible with the character size of the target encoding.
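Because an invalid length is reported only through an exception, a caller may prefer to check the documented constraints first. The sketch below is a hypothetical helper that encodes both rules; in real code, minBytes and maxBytes would come from getMinBytesPerChar() and getMaxBytesPerChar() on the converter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pre-check, not a SourcePro function: the substitution
// sequence must be 1 to 4 bytes long and lie within [minBytes, maxBytes].
bool isValidSubstitutionLength(std::size_t length, std::size_t minBytes, std::size_t maxBytes) {
    return length >= 1 && length <= 4 && length >= minBytes && length <= maxBytes;
}
```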

SourcePro® API Reference Guide
