National Character Sets and C++ Data Types

SourcePro DB uses four different C++ classes to hold character string data:

RWCString from the Essential Tools Module is used for standard ASCII strings.

RWWString from the Essential Tools Module is used for wide character strings, such as UCS-2 or UCS-4.

RWDBMBString from the DB Interface Module is used for multibyte character strings, such as UTF-8.

RWBasicUString from the Essential Tools Module is used to encapsulate UTF-16 characters and strings.

Although class RWCString is capable of storing multibyte character strings, you are encouraged to use RWDBMBString for multibyte strings in SourcePro DB. Because some databases differentiate between multibyte and standard ASCII strings, applications using RWDBMBString for multibyte character strings maximize portability to other databases.
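To illustrate why multibyte data differs from single-byte ASCII data, the following is a minimal sketch in standard C++. It does not use any SourcePro class, and the helper name countUtf8CodePoints is hypothetical. It assumes well-formed UTF-8 input and shows that the number of bytes in a multibyte string can exceed the number of characters it represents.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SourcePro): counts the code points in a
// UTF-8 byte string. Assumes the input is well-formed UTF-8; no validation
// is performed.
std::size_t countUtf8CodePoints(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        // Continuation bytes have the bit pattern 10xxxxxx. Every other
        // byte starts a new code point.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the UTF-8 string "caf\xC3\xA9" occupies five bytes but holds four characters, because the final character is encoded in two bytes.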

For Unicode applications, however, you are encouraged to use class RWBasicUString or RWUString instead of RWDBMBString or RWWString. In SourcePro DB, the Unicode types of all supported database vendors are mapped to RWBasicUString. RWBasicUString is platform independent (its UTF-16 code units are always 2 bytes wide), and contains methods implemented specifically for manipulating UTF-16 data. Also, because RWBasicUString is the base class of RWUString, applications built with RWBasicUString can integrate seamlessly with the SourcePro Internationalization Module.
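As background on the UTF-16 encoding itself, the following is a minimal sketch in standard C++ rather than SourcePro code. The helper name encodeUtf16 is hypothetical, and std::u16string stands in here for a container of 2-byte UTF-16 code units. It shows that a character in the Basic Multilingual Plane takes one code unit, while a supplementary character takes two (a surrogate pair).

```cpp
#include <string>

// Hypothetical helper (not part of SourcePro): encodes one Unicode code
// point as UTF-16 code units. Assumes cp is a valid scalar value, that is,
// at most 0x10FFFF and not in the surrogate range 0xD800-0xDFFF.
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary plane: split the 20-bit offset into a high and a
        // low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, U+0041 encodes as the single code unit 0x0041, and U+1F600 encodes as the pair 0xD83D 0xDE00. The size of each code unit stays the same in both cases; only the number of code units changes.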

NOTE: 	You are encouraged to use RWDBMBString rather than RWCString for storing multibyte character strings, and to use RWBasicUString rather than RWWString or RWDBMBString for handling UTF-16 data.

The actual character sets used by a given system depend on several aspects of the hardware and software installation. When an operating system is installed on a machine, a character set is selected for the keyboard attached to the machine, possibly along with some supplementary character sets. A database also has at least one character set associated with the server and one with the client.

It is important to ensure compatibility between the default character set of the operating system and the character set of the client database software. The DB Interface Module does not implement translations between character sets, but it may forward a translation request to the underlying operating system for translations between wide and multibyte strings. If the operating system’s multibyte character set is incompatible with the multibyte character set expected by a database’s client software, string data can be misinterpreted or corrupted. UTF-16 data does not undergo any translations and is sent directly to the database client.
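The following sketch shows the kind of operating-system-level translation between multibyte and wide strings described above. It is an illustration under stated assumptions, not the DB Interface Module's actual implementation: it uses the standard C library function std::mbstowcs, and the helper name multibyteToWide is hypothetical. The point to note is that the result depends on the process locale (LC_CTYPE), not on anything in the string itself.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of SourcePro): converts a multibyte string
// to a wide string using the C library. The bytes are interpreted according
// to the current LC_CTYPE locale. Returns false if the bytes are not valid
// in that locale's multibyte character set.
bool multibyteToWide(const std::string& mb, std::wstring& wide)
{
    // A multibyte string never yields more wide characters than it has
    // bytes, so size() + 1 elements leave room for the terminator.
    std::vector<wchar_t> buf(mb.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;  // invalid multibyte sequence for this locale
    }
    wide.assign(buf.data(), n);
    return true;
}
```

Plain ASCII input converts the same way in any common locale. Bytes outside the ASCII range are where a mismatch shows up: the same byte sequence may convert to different wide characters, or fail to convert, depending on which multibyte character set the locale selects.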

From the standpoint of SourcePro DB, the character set on the database server is irrelevant. It is the responsibility of the database software to translate between the server and client character sets. It is the responsibility of the system administrator to ensure that this mapping of character sets is working properly.

NOTE: 	Incompatibility between the multibyte character set used by the operating system and the multibyte character set expected by database client software causes problems. It is the responsibility of your system administrator to ensure compatibility.

In the following sections, we discuss in more detail the four different C++ classes used by SourcePro DB to hold character string data.