National Character Sets and C++ Data Types
SourcePro DB uses four different C++ classes to hold character string data:
RWCString from the Essential Tools Module is used for standard ASCII strings.
RWWString from the Essential Tools Module is used for wide character strings, such as UCS-2 or UCS-4.
RWDBMBString from the DB Interface Module is used for multibyte character strings, such as UTF-8.
RWBasicUString from the Essential Tools Module is used to encapsulate UTF-16 characters and strings.
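To make the difference between these storage models concrete, the following sketch counts how many bytes the word "naïve" occupies under each one. It uses standard C++ string types (std::string, std::u16string, std::wstring) purely as stand-ins; they are not the SourcePro classes, and the helper function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers using standard C++ string types as stand-ins for
// the four storage models listed above. They are NOT the SourcePro classes.

// Standard ASCII (the RWCString case): one byte per character.
std::size_t ascii_bytes() {
    const std::string ascii = "naive";
    return ascii.size();                      // 5 bytes
}

// Multibyte UTF-8 (the RWDBMBString case): U+00EF ("ï") takes two bytes.
std::size_t utf8_bytes() {
    const std::string utf8 = "na\xC3\xAFve";  // "naïve" encoded as UTF-8
    return utf8.size();                       // 6 bytes
}

// UTF-16 (the RWBasicUString case): "ï" is a single 16-bit code unit.
std::size_t utf16_bytes() {
    const std::u16string utf16 = u"na\u00EFve";
    return utf16.size() * sizeof(char16_t);   // 5 units x 2 bytes = 10
}

// Wide characters (the RWWString case): the width depends on the platform.
std::size_t wide_bytes() {
    const std::wstring wide = L"na\u00EFve";
    return wide.size() * sizeof(wchar_t);     // 10 on Windows, 20 on most Unix
}
```

The same five characters need a different number of bytes in each model, which is why a database that distinguishes multibyte from single-byte or Unicode columns benefits from a distinct C++ type for each.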
Although class RWCString is capable of storing multibyte character strings, you are encouraged to use RWDBMBString for multibyte strings in SourcePro DB. Because some databases differentiate between multibyte and standard ASCII strings, applications using RWDBMBString for multibyte character strings maximize portability to other databases.
For Unicode applications, however, you are encouraged to use class RWBasicUString or RWUString instead of RWDBMBString or RWWString. In SourcePro DB, the Unicode types of all the different database vendors are mapped to RWBasicUString.
RWBasicUString is platform independent, because each UTF-16 code unit it stores always occupies 2 bytes, and it contains methods implemented specifically for manipulating UTF-16 data. Also, because RWBasicUString is the base class of RWUString, applications built with RWBasicUString can integrate seamlessly with the SourcePro Internationalization Module.
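One reason UTF-16 data needs dedicated methods is that a fixed 2-byte code unit is not always a whole character: characters outside the Basic Multilingual Plane are stored as a surrogate pair of two code units. The sketch below illustrates this with std::u16string rather than RWBasicUString, whose member functions are not shown here; count_code_points is a hypothetical helper, not a SourcePro function.

```cpp
#include <cstddef>
#include <string>

// On mainstream platforms a UTF-16 code unit occupies exactly 2 bytes,
// unlike wchar_t, whose size varies between platforms.
static_assert(sizeof(char16_t) == 2, "expected 16-bit UTF-16 code units");

// Hypothetical helper: count Unicode code points in UTF-16 text.
// A high surrogate followed by a low surrogate encodes one code point,
// so size() alone overcounts characters outside the BMP.
std::size_t count_code_points(const std::u16string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        const bool high = unit >= 0xD800 && unit <= 0xDBFF;
        if (high && i + 1 < text.size() &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate of the pair
        }
        ++count;  // an unpaired surrogate is counted as one unit
    }
    return count;
}
```

For example, u"A\U0001F600" holds three code units but only two characters, so count_code_points returns 2.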
NOTE: You are encouraged to use RWDBMBString rather than RWCString for storing multibyte character strings, and to use RWBasicUString rather than RWWString or RWDBMBString for handling UTF-16 data.
The actual character sets used by a given system depend on several aspects of the hardware and software installation. When an operating system is installed on a machine, a character set is selected to represent the keyboard attached to the machine, possibly along with some supplementary character sets. A database also has at least one character set associated with the server and one associated with the client.
It is important to ensure compatibility between the default character set of the operating system and the character set of the client database software. The DB Interface Module does not itself translate between character sets, but it may forward requests to translate between wide and multibyte strings to the underlying operating system. If the operating system's multibyte character set is incompatible with the multibyte character set expected by a database's client software, character data may be translated incorrectly. UTF-16 data does not undergo any translation and is sent directly to the database client.
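The internals of the DB Interface Module are not described here. As an assumption, the following sketch uses the standard C function std::mbsrtowcs to illustrate the kind of locale-driven conversion between multibyte and wide strings that the operating system performs; to_wide is a hypothetical helper.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: convert a multibyte string to a wide string using
// the C runtime's current LC_CTYPE locale. Returns false if the bytes are
// not a valid sequence in that locale's multibyte encoding.
bool to_wide(const std::string& mb, std::wstring& out) {
    std::mbstate_t state{};
    const char* src = mb.c_str();
    // With a null destination, mbsrtowcs only computes the required length.
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return false;  // invalid sequence for the current locale
    }
    std::wstring buffer(len + 1, L'\0');  // room for the terminating null
    state = std::mbstate_t{};
    src = mb.c_str();
    std::mbsrtowcs(&buffer[0], &src, len + 1, &state);
    buffer.resize(len);  // drop the terminating null
    out = buffer;
    return true;
}
```

A program typically calls std::setlocale(LC_ALL, "") to adopt the operating system's locale. If that locale's multibyte encoding differs from the one the database client expects, the same bytes convert to different wide characters or fail to convert, which is the incompatibility described above. Plain ASCII text converts identically in any common locale.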
From the standpoint of SourcePro DB, the character set on the database server is irrelevant. It is the responsibility of the database software to translate between the server and client character sets, and it is the responsibility of the system administrator to ensure that this mapping of character sets works properly.
NOTE: Incompatibility between the multibyte character set used by the operating system and the multibyte character set expected by database client software causes problems. It is the responsibility of your system administrator to ensure compatibility.
In the following sections, we discuss in more detail the four different C++ classes used by SourcePro DB to hold character string data.