Character Sets
Most relational databases were developed in environments where the primary language was English. In these environments, database servers stored character data in some variant of the CHAR or VARCHAR data types. As database vendors expanded beyond English-speaking markets, demand grew for support of other native character sets. In response, the NCHAR and NVARCHAR data types were created to hold character data in national character sets.
In this chapter, we use the terms:
* standard character set data types, to mean the original CHAR and VARCHAR data types;
* national character set data types, to mean the newer NCHAR and NVARCHAR data types.
Unfortunately, database vendors did not standardize on a common set of features and capabilities for these new data types. Some vendors implement national character set support in their standard character data types and treat NCHAR and NVARCHAR as synonyms for them. Other vendors implement the two groups of data types identically except for their collation sequencing capabilities. Still others use completely separate implementations for standard and national character set data types. The documentation provided by your database vendor should help you identify which approach the vendor uses.
The DB Interface Module is designed to make the differences between database implementations nearly invisible, but some differences do persist. Please consult the internationalization section of your DB Access Module Guide to learn about the behavior differences.
In the examples in this chapter, we use Chinese characters that represent “Hello.” These characters were selected from the Unicode standard. For these examples to run properly, the machine must have the appropriate locales and environment installed. Please note that these examples are intended to show how to use the various string classes available in SourcePro DB, rather than how to write portable, correct Unicode applications.
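As a rough illustration of why the choice of string type matters for such data, the following sketch uses only standard C++ strings, not the SourcePro DB string classes. It assumes that “Hello” is written with the two characters U+4F60 and U+597D; the chapter does not state which characters it uses. The function names helloUtf8, helloUtf16, and countCodePoints are hypothetical and exist only for this sketch.

```cpp
#include <cstddef>
#include <string>

// Assumption: "Hello" is written as U+4F60 U+597D. This choice is made
// for illustration only and is not taken from the chapter's examples.

// The greeting as UTF-8. The bytes are spelled out explicitly so the
// result does not depend on the encoding of the source file.
inline std::string helloUtf8() {
    return std::string("\xE4\xBD\xA0\xE5\xA5\xBD");
}

// The same greeting as UTF-16 code units.
inline std::u16string helloUtf16() {
    return std::u16string(u"\u4F60\u597D");
}

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx. Assumes well-formed input.
inline std::size_t countCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Under these assumptions, the UTF-8 form occupies six bytes and the UTF-16 form two code units, although both hold the same two characters. A column size or buffer length expressed in bytes therefore does not translate directly into a number of characters, which is one reason the handling of standard and national character set data can differ between databases.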