Finding Collation Elements

The enableNormalizationChecking() method lets you modify the process by which RWUCollator obtains a series of collation elements from a string of Unicode characters. It controls whether RWUCollator normalizes a string before finding its collation elements.

Without normalization, RWUCollator can correctly collate strings that are in FCD (Fast C or D) form. These are strings whose raw, recursive decomposition, without reordering of diacritics, results in an NFD string (Normalization Form Decomposed; see Chapter 5 for more information on normalization forms). Most strings in many languages are already in FCD form. In contrast, strings in languages that use multiple combining characters, such as Arabic, Hebrew, Hindi, Thai, and Vietnamese, might not be in FCD form.
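To make the ordering requirement concrete, the following sketch checks whether the combining marks in an already-decomposed string appear in canonical order, which is the part of the FCD condition that multiple combining characters can violate. It is a simplified illustration, not the check RWUCollator performs: the function names combiningClass() and isCanonicallyOrdered() are hypothetical, the combining-class table covers only four code points, and the input is assumed to contain no precomposed characters.

```cpp
#include <cstdint>
#include <string>

// Canonical combining class for a handful of code points.
// Illustrative subset only (an assumption of this sketch); real code
// would consult the full Unicode Character Database.
inline std::uint8_t combiningClass(char32_t cp) {
    switch (cp) {
        case 0x0300: return 230;  // COMBINING GRAVE ACCENT
        case 0x0301: return 230;  // COMBINING ACUTE ACCENT
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0327: return 202;  // COMBINING CEDILLA
        default:     return 0;    // everything else is treated as a starter
    }
}

// Returns true if, within every run of combining marks, the combining
// classes never decrease. Assumes the input has no precomposed
// characters, so no decomposition step is needed before the check.
inline bool isCanonicallyOrdered(const std::u32string& text) {
    std::uint8_t previous = 0;
    for (char32_t cp : text) {
        const std::uint8_t current = combiningClass(cp);
        if (current != 0 && previous > current) {
            return false;  // a mark with a lower class follows a higher one
        }
        previous = current;
    }
    return true;
}
```

Under these assumptions, "a" followed by U+0323 and then U+0301 passes the check (classes 220, 230), while "a" followed by U+0301 and then U+0323 fails it (classes 230, 220). A string of the second kind would need normalization before its collation elements could be found correctly.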

When normalization checking is enabled, RWUCollator checks input strings and normalizes them if necessary. When normalization checking is disabled, it skips the normalization check, improving performance.

The default value for the normalization check attribute is based upon locale. For example, normalization checking is enabled by default for Thai. If you know, however, that your Thai input strings are in FCD form, you can increase performance by disabling normalization checking.

The isEnabledNormalizationChecking() method returns true if normalization checking is enabled; otherwise, false.
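One way to use the two methods together is to change the setting only for a batch of input known to be in FCD form, then restore the previous value. The sketch below does this with a small scope guard. It rests on assumptions that should be verified against the RWUCollator reference: that enableNormalizationChecking() accepts a bool argument and that isEnabledNormalizationChecking() is callable on a non-const collator. NormalizationCheckingGuard and StandInCollator are hypothetical names; StandInCollator exists only so the sketch compiles without SourcePro and is not part of the library.

```cpp
// Stand-in for a collator type (hypothetical, not part of SourcePro).
// It mimics only the two methods this section describes.
struct StandInCollator {
    bool normalizationChecking = true;

    // Assumed signature: a bool argument selects the new state.
    void enableNormalizationChecking(bool enable = true) {
        normalizationChecking = enable;
    }
    bool isEnabledNormalizationChecking() const {
        return normalizationChecking;
    }
};

// Sets normalization checking for the lifetime of the guard and
// restores the previous setting when the guard goes out of scope.
template <typename Collator>
class NormalizationCheckingGuard {
public:
    NormalizationCheckingGuard(Collator& collator, bool enable)
        : collator_(collator),
          previous_(collator.isEnabledNormalizationChecking()) {
        collator_.enableNormalizationChecking(enable);
    }

    ~NormalizationCheckingGuard() {
        collator_.enableNormalizationChecking(previous_);
    }

    // Non-copyable: each guard owns exactly one saved setting.
    NormalizationCheckingGuard(const NormalizationCheckingGuard&) = delete;
    NormalizationCheckingGuard& operator=(const NormalizationCheckingGuard&) = delete;

private:
    Collator& collator_;
    bool previous_;
};
```

With a guard such as NormalizationCheckingGuard<StandInCollator> guard(collator, false); declared at the top of a block, comparisons inside the block skip the normalization check, and the locale-based default (or whatever value was set earlier) is back in effect once the block ends, even if an exception is thrown.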

For more information on normalization, see Chapter 5.