Customizing a Collator
Class RWUCollator follows the Unicode Collation Algorithm, as described in Unicode Technical Standard #10:
http://www.unicode.org/unicode/reports/tr10/
Conceptually, this algorithm works as follows:
1. For each string, find its collation elements. A collation element usually, but not necessarily, corresponds to a character. A composed character such as á, for example, corresponds to multiple collation elements: one for the letter a and one for the acute accent symbol. In contrast, traditional Spanish regards ch as a single character, so in this locale two Unicode code points correspond to a single collation element.
2. For each collation element, find its collation weights. Each collation element has at least three, and sometimes four, weights. Each weight supplies one level of collation information for that collation element. The exact meaning of a collation level depends on the locale. For most locales:
* the primary weight encodes basic character identity
* the secondary weight encodes diacritical information
* the tertiary weight encodes differences in appearance, such as case
For example, a and A have the same primary weight and are therefore identical at the primary level, whereas the primary weight of a is less than that of b, so a < b. At the secondary level, the weights for a and á differ. At the tertiary level, the weights for a and A differ.
3. Compare the primary weights of the two strings. If the strings can be distinguished at the primary level, the comparison is complete and the result can be returned.
4. If the strings are identical at the primary level, continue comparing weights of additional levels as requested until a difference is found or the strings are determined to be equivalent.
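The four steps above can be sketched as a small, self-contained C++ example. It is only an illustration of the level-by-level comparison and does not use RWUCollator: the names Element, appendElements, levelKey, and compareStrings are hypothetical, and the weight values are invented for the example rather than taken from any real collation table. The sketch assumes that a weight of zero means "ignorable at this level", which is how the accent is kept out of the primary comparison.

```cpp
#include <string>
#include <vector>

// One collation element with three weights (toy values, not real UCA data).
struct Element {
    int primary;
    int secondary;
    int tertiary;
};

// Steps 1 and 2: map one code point to its collation element(s) and weights.
void appendElements(char32_t c, std::vector<Element>& out) {
    switch (c) {
        case U'a': out.push_back({1, 1, 1}); break;
        case U'A': out.push_back({1, 1, 2}); break;      // same letter, different case
        case U'b': out.push_back({2, 1, 1}); break;
        case U'B': out.push_back({2, 1, 2}); break;
        case U'\u0301': out.push_back({0, 2, 1}); break; // combining acute accent
        case U'\u00E1':                                  // á expands to a + acute
            appendElements(U'a', out);
            appendElements(U'\u0301', out);
            break;
        default: // any other code point: sorted by code point, after the table
            out.push_back({static_cast<int>(c) + 100, 1, 1});
    }
}

// Collect the non-zero weights of one level (0 = primary, 1 = secondary,
// 2 = tertiary). A zero weight is treated as ignorable at that level.
std::vector<int> levelKey(const std::vector<Element>& elements, int level) {
    std::vector<int> key;
    for (const Element& e : elements) {
        int w = level == 0 ? e.primary : level == 1 ? e.secondary : e.tertiary;
        if (w != 0) key.push_back(w);
    }
    return key;
}

// Steps 3 and 4: compare level by level, from primary up to maxLevel,
// stopping at the first level where the strings differ.
// Returns a negative value, zero, or a positive value.
int compareStrings(const std::u32string& lhs, const std::u32string& rhs,
                   int maxLevel) {
    std::vector<Element> a, b;
    for (char32_t c : lhs) appendElements(c, a);
    for (char32_t c : rhs) appendElements(c, b);

    for (int level = 0; level <= maxLevel; ++level) {
        std::vector<int> ka = levelKey(a, level);
        std::vector<int> kb = levelKey(b, level);
        if (ka < kb) return -1;
        if (kb < ka) return 1;
    }
    return 0; // equivalent at every requested level
}
```

With this toy table, compareStrings(U"a", U"A", 0) reports the strings as equivalent, while the same call with a maxLevel of 2 orders a before A. Limiting maxLevel is the sketch's stand-in for choosing which collation levels are significant.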
RWUCollator provides a variety of mutator methods for customizing how collation is performed. With these methods, you can specify:
* how collation elements are found
* how collation weights are formed
* which collation levels should be considered significant