Performs locale-sensitive string comparison for use in searching and sorting natural language text.

#include <rw/i18n/RWUCollator.h>

Public Types
enum	CaseOrder { Normal, LowerFirst, UpperFirst }

enum	CollationStrength { Primary, Secondary, Tertiary, Quaternary, Identical }

Public Member Functions
	RWUCollator ()

	RWUCollator (const RWUCollator &original)

	RWUCollator (const RWULocale &locale)

	~RWUCollator (void)

int	compareTo (const RWUString &lhs, const RWUString &rhs) const

void	enableCaseLevel (bool caseLevel)

void	enableFrenchCollation (bool frenchCollation)

void	enableNormalizationChecking (bool check)

void	enablePunctuationShifting (bool shift)

bool	equals (const RWUString &lhs, const RWUString &rhs) const

CaseOrder	getCaseOrder (void) const

RWUCollationKey	getCollationKey (const RWUString &str) const

RWULocale	getLocale (void) const

CollationStrength	getStrength (void) const

bool	isEnabledCaseLevel (void) const

bool	isEnabledFrenchCollation (void) const

bool	isEnabledNormalizationChecking (void) const

bool	isEnabledPunctuationShifting (void) const

RWUCollator &	operator= (const RWUCollator &rhs)

void	setCaseOrder (CaseOrder order)

void	setStrength (CollationStrength strength)

Detailed Description

RWUCollator performs locale-sensitive string comparison for use in searching and sorting natural language text.

Each language has its own rules for determining the proper collation order for strings. For example, in Lithuanian, the letter y appears between i and k in the alphabet. In order to take language-specific conventions into account, each RWUCollator is associated with an RWULocale at construction time. This locale specifies the default values for a variety of RWUCollator attributes. Many of these default values can be overridden using attribute mutator methods.

RWUCollator follows the Unicode Collation Algorithm, as described in Unicode Technical Standard #10:

http://www.unicode.org/reports/tr10/.

This collation algorithm can be customized using the attribute mutator methods of the RWUCollator class. With these methods, you can specify how collation elements are found, how collation weights are formed, and which collation levels should be considered significant. See the Internationalization Module User's Guide for more information on collation.

RWUCollator calculates collation weights incrementally. This ensures good performance, as most strings differ in their first few characters. However, if string comparisons are to be made repeatedly (for example, when sorting a set of strings), then best performance can be achieved by obtaining an RWUCollationKey for each string and comparing the keys. Generating a key via RWUCollator::getCollationKey() is a non-trivial operation, as it involves determining the collation elements and weights for an entire string. Comparing two RWUCollationKey objects, however, is fast.
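
The key-based approach described above can be sketched with the standard library alone. In this sketch, `makeKey` is a hypothetical stand-in for RWUCollator::getCollationKey() (it only lowercases ASCII letters, which is far simpler than a real collation key), and `sortByKey` is an illustrative helper that is not part of SourcePro. The point is the shape of the technique: each key is computed once, and the sort then compares keys only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RWUCollator::getCollationKey(): a real key
// encodes collation weights; this toy key only lowercases ASCII letters.
std::string makeKey(const std::string& s) {
    std::string key;
    key.reserve(s.size());
    for (unsigned char c : s) {
        key.push_back(static_cast<char>(std::tolower(c)));
    }
    return key;
}

// Sorts strings by key, computing each key exactly once.
std::vector<std::string> sortByKey(const std::vector<std::string>& input) {
    typedef std::pair<std::string, std::string> Keyed;  // (key, original)
    std::vector<Keyed> keyed;
    keyed.reserve(input.size());
    for (const std::string& s : input) {
        keyed.push_back(Keyed(makeKey(s), s));
    }
    // Only the precomputed keys are compared. stable_sort keeps the input
    // order of strings whose keys are equal.
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const Keyed& a, const Keyed& b) {
                         return a.first < b.first;
                     });
    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (Keyed& entry : keyed) {
        sorted.push_back(std::move(entry.second));
    }
    return sorted;
}
```

With a real collator, the pair would presumably hold an RWUCollationKey instead of a `std::string` key; the rest of the pattern stays the same.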

Example:

#include <rw/i18n/RWUCollator.h>
#include <rw/i18n/RWUConversionContext.h>
#include <iostream>

using std::cout;
using std::endl;

int main() {
    // Indicate string literals are encoded according to
    // ISO-8859-1.
    RWUConversionContext context("ISO-8859-1");

    // Use implicit conversion to build two strings.
    RWUString string1("Blackbird");
    RWUString string2("black-bird");

    // Create a collator based on the "en" locale.
    RWULocale en(RWCString("en"));
    RWUCollator collator(en);

    // Modify the collator so it ignores differences
    // in punctuation and case.
    collator.enablePunctuationShifting(true);
    collator.setStrength(RWUCollator::Secondary);

    // Compare the two strings.
    int retval = collator.compareTo(string1, string2);

    if (retval < 0) {
        cout << "string1 is less than string2" << endl;
    } else if (retval == 0) {
        cout << "string1 is equal to string2" << endl;
    } else {
        cout << "string1 is greater than string2" << endl;
    }

    return 0;
}

Program output:

string1 is equal to string2

See also: RWUCollationKey, RWUNormalizer

Member Enumeration Documentation

◆ CaseOrder

enum RWUCollator::CaseOrder

A CaseOrder value determines how characters are ordered at the tertiary level or, if enabled, the case level.

Enumerator
Normal	characters are ordered in accordance with the Unicode Collation Charts. Typically, the lowercase version of a letter is ordered before all other versions.
LowerFirst	lowercase letters, small kana, and uncased characters are ordered before mixed-case letters. Uppercase letters are ordered last.
UpperFirst	uppercase letters are ordered before mixed-case letters. Lowercase letters, small kana, and uncased characters are ordered last.

◆ CollationStrength

enum RWUCollator::CollationStrength

A CollationStrength value indicates the level at which two collation elements should be considered equal.

Enumerator
Primary	only primary differences are considered significant. Primary differences are locale-dependent, but are typically differences in basic character identity. An example of a primary difference is `a != b`.
Secondary	both primary and secondary differences are considered significant. Secondary differences are locale-dependent, but are typically differences in diacritics. An example of a secondary difference is `a != á`.
Tertiary	primary, secondary, and tertiary differences are considered significant. Tertiary differences are locale-dependent, but are typically differences in appearance, such as the differences between uppercase, lowercase, superscript, subscript, halfwidth, and circled versions of a character. An example of a tertiary difference is `a != A`.
Quaternary	primary, secondary, tertiary, and quaternary differences are considered significant. Quaternary strength is useful only in two situations: (1) when punctuation shifting is enabled, whitespace and punctuation characters are ignored at the first three strength levels and are distinguished at the quaternary level; (2) for Japanese locales, hiragana characters are positioned before katakana characters at the quaternary level, mimicking JIS sort order.
Identical	all differences are considered significant. This strength level should be used sparingly. It rarely distinguishes between strings considered equal at the quaternary level, yet exacts a significant performance cost.

Constructor & Destructor Documentation

◆ RWUCollator() [1/3]

RWUCollator::RWUCollator ( )

inline

Constructs a new RWUCollator with the default locale. Throws RWUException if any error occurs during the construction.

◆ RWUCollator() [2/3]

RWUCollator::RWUCollator ( const RWULocale & locale )

explicit

Constructs a new RWUCollator based on the given locale. Throws RWUException if any error occurs during the construction.

◆ RWUCollator() [3/3]

RWUCollator::RWUCollator ( const RWUCollator & original )

Copy constructor. Makes self a deep copy of original. Throws RWUException if any error occurs during the construction.

◆ ~RWUCollator()

RWUCollator::~RWUCollator ( void )

inline

Destructor.

Member Function Documentation

◆ compareTo()

int RWUCollator::compareTo	(	const RWUString &	lhs,
		const RWUString &	rhs ) const

Compares the given strings, according to the dictates of this collator's attributes. Returns -1 if lhs < rhs, 0 if lhs == rhs, and 1 if lhs > rhs.
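
A three-way result of this kind is commonly adapted to the strict "less than" predicate that std::sort expects. The sketch below shows that adaptation with the standard library only. `threeWayCompare` is a hypothetical stand-in for a collator comparison (it compares ASCII strings while ignoring case) and does not reflect how RWUCollator is implemented; `LessByThreeWay` and `sortedCopy` are likewise illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a collator's three-way comparison: compares
// ASCII strings ignoring case and returns -1, 0, or 1.
int threeWayCompare(const std::string& lhs, const std::string& rhs) {
    const std::size_t n = std::min(lhs.size(), rhs.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int a = std::tolower(static_cast<unsigned char>(lhs[i]));
        const int b = std::tolower(static_cast<unsigned char>(rhs[i]));
        if (a != b) {
            return a < b ? -1 : 1;
        }
    }
    if (lhs.size() == rhs.size()) {
        return 0;
    }
    return lhs.size() < rhs.size() ? -1 : 1;
}

// Adapts the three-way result to a strict "less than" predicate.
struct LessByThreeWay {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return threeWayCompare(lhs, rhs) < 0;
    }
};

// Returns a copy of the input sorted with the adapted predicate.
std::vector<std::string> sortedCopy(std::vector<std::string> values) {
    std::sort(values.begin(), values.end(), LessByThreeWay());
    return values;
}
```

Testing the result against zero (`< 0`) rather than against `-1` keeps the predicate valid for any comparison function that reports order by sign.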

◆ enableCaseLevel()

void RWUCollator::enableCaseLevel ( bool caseLevel )

Sets whether case distinctions should be made at an extra "case level," positioned between the secondary and tertiary levels:

If self's strength is Primary, base character identity is taken into consideration, then case distinctions are made. Diacritics are not taken into account.
If self's strength is Secondary, base character identity, diacritics, and case distinctions are taken into account, in that order. Other tertiary distinctions, such as those between regular and superscript versions of a character, are not taken into account.
If self's strength is Tertiary, base character identity, diacritics, case distinctions, and other tertiary distinctions are taken into account, in that order.

At the case level, cased characters are ordered according to self's CaseOrder attribute.
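
The ordering of levels described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the library's algorithm: `CasedLetter`, `CasedWord`, and `compareWithCaseLevel` are hypothetical names, the accent weights are made up, lowercase is assumed to sort before uppercase, and tertiary distinctions other than case are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy collation element: base letter (primary), accent weight (secondary,
// 0 means unaccented), and an uppercase flag (case level).
struct CasedLetter {
    char base;
    int accent;
    bool upper;
};
typedef std::vector<CasedLetter> CasedWord;

// Returns -1, 0, or 1. strength is 1 (primary) or 2 (secondary).
int compareWithCaseLevel(const CasedWord& lhs, const CasedWord& rhs,
                         int strength, bool caseLevel) {
    const std::size_t n = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    // Primary level: base character identity.
    for (std::size_t i = 0; i < n; ++i) {
        if (lhs[i].base != rhs[i].base) {
            return lhs[i].base < rhs[i].base ? -1 : 1;
        }
    }
    if (lhs.size() != rhs.size()) {
        return lhs.size() < rhs.size() ? -1 : 1;
    }
    // Secondary level: diacritics, only if the strength includes them.
    if (strength >= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lhs[i].accent != rhs[i].accent) {
                return lhs[i].accent < rhs[i].accent ? -1 : 1;
            }
        }
    }
    // Case level: consulted after the levels above, if enabled.
    if (caseLevel) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lhs[i].upper != rhs[i].upper) {
                return lhs[i].upper ? 1 : -1;
            }
        }
    }
    return 0;
}
```

In this model, "role" and "Role" compare equal at primary strength until the case level is enabled, while "role" and "rôle" stay equal at primary strength either way because diacritics are never consulted.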

◆ enableFrenchCollation()

void RWUCollator::enableFrenchCollation ( bool frenchCollation )

Sets whether French collation rules should be in effect for self.

When French collation rules are in effect, the diacritical differences at the secondary strength level are compared in reverse order, from the end of each string to its start.
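
The effect of reversing the secondary comparison can be seen in a toy model. The sketch below is an illustration under stated assumptions, not the library's implementation: `AccentedLetter`, `makeWord`, and `compareAccented` are hypothetical names, and the accent weights are made up. Base letters are compared from start to end; accents are compared only when all base letters match, either forward or, when `frenchCollation` is true, from the end of the word.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy collation element: a base letter (primary weight) and an accent
// weight (secondary weight; 0 means unaccented). The weights are made up.
struct AccentedLetter {
    char base;
    int accent;
};
typedef std::vector<AccentedLetter> AccentedWord;

// Builds a word from base letters and one accent weight per letter.
AccentedWord makeWord(const std::string& bases, const std::vector<int>& accents) {
    assert(bases.size() == accents.size());
    AccentedWord word;
    for (std::size_t i = 0; i < bases.size(); ++i) {
        word.push_back(AccentedLetter{bases[i], accents[i]});
    }
    return word;
}

// Returns -1, 0, or 1.
int compareAccented(const AccentedWord& lhs, const AccentedWord& rhs,
                    bool frenchCollation) {
    const std::size_t n = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    // Primary level: base letters, always from start to end.
    for (std::size_t i = 0; i < n; ++i) {
        if (lhs[i].base != rhs[i].base) {
            return lhs[i].base < rhs[i].base ? -1 : 1;
        }
    }
    if (lhs.size() != rhs.size()) {
        return lhs.size() < rhs.size() ? -1 : 1;
    }
    // Secondary level: accents, scanned backward under French rules.
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = frenchCollation ? n - 1 - k : k;
        if (lhs[i].accent != rhs[i].accent) {
            return lhs[i].accent < rhs[i].accent ? -1 : 1;
        }
    }
    return 0;
}
```

For example, with "côte" built as `makeWord("cote", {0, 2, 0, 0})` and "coté" as `makeWord("cote", {0, 0, 0, 1})`, the forward scan places "coté" first because the earliest accent difference is on the "o", while the backward scan places "côte" first because the last accent difference is on the "e".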

◆ enableNormalizationChecking()

void RWUCollator::enableNormalizationChecking ( bool check )

Sets whether self should perform normalization checks on input strings.

When normalization checking is disabled, self correctly compares strings that are in FCD (Fast C or D) form, that is, strings whose raw, recursive decomposition (without reordering of diacritics) results in a canonically-ordered string. Most strings in many languages are in FCD form.

In contrast, normalization checking is enabled by default for languages that use multiple combining characters, such as Arabic, Hebrew, Hindi, Thai, and Vietnamese. This ensures that input strings are normalized if necessary before collation. If, however, you know your strings are already in FCD form, you can improve performance slightly by disabling normalization checking.

◆ enablePunctuationShifting()

void RWUCollator::enablePunctuationShifting ( bool shift )

Sets whether the significance of punctuation and whitespace characters should be shifted from the primary strength level to the quaternary strength level.
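
How shifting interacts with collation strength can be sketched with a toy ASCII-only model. Everything here is an assumption made for illustration: `ToyStrength`, `ToyWeights`, `buildWeights`, `order`, and `toyCompare` are hypothetical names, the weights are simply character codes, and the secondary level is empty because plain ASCII has no diacritics. When shifting is on, punctuation and whitespace contribute nothing to the first three levels and are compared only at the quaternary level.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum ToyStrength { ToyPrimary = 1, ToySecondary, ToyTertiary, ToyQuaternary };

// Per-level weights for one string in this toy model.
struct ToyWeights {
    std::vector<int> primary;     // lowercased characters
    std::vector<int> tertiary;    // 1 for uppercase, 0 otherwise
    std::vector<int> quaternary;  // shifted characters, or a high filler
};

ToyWeights buildWeights(const std::string& text, bool shiftPunctuation) {
    const int kHighFiller = 0x10000;  // sorts after every shifted character
    ToyWeights w;
    for (unsigned char c : text) {
        const bool shifted =
            shiftPunctuation && (std::ispunct(c) || std::isspace(c));
        if (!shifted) {
            w.primary.push_back(std::tolower(c));
            w.tertiary.push_back(std::isupper(c) ? 1 : 0);
        }
        w.quaternary.push_back(shifted ? static_cast<int>(c) : kHighFiller);
    }
    return w;
}

// Lexicographic three-way comparison of two weight sequences.
int order(const std::vector<int>& a, const std::vector<int>& b) {
    if (a == b) {
        return 0;
    }
    return a < b ? -1 : 1;
}

// Returns -1, 0, or 1, consulting levels up to the requested strength.
int toyCompare(const std::string& lhs, const std::string& rhs,
               ToyStrength strength, bool shiftPunctuation) {
    const ToyWeights a = buildWeights(lhs, shiftPunctuation);
    const ToyWeights b = buildWeights(rhs, shiftPunctuation);
    int result = order(a.primary, b.primary);
    // Secondary level: nothing to compare for plain ASCII in this model.
    if (result == 0 && strength >= ToyTertiary) {
        result = order(a.tertiary, b.tertiary);
    }
    if (result == 0 && strength >= ToyQuaternary) {
        result = order(a.quaternary, b.quaternary);
    }
    return result;
}
```

In this model, "Blackbird" and "black-bird" compare equal at secondary strength with shifting on, differ once shifting is turned off (the hyphen then carries a primary weight), and "blackbird" and "black-bird" differ only when the quaternary level is consulted.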

◆ equals()

bool RWUCollator::equals	(	const RWUString &	lhs,
		const RWUString &	rhs ) const

Compares the given strings, according to the dictates of this collator's attributes. Returns true if lhs == rhs. Otherwise, returns false.

◆ getCaseOrder()

CaseOrder RWUCollator::getCaseOrder ( void ) const

Returns the current CaseOrder for self.

◆ getCollationKey()

RWUCollationKey RWUCollator::getCollationKey ( const RWUString & str ) const

Returns an RWUCollationKey corresponding to the given string str. This key may be compared to other keys produced by collators with the same attributes.

◆ getLocale()

RWULocale RWUCollator::getLocale ( void ) const

inline

Returns the locale associated with self.

◆ getStrength()

RWUCollator::CollationStrength RWUCollator::getStrength ( void ) const

inline

Returns the CollationStrength associated with self.

◆ isEnabledCaseLevel()

bool RWUCollator::isEnabledCaseLevel ( void ) const

Returns true if the case level is enabled. Otherwise, returns false.

◆ isEnabledFrenchCollation()

bool RWUCollator::isEnabledFrenchCollation ( void ) const

Returns true if French collation rules are in effect. Otherwise, returns false.

◆ isEnabledNormalizationChecking()

bool RWUCollator::isEnabledNormalizationChecking ( void ) const

Returns true if normalization checking is enabled. Otherwise, returns false.

◆ isEnabledPunctuationShifting()

bool RWUCollator::isEnabledPunctuationShifting ( void ) const

Returns true if punctuation shifting is enabled. Otherwise, returns false.

◆ operator=()

RWUCollator & RWUCollator::operator= ( const RWUCollator & rhs )

Assignment operator. Makes self a deep copy of rhs. Throws RWUException if any error occurs during the assignment.

◆ setCaseOrder()

void RWUCollator::setCaseOrder ( CaseOrder order )

Sets the case ordering for self to order.

◆ setStrength()

void RWUCollator::setStrength ( CollationStrength strength )

inline

Sets the collation strength of self to strength.

SourcePro® API Reference Guide
