






 
A piece of text can sometimes be represented by more than one sequence of Unicode characters. The Unicode standard recognizes two types of character equivalence, canonical and compatibility, under which different code points or sequences of code points are considered equivalent forms of the same information. For example, the letter é can be encoded as the single code point U+00E9 or as the letter e (U+0065) followed by a combining acute accent (U+0301). The two types of equivalence give rise to four normalization forms: NFD, NFC, NFKD, and NFKC. Within a given normalization form, all equivalent strings share the same unique representation.
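The following minimal sketch illustrates why equivalent text may still compare as different. It uses only the C++ standard library, not the Internationalization Module, and the helper names (precomposed_e_acute, decomposed_e_acute, same_code_points) are hypothetical names chosen for this illustration.

```cpp
#include <string>

// "é" stored as one precomposed code point: U+00E9.
inline std::u32string precomposed_e_acute() {
    return U"\u00E9";
}

// "é" stored as a base letter plus a combining mark: U+0065 U+0301.
inline std::u32string decomposed_e_acute() {
    return U"e\u0301";
}

// Plain code-point-by-code-point comparison with no normalization applied.
// Hypothetical helper for this sketch; it is not part of any library.
inline bool same_code_points(const std::u32string& a, const std::u32string& b) {
    return a == b;
}
```

Both functions return text that a reader sees as the same letter, yet the two strings differ in length and same_code_points reports them as unequal. Normalizing both to the same form before comparing removes that mismatch.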
Normalization is the process of converting Unicode text to a unique representation. Normalization facilitates sorting, searching, conversion, and data exchange. The W3C recommends that all data be normalized as early as possible.
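As a rough illustration of how converting to a single representation helps searching and comparison, the sketch below composes a base letter followed by a combining acute accent (U+0301) into its precomposed form. It is a deliberately tiny toy that assumes only the letters a and e matter. It is not a Unicode normalization algorithm and not the RWUNormalizer interface, and the function name toy_compose_acute is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Toy composition step (assumption: only 'a' and 'e' followed by U+0301
// are handled). Real normalization covers the full Unicode data tables.
inline std::u32string toy_compose_acute(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool acute_follows =
            (i + 1 < in.size()) && in[i + 1] == U'\u0301';
        if (acute_follows && in[i] == U'e') {
            out.push_back(U'\u00E9');  // e + combining acute -> é
            ++i;                       // skip the combining mark
        } else if (acute_follows && in[i] == U'a') {
            out.push_back(U'\u00E1');  // a + combining acute -> á
            ++i;                       // skip the combining mark
        } else {
            out.push_back(in[i]);      // copy everything else unchanged
        }
    }
    return out;
}
```

Applied to either spelling of "café", the function yields the same code point sequence, so a plain equality test or substring search then behaves consistently. Text that is already composed passes through unchanged.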
In the Internationalization Module, class RWUNormalizer normalizes Unicode text. This chapter describes how to use RWUNormalizer to:
convert a string into a particular normalization form
detect whether a string is already in a particular form





Copyright © Rogue Wave Software, Inc. All Rights Reserved.
The Rogue Wave name and logo, and SourcePro, are registered trademarks of Rogue Wave Software. All other trademarks are the property of their respective owners.