Boundary Analysis and Tokenizing
Overview
The Internationalization Module contains two classes for finding delimiters in Unicode strings:
* RWUBreakSearch finds the locations of breaks in text. This class interprets whitespace and punctuation according to the rules of a specific locale.
* RWUTokenizer finds delimiters and sequentially returns the tokens between them. By default, RWUTokenizer uses a predefined set of whitespace characters as delimiters. Alternatively, it can use a caller-specified set of arbitrary characters or a regular expression. Using a regular expression as the delimiter permits complex, multicharacter delimiters.
This chapter describes how to use these classes.
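To make the idea of a regular-expression delimiter concrete before turning to the classes themselves, the following is a minimal sketch written against the C++ standard library only. It does not use RWUTokenizer or any other SourcePro API. The helper name splitOnRegex is hypothetical, and the sketch works on plain char strings, so it has none of the Unicode or locale awareness that the Internationalization Module classes provide.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of SourcePro: splits `text` on every match of
// the regular expression `delimiter` and returns the non-empty tokens found
// between those matches.
std::vector<std::string> splitOnRegex(const std::string& text,
                                      const std::string& delimiter) {
    const std::regex re(delimiter);
    std::vector<std::string> tokens;

    // Submatch index -1 asks the iterator for the text BETWEEN matches,
    // that is, the tokens rather than the delimiters.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1);
    const std::sregex_token_iterator end;

    for (; it != end; ++it) {
        // A delimiter at the very start of the input yields an empty token;
        // this sketch simply skips it.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

With this helper, splitOnRegex("alpha, beta;;gamma", "\\s*[,;]+\\s*") treats a run of commas or semicolons plus any surrounding whitespace as a single multicharacter delimiter, while splitOnRegex(text, "\\s+") approximates the whitespace-delimited default described above.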