Boundary Analysis
RWUBreakSearch finds the locations of code point, character, word, sentence, and line breaks in text.
Code point breaks occur before and after each code point.
Character breaks occur between characters, as defined from the end user's perspective. For example, an accented character can be represented by a single code point or by a pair of code points (one for the base character and another for the accent symbol). Character breaks occur on either side of this logical character, regardless of the number of code points used to represent it. An RWUBreakSearch that searches for character breaks can be used to iterate over the logical characters in a string.
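The difference between code point breaks and character breaks can be sketched in plain C++. This is not the RWUBreakSearch API: the function names below (decodeUtf8, isCombiningMark, codePointBreaks, characterBreaks) are hypothetical, the input is assumed to be well-formed UTF-8, and only the Combining Diacritical Marks block (U+0300 to U+036F) is treated as attaching to the preceding character, which is far narrower than real locale-aware rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode the UTF-8 sequence starting at byte offset i and return its code
// point; len receives the sequence length in bytes.
// Assumption: the input is well-formed UTF-8 (no validation is done).
static std::uint32_t decodeUtf8(const std::string& s, std::size_t i, std::size_t& len) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    if (b < 0x80) { len = 1; return b; }
    std::uint32_t cp;
    if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
    else                         { len = 4; cp = b & 0x07; }
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    return cp;
}

// Simplifying assumption: only U+0300..U+036F counts as a combining mark.
static bool isCombiningMark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Byte offsets of code point breaks: before and after each code point.
std::vector<std::size_t> codePointBreaks(const std::string& s) {
    std::vector<std::size_t> breaks{0};
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len;
        decodeUtf8(s, i, len);
        i += len;
        breaks.push_back(i);
    }
    return breaks;
}

// Byte offsets of character breaks: like code point breaks, except that no
// break is reported between a base character and a following combining mark.
std::vector<std::size_t> characterBreaks(const std::string& s) {
    std::vector<std::size_t> breaks{0};
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len;
        decodeUtf8(s, i, len);
        i += len;
        if (i < s.size()) {
            std::size_t nextLen;
            if (isCombiningMark(decodeUtf8(s, i, nextLen))) continue;
        }
        breaks.push_back(i);
    }
    return breaks;
}
```

With these definitions, the precomposed form "\xC3\xA9" (U+00E9) yields breaks at byte offsets 0 and 2 from both functions. The decomposed form "e\xCC\x81" (e followed by U+0301) yields code point breaks at 0, 1, and 3 but character breaks only at 0 and 3, so both spellings count as one logical character.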
Word breaks occur before and after each word. They do not occur before and after punctuation contained within a word, such as a hyphen or an apostrophe, but they do occur before and after characters that are not part of a word, such as symbols and other punctuation marks. Note that, in some languages, words are not necessarily surrounded by whitespace. An RWUBreakSearch that searches for word breaks is useful for building operations that find whole words.
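The word-break rules described above can be illustrated with a small ASCII-only sketch. It does not use RWUBreakSearch and is not locale-sensitive: the helper names (isWordChar, isInnerPunct, wordBreaks, wholeWords) are hypothetical, and a "word" is assumed to be a run of ASCII letters and digits, optionally joined by a single apostrophe or hyphen that has a word character on each side.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// True if s[i] is an apostrophe or hyphen with a word character on both sides.
static bool isInnerPunct(const std::string& s, std::size_t i) {
    return (s[i] == '\'' || s[i] == '-') && i > 0 && i + 1 < s.size()
        && isWordChar(s[i - 1]) && isWordChar(s[i + 1]);
}

// Offsets of word breaks: before and after each word, and before and after
// every character that is not part of a word.
std::vector<std::size_t> wordBreaks(const std::string& s) {
    std::vector<std::size_t> breaks;
    if (s.empty()) return breaks;
    breaks.push_back(0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        bool prevIn = isWordChar(s[i - 1]) || isInnerPunct(s, i - 1);
        bool curIn  = isWordChar(s[i])     || isInnerPunct(s, i);
        // No break only when both neighbors belong to the same word.
        if (!(prevIn && curIn)) breaks.push_back(i);
    }
    breaks.push_back(s.size());
    return breaks;
}

// Collect the segments between consecutive breaks that start with a word
// character, i.e. the whole words.
std::vector<std::string> wholeWords(const std::string& s) {
    std::vector<std::size_t> b = wordBreaks(s);
    std::vector<std::string> words;
    for (std::size_t k = 0; k + 1 < b.size(); ++k)
        if (isWordChar(s[b[k]]))
            words.push_back(s.substr(b[k], b[k + 1] - b[k]));
    return words;
}
```

For the input "Don't stop, well-known!", wholeWords returns "Don't", "stop", and "well-known": the apostrophe and hyphen stay inside their words, while the comma, spaces, and exclamation mark fall between breaks of their own.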
Sentence breaks occur between sentences. RWUBreakSearch attempts to correctly interpret nested quotes, nested parentheses, and periods that may either end a sentence or be part of a number or abbreviation. This is a difficult task, however, and the results may not always be perfect. An RWUBreakSearch that searches for sentence breaks could be used to count the sentences in a string.
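To show why sentence breaking is hard, here is a deliberately naive sentence counter in plain C++. It is not how RWUBreakSearch works; the function name countSentences and its single rule (a '.', '!' or '?' ends a sentence only when it is followed by whitespace or the end of the string) are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Count sentences using one heuristic: a terminator ends a sentence only if
// it is followed by whitespace or the end of the string. This keeps the
// period in "3.14" from splitting a sentence, but knows nothing about
// abbreviations, quotes, or parentheses.
std::size_t countSentences(const std::string& s) {
    std::size_t count = 0;
    bool inSentence = false;  // true once non-whitespace text has been seen
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isspace(c)) inSentence = true;
        bool terminator = (c == '.' || c == '!' || c == '?');
        bool atEnd = (i + 1 == s.size());
        bool beforeSpace =
            !atEnd && std::isspace(static_cast<unsigned char>(s[i + 1]));
        if (terminator && inSentence && (atEnd || beforeSpace)) {
            ++count;
            inSentence = false;
        }
    }
    if (inSentence) ++count;  // trailing text without a terminator
    return count;
}
```

The heuristic counts three sentences in "Pi is about 3.14. Is that right? Yes!", which is correct, but it reports two for "Dr. Smith arrived." because it cannot tell that the first period belongs to an abbreviation. Cases like this are the reason a dedicated, rule-driven break search is preferable to ad hoc scanning.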
Line breaks occur at positions where it would be appropriate to wrap text from one display line to the next. An RWUBreakSearch that searches for line breaks is useful in the creation of line-wrapping algorithms.
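A line-wrapping algorithm consumes line-break positions roughly as follows. This sketch is an assumption-laden stand-in, not RWUBreakSearch usage: the function name wrapLines is hypothetical, break opportunities are taken to be runs of whitespace only, and width is measured in bytes rather than display columns.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy wrap: keep appending words to the current line while it fits within
// `width`; otherwise start a new line at the most recent break opportunity.
// A single word longer than `width` is placed on a line of its own.
std::vector<std::string> wrapLines(const std::string& text, std::size_t width) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string word, line;
    while (in >> word) {  // whitespace-delimited break opportunities
        if (line.empty()) {
            line = word;
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ';
            line += word;
        } else {
            lines.push_back(line);
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

Calling wrapLines("the quick brown fox jumps", 10) produces the lines "the quick", "brown fox", and "jumps". In a real application the whitespace scan would be replaced by the break positions reported by a line-break search, which also handles text where break opportunities are not marked by spaces.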
Instances of RWUBreakSearch are used by other classes in the Internationalization Module to find breaks in text in a locale-sensitive manner. For example, RWUStringSearch performs flexible, collation-based string searches, using the rules encapsulated by an RWUCollator and an optional RWUBreakSearch to determine if and where a match occurs (see Locale-Sensitive String Searching). Similarly, RWURegularExpression uses an RWUBreakSearch internally to find break-related matches (see Regular Expression String Searching).