Using an RWUBreakSearch
Once a break search is instantiated, breaks can be queried using the first(), last(), next(), and previous() methods. An RWUBreakSearch object maintains a current position. Initially, the current position is the start of the source string. Each call to first(), last(), next(), or previous() changes the current position.
NOTE: Breaks are interpreted as being between characters, immediately to the left of the current position.
For example, the following code counts the number of sentences in a string:
 
RWUConversionContext context("UTF-8"); //1
 
RWUString str("Unicode 3.2 is a minor version of the " //2
"Unicode Standard. It overrides certain features of "
"Unicode 3.1, and adds a significant number of coded "
"characters.");
 
RWUBreakSearch searcher(RWUBreakSearch::Sentence, str); //3
 
RWUConstStringIterator iter = str.beginCodePointIterator(); //4
RWUConstStringIterator end = str.endCodePointIterator(); //5
 
int count = 0;
while (iter != end) {
  ++count; //6
  iter = searcher.next();
} // while
std::cout << "Found " << count << " sentences." << std::endl;
//1 Indicates that source and target strings are encoded as UTF-8.
//2 Initializes a Unicode string.
//3 Creates an RWUBreakSearch capable of finding sentence breaks, based on the default locale.
//4 Finds the beginning of the first sentence.
//5 Finds the end of the last sentence.
//6 Counts the sentences in the string.
Note that for all types of break searches, breaks often occur both before and after each unit being queried. For example, there are a total of four character breaks in the string abc. There is a break before the a, before the b, before the c, and after the c. This may require special handling of the ends of strings. For example, consider the following loop:
 
RWUString str;
RWUBreakSearch searcher(RWUBreakSearch::Character, str);
RWUConstStringIterator it;
for (it = searcher.first();
     it != str.endCodePointIterator();
     it = searcher.next())
{...}
If the character break that is located at the str.endCodePointIterator() position (like the break after the c above) should be processed, then you must take care to process it outside the body of the loop.
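One way to handle the final break is sketched below. This is only an illustration under stated assumptions: ToyCharacterBreaks and allBreaks() are hypothetical stand-ins, not part of the RWUBreakSearch API. The sketch uses plain offsets instead of RWUConstStringIterator values and treats every byte of a std::string as one character. It shows the loop shape from above, followed by the extra step that processes the break at the end position.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a character break search (NOT the RWU API).
// Positions are byte offsets; each byte is treated as one character.
class ToyCharacterBreaks {
public:
    explicit ToyCharacterBreaks(const std::string& s)
        : size_(s.size()), pos_(0) {}

    // Moves to the break before the first character and returns its offset.
    std::size_t first() { pos_ = 0; return pos_; }

    // Advances to the following break; stays at the end offset once reached.
    std::size_t next() { if (pos_ < size_) ++pos_; return pos_; }

    // Offset of the break after the last character.
    std::size_t end() const { return size_; }

private:
    std::size_t size_;
    std::size_t pos_;
};

// Collects every break offset, including the one at the end of the string.
std::vector<std::size_t> allBreaks(const std::string& s) {
    ToyCharacterBreaks searcher(s);
    std::vector<std::size_t> breaks;
    std::size_t it;
    for (it = searcher.first(); it != searcher.end(); it = searcher.next()) {
        breaks.push_back(it);  // breaks before each character
    }
    breaks.push_back(it);      // the final break, handled outside the loop
    return breaks;
}
```

In this sketch, calling allBreaks("abc") yields four offsets, matching the four character breaks described above. The loop body alone sees only the first three.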