Finds the locations of breaks, or potential breaks, in text for a specified locale.
#include <rw/i18n/RWUBreakSearch.h>
Public Types
enum BreakType { CodePoint, Character, Word, Line, Sentence }
RWUBreakSearch finds the locations of breaks, or potential breaks, in text. Whitespace and punctuation are correctly interpreted in accordance with a specified locale.
Breaks reported by RWUBreakSearch are located immediately before the reported position. For example, a character break reported at offset 0 occurs just before the first character. RWUConstStringIterator instances returned by member functions of RWUBreakSearch are positioned at the code point immediately following a break.
Five types of text breaks are supported by RWUBreakSearch: code point, character, word, line, and sentence breaks. Each corresponds to a BreakType enumerator.
Character breaks differ from code point breaks when a logical character is made up of more than one code point. For example, the character á can be represented by a single code point or by a pair of code points (one for the letter a and another for the combining acute accent). An RWUBreakSearch that searches for character breaks treats á as a single character, regardless of the number of code points used to represent it. You could use an RWUBreakSearch that searches for character breaks to iterate over the logical characters in a string.
RWUBreakSearch objects are created from the break type to search for, an RWUString that provides the text to process, and an optional locale. If no locale is specified, the current default locale is used.
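To make the difference between code point breaks and character breaks concrete, the following sketch models both for a string of UTF-32 code points. It does not use RWUBreakSearch; the function names are hypothetical, and the character rule is a deliberate simplification that only attaches combining diacritical marks (U+0300 to U+036F) to the preceding code point. Full Unicode character segmentation handles many more cases.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplification: only the Combining Diacritical Marks block (U+0300..U+036F)
// is treated as attaching to the previous code point.
bool isCombiningMark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: code point break offsets, measured in code points.
// A break precedes every code point, and the end of the string is a break.
std::vector<std::size_t> codePointBreaks(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        breaks.push_back(i);
    }
    return breaks;
}

// Hypothetical helper: character break offsets, measured in code points.
// No break is placed between a base code point and a following combining mark.
std::vector<std::size_t> characterBreaks(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i == 0 || !isCombiningMark(text[i])) {
            breaks.push_back(i);
        }
    }
    breaks.push_back(text.size());  // the end of the string is always a break
    return breaks;
}
```

Under these rules, the precomposed á (U+00E1) and the decomposed pair a + U+0301 both yield exactly one character between two character breaks, while the decomposed form has one extra code point break in the middle.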
After instantiating a break search, you can search for the specified break type using the first(), last(), next(), and previous() methods. RWUBreakSearch objects maintain a "current" position. Initially, the current position is the start of the source string. Calls to first(), last(), next(), and previous() alter the current position.
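The following sketch models how the current position moves. It is a hypothetical BreakCursorModel class that stores precomputed break offsets rather than calling RWUBreakSearch, and it mirrors the documented behavior: first() and last() move to the ends, next() stays put at the end of the string, previous() stays put at the beginning, and front() and back() report the ends without moving.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a break search's navigation state. It holds
// precomputed break offsets instead of analyzing text.
class BreakCursorModel {
public:
    // `breaks` must be sorted and start at offset 0 (the start of the string).
    explicit BreakCursorModel(std::vector<std::size_t> breaks)
        : breaks_(std::move(breaks)), index_(0) {
        assert(!breaks_.empty() && breaks_.front() == 0);
    }

    // Reporting functions: they do not change the current position.
    std::size_t current() const { return breaks_[index_]; }
    std::size_t front() const { return breaks_.front(); }
    std::size_t back() const { return breaks_.back(); }

    // Navigation functions: they change the current position and return it.
    std::size_t first() { index_ = 0; return current(); }
    std::size_t last() { index_ = breaks_.size() - 1; return current(); }

    std::size_t next() {
        if (index_ + 1 < breaks_.size()) ++index_;  // stays put at the end
        return current();
    }

    std::size_t previous() {
        if (index_ > 0) --index_;  // stays put at the beginning
        return current();
    }

private:
    std::vector<std::size_t> breaks_;
    std::size_t index_;
};
```

For the character breaks of abc (offsets 0, 1, 2, and 3), a new model starts at 0, next() returns 1, back() returns 3 without moving, and calling next() again after last() still returns 3.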
Note that breaks occur both before and after each unit being queried. This is true for all types of break searches. For example, there are four character breaks in the string abc: before the a, before the b, before the c, and after the c. Because the ends of a string are always break locations, they may require special handling. Consider a loop that starts at first(), advances with next(), and ends when the returned position equals str.endCodePointIterator().
If the character break located at the str.endCodePointIterator() position (like the break after the c above) should be processed, you must process it outside the body of the loop.
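The loop pattern described above can be modeled without the library. In this sketch every byte of an ASCII string is assumed to be one character, so the breaks are simply the offsets 0 through text.size(). The offset comparison stands in for comparing an iterator with str.endCodePointIterator(), and the function name visitCharacterBreaks is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ASCII-only model: each byte is one character, so the
// character breaks are the offsets 0, 1, ..., text.size().
std::vector<std::string> visitCharacterBreaks(const std::string& text) {
    std::vector<std::string> visited;
    std::size_t pos = 0;            // like first(): the break before the first character
    while (pos != text.size()) {    // like comparing with str.endCodePointIterator()
        // Inside the loop, the position refers to the character after the break.
        visited.push_back(std::string("before ") + text[pos]);
        ++pos;                      // like next()
    }
    // The break at the end of the string never reaches the loop body,
    // so it is processed here, after the loop.
    visited.push_back("at end");
    return visited;
}
```

For abc this visits four breaks: three inside the loop and the final one after it.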
The following example counts the number of sentences in a string:
Program output:
Specifies the type of breaks for which an RWUBreakSearch should search.
Enumerator | Description
---|---
CodePoint | Breaks occur before and after each code point in a string.
Character | Breaks occur before and after logical characters in a string.
Word | Breaks occur before and after each word.
Line | Breaks occur at positions where it would be appropriate to wrap text from one display line to the next.
Sentence | Breaks occur before and after sentences.
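To illustrate what sentence breaks delimit, the following sketch counts sentences with a deliberately naive rule: a sentence ends at ., !, or ? followed by whitespace or the end of the text. This is not the locale-aware algorithm RWUBreakSearch uses; abbreviations such as "Dr.", quotation marks, and scripts with other punctuation are not handled, and countSentencesNaive is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Naive sentence counter for illustration only (not locale-aware).
std::size_t countSentencesNaive(const std::string& text) {
    std::size_t count = 0;
    bool inSentence = false;  // true once non-space text follows the last sentence end
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (!std::isspace(c)) inSentence = true;
        const bool terminator = (c == '.' || c == '!' || c == '?');
        const bool atBoundary =
            (i + 1 == text.size()) ||
            std::isspace(static_cast<unsigned char>(text[i + 1]));
        if (terminator && atBoundary && inSentence) {
            ++count;
            inSentence = false;
        }
    }
    if (inSentence) ++count;  // trailing text without a terminator
    return count;
}
```

For example, "Hello there. How are you? Fine!" counts as three sentences, and "One. Two" counts as two because trailing text without a terminator still forms a sentence.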
RWUBreakSearch::RWUBreakSearch(BreakType type, const RWUString& str, const RWULocale& locale = RWULocale::getDefault())
Creates an RWUBreakSearch that searches for breaks of type type within str, interpreting punctuation and whitespace in accordance with the given locale. If no locale is specified, then the current default locale is used.
RWUException: Thrown if any error occurs during the construction of the break search.
RWUBreakSearch::RWUBreakSearch(const RWUBreakSearch& source)
Creates a copy of the specified source RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by source. The current position of self is the same as the position of source.
RWUException: Thrown if any error occurs during the construction of the break search.
RWUBreakSearch::~RWUBreakSearch()
Destructor.
RWUConstStringIterator RWUBreakSearch::back(void) const
Returns the position of the last break in self's string, without changing the current position of self.
RWUConstStringIterator RWUBreakSearch::current(void) const
Returns the current position maintained by self.
RWUConstStringIterator RWUBreakSearch::first(void)
Sets the current position to the first break in self's string, and returns the new position.
RWUConstStringIterator RWUBreakSearch::front(void) const
Returns the position of the first break in self's string, without changing the current position of self.
RWCString RWUBreakSearch::getLocale(void) const
Returns the name of the locale currently imbued on self.
const RWUString& RWUBreakSearch::getString(void) const
Returns a const reference to the RWUString associated with self.
BreakType RWUBreakSearch::getType(void) const
Returns the break type searched for by self.
bool RWUBreakSearch::isBreak(const RWUConstStringIterator& position) const
Returns true if the given string position is a break; otherwise, false.
RWUException: Thrown with error code RWUUnsupportedError if position does not reference the same string as self.
bool RWUBreakSearch::isBreak(const RWUStringIterator& position) const
Returns true if the given string position is a break; otherwise, false.
RWUException: Thrown with error code RWUUnsupportedError if position does not reference the same string as self.
bool RWUBreakSearch::isBreak(size_t offset) const
Returns true if the position at the given code unit offset is a break; otherwise, false.
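The offset passed to isBreak(size_t) is measured in code units, not code points. Assuming UTF-16 code units, a code point break can never fall between the two halves of a surrogate pair. The hypothetical function below models that rule for the CodePoint break type only; returning false for offsets past the end is a choice made for the sketch, not documented behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a CodePoint break test on UTF-16 code unit offsets.
bool isCodePointBreak(const std::u16string& text, std::size_t offset) {
    if (offset > text.size()) return false;                  // sketch choice: not a position
    if (offset == 0 || offset == text.size()) return true;   // both ends are always breaks
    const char16_t prev = text[offset - 1];
    const char16_t cur = text[offset];
    const bool prevIsHighSurrogate = prev >= 0xD800 && prev <= 0xDBFF;
    const bool curIsLowSurrogate = cur >= 0xDC00 && cur <= 0xDFFF;
    // Splitting a surrogate pair would cut a single code point in half.
    return !(prevIsHighSurrogate && curIsLowSurrogate);
}
```

For u"a\U0001F600b", which occupies four UTF-16 code units, offsets 0, 1, 3, and 4 are breaks, while offset 2 lies inside the surrogate pair and is not.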
RWUConstStringIterator RWUBreakSearch::last(void)
Sets the current position to the last break in self's string, and returns the new position.
RWUConstStringIterator RWUBreakSearch::next(void)
Finds the position of the next break after the current position. Makes that position self's new current position, and returns the new position. If self is already positioned at the end of its string, the current position remains at the end of the string.
RWUConstStringIterator RWUBreakSearch::next(const RWUStringIterator& position)
Changes the current position of self to the next break after the specified position, and returns the new position.
RWUConstStringIterator RWUBreakSearch::next(const RWUConstStringIterator& position)
Changes the current position of self to the next break after the specified position, and returns the new position.
RWUBreakSearch& RWUBreakSearch::operator=(const RWUBreakSearch& rhs)
Assignment operator. Makes self a copy of the rhs RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by rhs. The current position of self is the same as the position of rhs.
RWUConstStringIterator RWUBreakSearch::previous(void)
Changes the current position of self to the break prior to the current position, and returns the new position. If self is already positioned at the beginning of its string, the current position remains at the beginning of the string.
RWUConstStringIterator RWUBreakSearch::previous(const RWUStringIterator& position)
Changes the current position of self to the break prior to the specified position, and returns the new position.
RWUConstStringIterator RWUBreakSearch::previous(const RWUConstStringIterator& position)
Changes the current position of self to the break prior to the specified position, and returns the new position.
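One plausible reading of next(position) and previous(position) is a search for the nearest break strictly after, or strictly before, the given position. The following sketch implements that reading over a sorted list of break offsets with std::upper_bound and std::lower_bound; the function names are hypothetical. What the real member functions return when no such break exists is not stated here, so the sketch assumes they clamp to the last or first break, as the no-argument overloads do.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// `breaks` is assumed sorted, non-empty, and to contain 0 and the string length.

// Nearest break strictly after `position`; clamps to the last break.
std::size_t nextBreakAfter(const std::vector<std::size_t>& breaks, std::size_t position) {
    auto it = std::upper_bound(breaks.begin(), breaks.end(), position);
    return it == breaks.end() ? breaks.back() : *it;
}

// Nearest break strictly before `position`; clamps to the first break.
std::size_t previousBreakBefore(const std::vector<std::size_t>& breaks, std::size_t position) {
    auto it = std::lower_bound(breaks.begin(), breaks.end(), position);
    return it == breaks.begin() ? breaks.front() : *(it - 1);
}
```

With breaks at offsets 0, 2, 3, and 8, nextBreakAfter(breaks, 1) returns 2, nextBreakAfter(breaks, 2) returns 3, and previousBreakBefore(breaks, 5) returns 3.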
void RWUBreakSearch::setLocale(const RWULocale& locale)
Imbues a locale on self.
void RWUBreakSearch::setString(const RWUString& str)
Sets the RWUString in which self searches for breaks to str. Resets the current position of self to the start of the search string.
Only a reference to the input RWUString is held. Consequently, you must take care not to allow the string referenced by self to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
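Because only a reference to the string is held, the string must outlive every use of the search object. The following sketch models that ownership rule with a hypothetical StringRefModel class that stores a pointer to the caller's std::u16string. Deleting the rvalue overloads so that temporaries cannot be passed is a safeguard added for the sketch; the documented RWUBreakSearch interface does not state that it does this. Copying the model shares the referenced string and the current position, matching the copy constructor and assignment operator described above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model: stores a pointer to the caller's string, never a copy.
// The caller must keep that string alive and unchanged while the model is used.
class StringRefModel {
public:
    explicit StringRefModel(const std::u16string& str) : str_(&str), pos_(0) {}
    // Sketch-only safeguard: refuse temporaries, which would dangle immediately.
    explicit StringRefModel(std::u16string&&) = delete;

    void setString(const std::u16string& str) {
        str_ = &str;  // only the reference is stored
        pos_ = 0;     // the position resets to the start of the new string
    }
    void setString(std::u16string&&) = delete;

    // Stand-in for next(): moves one code unit forward, stopping at the end.
    void advance() {
        if (pos_ < str_->size()) ++pos_;
    }

    const std::u16string& getString() const { return *str_; }
    std::size_t current() const { return pos_; }

private:
    const std::u16string* str_;  // non-owning
    std::size_t pos_;
};
```

The implicitly generated copy constructor and copy assignment copy the pointer and the position, so a copy refers to the very same string object rather than to a duplicate of it.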
void RWUBreakSearch::setType(BreakType type)
Sets the break type searched for by self to type. Resets the current position of self to the start of the search string.
Copyright © 2021 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.