SourcePro® API Reference Guide

 

Finds the locations of breaks, or potential breaks, in text for a specified locale.

#include <rw/i18n/RWUBreakSearch.h>

Public Types

enum  BreakType {
  CodePoint, Character, Word, Line,
  Sentence
}
 

Public Member Functions

 RWUBreakSearch (BreakType type, const RWUString &str, const RWULocale &locale=RWULocale::getDefault())
 
 RWUBreakSearch (const RWUBreakSearch &source)
 
 ~RWUBreakSearch ()
 
RWUConstStringIterator back (void) const
 
RWUConstStringIterator current (void) const
 
RWUConstStringIterator first (void)
 
RWUConstStringIterator front (void) const
 
RWCString getLocale (void) const
 
const RWUString & getString (void) const
 
BreakType getType (void) const
 
bool isBreak (const RWUConstStringIterator &position) const
 
bool isBreak (const RWUStringIterator &position) const
 
bool isBreak (size_t offset) const
 
RWUConstStringIterator last (void)
 
RWUConstStringIterator next (const RWUConstStringIterator &position)
 
RWUConstStringIterator next (const RWUStringIterator &position)
 
RWUConstStringIterator next (void)
 
RWUBreakSearch & operator= (const RWUBreakSearch &rhs)
 
RWUConstStringIterator previous (const RWUConstStringIterator &position)
 
RWUConstStringIterator previous (const RWUStringIterator &position)
 
RWUConstStringIterator previous (void)
 
void setLocale (const RWULocale &locale)
 
void setString (const RWUString &str)
 
void setType (BreakType type)
 

Friends

class RWUStringSearch
 

Detailed Description

RWUBreakSearch finds the locations of breaks, or potential breaks, in text. Whitespace and punctuation are correctly interpreted in accordance with a specified locale.

Breaks reported from RWUBreakSearch are located immediately prior to the reported location. For example, a character break reported at offset 0 occurs just before the first character. RWUConstStringIterator instances returned by member functions of RWUBreakSearch are positioned at the code point immediately following a break.

Five types of text breaks are supported by RWUBreakSearch:

  • Code point breaks occur before and after each code point.
  • Character breaks occur between characters, as defined from the end user's perspective. For instance, á can be represented by a single code point or by a pair of code points (one for the letter a and another for the acute symbol). An RWUBreakSearch that searches for character breaks treats á as a single character, regardless of the number of code points used to represent it. You could use an RWUBreakSearch that searches for character breaks to iterate over the logical characters in a string.
  • Word breaks occur before and after each word. For example, you could use an RWUBreakSearch that searches for word breaks to create "find whole word" operations.
  • Sentence breaks occur between sentences. For example, you could use an RWUBreakSearch that searches for sentence breaks to count the sentences in a string. Note that RWUBreakSearch attempts to interpret nested quotes, nested parentheses, and periods that may either end a sentence, or be part of a number or abbreviation. This is a difficult problem, however, and the results are not guaranteed to be perfect.
  • Line breaks occur at positions where it would be appropriate to wrap text from one display line to the next. For example, you could use an RWUBreakSearch that searches for line breaks to create a line-wrapping algorithm.
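To make the difference between code point breaks and character breaks concrete, here is a minimal standalone sketch in standard C++. It is not the SourcePro implementation: it assumes a deliberately simplified rule in which only code points in the Combining Diacritical Marks block (U+0300 to U+036F) attach to the preceding code point, and the names isCombiningMark, codePointBreaks, and characterBreaks are hypothetical. Offsets in this sketch are code point indexes into a std::u32string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the character-break rules: only the Combining
// Diacritical Marks block (U+0300..U+036F) is treated as attaching to the
// previous code point. Real Unicode segmentation has many more rules.
inline bool isCombiningMark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Code point breaks: one before every code point, plus one at the end.
std::vector<std::size_t> codePointBreaks(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        breaks.push_back(i);
    }
    return breaks;
}

// Character breaks: the same positions, except that no break is reported
// in front of a combining mark that follows another code point.
std::vector<std::size_t> characterBreaks(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        const bool attaches =
            i > 0 && i < text.size() && isCombiningMark(text[i]);
        if (!attaches) {
            breaks.push_back(i);
        }
    }
    return breaks;
}
```

For the three code points a, U+0301 (combining acute accent), and b, codePointBreaks() yields the offsets 0, 1, 2, 3, while characterBreaks() yields 0, 2, 3, because the accented letter counts as one logical character.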

An RWUBreakSearch object is constructed from the break type to search for, an RWUString that provides the text to process, and an optional RWULocale. If no locale is specified, the current default locale is used.

After instantiating a break search, you can search for the specified break type using the first(), last(), next(), and previous() methods. RWUBreakSearch objects maintain a "current" position. Initially, the current position is the start of the source string. Calls to first(), last(), next(), and previous() alter the current position.

Note that breaks occur both before and after each unit being queried. This is true for all types of break searches. For example, there are a total of four character breaks in the string abc. There is a break before the a, before the b, before the c, and after the c. This may require special handling of the ends of strings, which are always break locations. Consider the following loop:

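RWUString str("abc");
RWUBreakSearch bSearch(RWUBreakSearch::Character, str);
for (RWUConstStringIterator it = bSearch.first();
     it != str.endCodePointIterator(); it = bSearch.next()) {
    // ...
}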

This loop exits as soon as the iterator reaches str.endCodePointIterator(), so its body never runs for the break at that position, such as the break after the c above. If that final break should be processed, you must process it outside the body of the loop.
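One way to handle the final break is to run the loop as shown and then process the end position once more after the loop. The following standalone sketch illustrates the pattern with a hypothetical stand-in class, AsciiBreakCursor, which imitates only the first() and next() behavior described on this page for plain ASCII text. It is not the SourcePro class, and offsets replace iterators for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a character-break searcher over plain ASCII:
// every offset from 0 to size() is a break.
class AsciiBreakCursor {
public:
    explicit AsciiBreakCursor(const std::string& text)
        : text_(text), pos_(0) {}

    // Moves to the first break and returns it.
    std::size_t first() { pos_ = 0; return pos_; }

    // Moves to the next break; stays put once the end is reached.
    std::size_t next() {
        if (pos_ < text_.size()) { ++pos_; }
        return pos_;
    }

private:
    const std::string& text_;  // held by reference, not copied
    std::size_t pos_;
};

// Visits every break, including the one at the end of the string.
std::vector<std::size_t> visitAllBreaks(const std::string& text) {
    std::vector<std::size_t> visited;
    AsciiBreakCursor cursor(text);
    const std::size_t end = text.size();
    std::size_t pos = cursor.first();
    for (; pos != end; pos = cursor.next()) {
        visited.push_back(pos);  // every break except the final one
    }
    visited.push_back(pos);      // the break at the end of the string
    return visited;
}
```

For the string abc, visitAllBreaks() records the offsets 0, 1, 2, 3: three from the loop body and the fourth from the statement after the loop.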

Example

The following example counts the number of sentences in a string:

#include <rw/i18n/RWUBreakSearch.h>
#include <rw/i18n/RWUConversionContext.h>
#include <iostream>

using std::cout;
using std::endl;

int main() {
    // Indicate that source and target strings are
    // encoded as UTF-8.
    RWUConversionContext context("UTF-8");

    // Initialize a Unicode string.
    RWUString str(
        "Unicode 3.2 is a minor version of the "
        "Unicode Standard. It overrides certain features of "
        "Unicode 3.1, and adds a significant number of coded "
        "characters.");

    // Create an RWUBreakSearch capable of finding
    // sentence breaks, based on the default locale.
    RWUBreakSearch searcher(RWUBreakSearch::Sentence, str);

    // Find the beginning of the first sentence.
    RWUConstStringIterator iter = str.beginCodePointIterator();

    // Find the end of the last sentence.
    RWUConstStringIterator end = str.endCodePointIterator();

    // Count the sentences in the string.
    int count = 0;
    while (iter != end) {
        ++count;
        iter = searcher.next();
    }

    cout << "Found " << count << " sentences." << endl;
    return 0;
}

Program output:

Found 2 sentences.

Member Enumeration Documentation

◆ BreakType

Specifies the type of breaks for which an RWUBreakSearch should search.

Enumerator
CodePoint 

breaks occur before and after each code point in a string.

Character 

breaks occur before and after logical characters in a string.

Word 

breaks occur before and after each word.

Line 

breaks occur at positions where it would be appropriate to wrap text from one display line to the next.

Sentence 

breaks occur before and after sentences.

Note
RWUBreakSearch attempts to interpret nested quotes, nested parentheses, and periods that may either end a sentence, or be part of a number or abbreviation. This is a difficult problem, however, and the results are not guaranteed to be perfect.

Constructor & Destructor Documentation

◆ RWUBreakSearch() [1/2]

RWUBreakSearch::RWUBreakSearch ( BreakType type,
const RWUString & str,
const RWULocale & locale = RWULocale::getDefault() )

Creates an RWUBreakSearch that searches for breaks of type type within str, interpreting punctuation and whitespace in accordance with the given locale. If no locale is specified, then the current default locale is used.

Note
Distinct (deep) copies of the type and locale arguments are made within the RWUBreakSearch object, but only a reference to the input RWUString is held. Consequently, you must take care not to allow the string used to create the RWUBreakSearch to be changed before the last use of that RWUBreakSearch object. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
Exceptions
RWUException: Thrown if any error occurs during the construction of the break search.
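Because only a reference to the string is held, one defensive pattern is to let a single object own both the string and the searcher, so that they share a lifetime and the string is exposed read-only. The sketch below shows the pattern in standard C++ with hypothetical stand-in types (RefHoldingSearcher and OwnedSearch). It does not use the SourcePro classes, and whether the pattern suits a given application is a design judgment.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in that, like the searcher described above, keeps
// only a pointer to the caller's string rather than a copy of it.
struct RefHoldingSearcher {
    explicit RefHoldingSearcher(const std::string& s) : text(&s) {}
    const std::string* text;
};

// Owns the string and the searcher together so they share one lifetime.
// Members are initialized in declaration order, so text_ exists before
// searcher_ takes its address.
class OwnedSearch {
public:
    explicit OwnedSearch(std::string source)
        : text_(std::move(source)), searcher_(text_) {}

    // A copy would leave the new searcher pointing at the old string.
    OwnedSearch(const OwnedSearch&) = delete;
    OwnedSearch& operator=(const OwnedSearch&) = delete;

    // Read-only access: the string must not change while it is searched.
    const std::string& text() const { return text_; }
    const RefHoldingSearcher& searcher() const { return searcher_; }

private:
    std::string text_;
    RefHoldingSearcher searcher_;
};
```

Deleting the copy operations is the key design choice here: a memberwise copy would duplicate the pointer, not the string it refers to.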

◆ RWUBreakSearch() [2/2]

RWUBreakSearch::RWUBreakSearch ( const RWUBreakSearch & source)

Creates a copy of the specified source RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by source. The current position of self is the same as the position of source.

Note
The RWUString referenced by self is the same RWUString referenced by source. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
Exceptions
RWUException: Thrown if any error occurs during the construction of the break search.

◆ ~RWUBreakSearch()

RWUBreakSearch::~RWUBreakSearch ( )

Destructor.

Member Function Documentation

◆ back()

RWUConstStringIterator RWUBreakSearch::back ( void ) const

Returns the position of the last break in self's string, without changing the current position of self.

◆ current()

RWUConstStringIterator RWUBreakSearch::current ( void ) const

Returns the current position maintained by self.

◆ first()

RWUConstStringIterator RWUBreakSearch::first ( void )

Sets the current position to the first break in self's string, and returns the new position.

◆ front()

RWUConstStringIterator RWUBreakSearch::front ( void ) const

Returns the position of the first break in self's string, without changing the current position of self.

◆ getLocale()

RWCString RWUBreakSearch::getLocale ( void ) const

Returns the name of the locale currently imbued on self.

◆ getString()

const RWUString & RWUBreakSearch::getString ( void ) const

Returns a const reference to the RWUString associated with self.

◆ getType()

BreakType RWUBreakSearch::getType ( void ) const

Returns the break type searched for by self.

◆ isBreak() [1/3]

bool RWUBreakSearch::isBreak ( const RWUConstStringIterator & position) const

Returns true if the given string position is a break; otherwise, false.

Exceptions
RWUException: Thrown with error code RWUUnsupportedError if position does not reference the same string as self.
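A typical use of a break test is the "find whole word" operation: a match counts only if it both starts and ends on a word break. The standalone sketch below shows that logic with a hypothetical ASCII-only predicate, isWordBreak, standing in for a locale-aware word-break test. The function names and the simplified boundary rule are assumptions for illustration, not part of SourcePro.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical ASCII-only stand-in for a word-break test: a position is a
// break at either end of the text, or where exactly one of the two
// neighboring characters is alphanumeric. Locale-aware rules are richer.
bool isWordBreak(const std::string& text, std::size_t offset) {
    if (offset == 0 || offset >= text.size()) {
        return true;
    }
    const bool before =
        std::isalnum(static_cast<unsigned char>(text[offset - 1])) != 0;
    const bool after =
        std::isalnum(static_cast<unsigned char>(text[offset])) != 0;
    return before != after;
}

// Returns the offset of the first occurrence of word that begins and ends
// on a word break, or std::string::npos if there is none.
std::size_t findWholeWord(const std::string& text, const std::string& word) {
    if (word.empty()) {
        return std::string::npos;
    }
    std::size_t at = text.find(word);
    while (at != std::string::npos) {
        if (isWordBreak(text, at) && isWordBreak(text, at + word.size())) {
            return at;
        }
        at = text.find(word, at + 1);  // keep looking past this candidate
    }
    return std::string::npos;
}
```

Searching "concat the cat" for "cat" skips the occurrence inside "concat" and returns offset 11, where the match is bounded by a break on both sides.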

◆ isBreak() [2/3]

bool RWUBreakSearch::isBreak ( const RWUStringIterator & position) const

Returns true if the given string position is a break; otherwise, false.

Exceptions
RWUException: Thrown with error code RWUUnsupportedError if position does not reference the same string as self.

◆ isBreak() [3/3]

bool RWUBreakSearch::isBreak ( size_t offset) const

Returns true if the position at the given code unit offset is a break; otherwise, false.
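Note that this overload takes a code unit offset, not a code point index. RWUString stores UTF-16 code units, so a supplementary character occupies two code units, and an offset that falls between the two halves of a surrogate pair is not a code point boundary. The standalone sketch below uses std::u16string to show which code unit offsets begin a code point. The function names are hypothetical, and how RWUBreakSearch itself responds to an offset inside a surrogate pair is not stated on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Returns the code unit offsets at which a code point begins, plus the
// offset one past the last code unit.
std::vector<std::size_t> codePointBoundaries(const std::u16string& text) {
    std::vector<std::size_t> offsets;
    std::size_t i = 0;
    while (i < text.size()) {
        offsets.push_back(i);
        // A well-formed surrogate pair spans two code units.
        const bool pair = isHighSurrogate(text[i]) &&
                          i + 1 < text.size() &&
                          isLowSurrogate(text[i + 1]);
        i += pair ? 2 : 1;
    }
    offsets.push_back(text.size());
    return offsets;
}
```

For a string holding U+1F600 followed by the letter a, the boundaries are the code unit offsets 0, 2, and 3. Offset 1 lies inside the surrogate pair.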

◆ last()

RWUConstStringIterator RWUBreakSearch::last ( void )

Sets the current position to the last break in self's string, and returns the new position.

◆ next() [1/3]

RWUConstStringIterator RWUBreakSearch::next ( const RWUConstStringIterator & position)

Changes the current position of self to the next break after the specified position, and returns the new position.

◆ next() [2/3]

RWUConstStringIterator RWUBreakSearch::next ( const RWUStringIterator & position)

Changes the current position of self to the next break after the specified position, and returns the new position.

◆ next() [3/3]

RWUConstStringIterator RWUBreakSearch::next ( void )

Finds the position of the next break after the current position. Makes that position self's new current position, and returns the new position. If self is already positioned at the end of its string, the current position remains at the end of the string.

◆ operator=()

RWUBreakSearch & RWUBreakSearch::operator= ( const RWUBreakSearch & rhs)

Assignment operator. Creates a copy of the rhs RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by rhs. The current position of self is the same as the position of rhs.

Note
The RWUString referenced by self is the same RWUString referenced by rhs. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.

◆ previous() [1/3]

RWUConstStringIterator RWUBreakSearch::previous ( const RWUConstStringIterator & position)

Changes the current position of self to the break prior to the specified position, and returns the new position.

◆ previous() [2/3]

RWUConstStringIterator RWUBreakSearch::previous ( const RWUStringIterator & position)

Changes the current position of self to the break prior to the specified position, and returns the new position.

◆ previous() [3/3]

RWUConstStringIterator RWUBreakSearch::previous ( void )

Changes the current position of self to the break prior to the current position, and returns the new position. If self is already positioned at the beginning of its string, the current position remains at the beginning of the string.

◆ setLocale()

void RWUBreakSearch::setLocale ( const RWULocale & locale)

Imbues a locale on self.

◆ setString()

void RWUBreakSearch::setString ( const RWUString & str)

Sets the RWUString in which self searches for breaks to str. Resets the current position of self to the start of the search string.

Only a reference to the input RWUString is held. Consequently, you must take care not to allow the string referenced by self to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.

◆ setType()

void RWUBreakSearch::setType ( BreakType type)

Sets the break type searched for by self to type. Resets the current position of self to the start of the search string.

Copyright © 2024 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.