RWUStringSearch (const RWUString &pattern, const RWUString &text, const RWUCollator &collator)
RWUStringSearch (const RWUString &pattern, const RWUString &text, const RWUCollator &collator, const RWUBreakSearch &breakSearch)
RWUStringSearch (const RWUString &pattern, const RWUString &text, const RWULocale &locale=RWULocale::getDefault(), RWUBreakSearch::BreakType breakType=RWUBreakSearch::CodePoint)
RWUStringSearch (const RWUStringSearch &source)
~RWUStringSearch ()
void clearBreakSearch (void)
RWUConstStringIterator current (void) const
RWUConstStringIterator first (void)
RWUBreakSearch::BreakType getBreakType () const
const RWUCollator & getCollator (void) const
RWUConstSubString getMatch (void) const
size_t getMatchLength (void) const
RWUConstStringIterator getMatchStart (void) const
RWUString getPattern (void) const
const RWUString & getString (void) const
bool isMatch (const RWUConstStringIterator &position)
RWUConstStringIterator last (void)
RWUConstStringIterator next (const RWUConstStringIterator &position)
RWUConstStringIterator next (void)
RWUStringSearch & operator= (const RWUStringSearch &rhs)
RWUConstStringIterator previous (const RWUConstStringIterator &position)
RWUConstStringIterator previous (void)
size_t replace (RWUString &str, const RWUString &replacement, size_t occurrences=1)
void setBreakSearch (const RWUBreakSearch &bSearch)
void setCollator (const RWUCollator &collator)
void setPattern (const RWUString &pattern)
void setString (const RWUString &text)
|
RWUStringSearch searches text for occurrences of a specified pattern string. The pattern string is not a pattern in the sense of a regular expression (see RWURegularExpression), but rather a string to be searched for.
RWUStringSearch allows for flexible, collation-based string searches, unlike searches performed by RWUString::index() and RWUString::subString(). RWUString uses simple bit-wise comparisons of the code units in the strings, but RWUStringSearch employs the rules encapsulated by an RWUCollator and an optional RWUBreakSearch to determine if and where a match occurs.
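To illustrate why a collation-based search can find matches that a bit-wise search misses, here is a rough standalone approximation in standard C++. It does not use the SourcePro API: the helpers countExact() and countLoose() are hypothetical, and countLoose() only loosely mimics a primary-strength collator with punctuation shifting by ignoring ASCII case and skipping punctuation in the text. Real collation rules are far richer than this.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Bit-wise counting: code units must be identical for a match.
std::size_t countExact(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + pattern.size())) {
        ++count;  // non-overlapping: resume just after the match
    }
    return count;
}

// ASCII case-insensitive character comparison.
bool sameIgnoringCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

bool isPunct(char c) { return std::ispunct(static_cast<unsigned char>(c)) != 0; }

// Tries to match `pattern` (assumed punctuation-free) at `start`, ignoring
// ASCII case and skipping punctuation in the text. On success, stores the
// offset just past the match in `end`.
bool looseMatchAt(const std::string& text, std::size_t start,
                  const std::string& pattern, std::size_t& end) {
    std::size_t t = start;
    for (char pc : pattern) {
        while (t < text.size() && isPunct(text[t])) ++t;  // skip "-", ".", ...
        if (t == text.size() || !sameIgnoringCase(text[t], pc)) return false;
        ++t;
    }
    end = t;
    return true;
}

// Counts non-overlapping loose matches that begin on a non-punctuation character.
std::size_t countLoose(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t end = 0;
        if (!isPunct(text[pos]) && looseMatchAt(text, pos, pattern, end)) {
            ++count;
            pos = end;
        } else {
            ++pos;
        }
    }
    return count;
}
```

On the text used in the example below, countExact(text, "UTF8") finds no matches, while countLoose(text, "UTF8") finds two ("Utf8" and "utf-8").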
RWUStringSearch provides a number of options to search for occurrences of the pattern string in a text string:
With iterator-style searches, RWUStringSearch, like RWUBreakSearch, maintains a "current" position within the source string. A call to first() or last() sets the current position to the code unit offset just past that of the first or last match, respectively, and returns the location of the beginning of the match. Method next() advances the current position to the code unit offset immediately following that of the next match, and returns the location of the new match. Method previous() moves the current position to the beginning of the previous non-overlapping match, and returns the location of the new match.
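The position bookkeeping of first(), next(), and last() can be modeled in a few lines. The sketch below is hypothetical: the class name ByteSearch is invented for illustration, it compares bytes with std::string::find instead of applying a collator, and plain offsets and std::string::npos stand in for RWUConstStringIterator and the text's end iterator. Like RWUStringSearch, it keeps only a reference to the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical byte-wise model of the iterator-style search interface.
class ByteSearch {
public:
    ByteSearch(std::string pattern, const std::string& text)
        : pattern_(std::move(pattern)), text_(text), current_(0) {
        assert(!pattern_.empty());
    }

    // First match; the current position becomes the offset just past it.
    std::size_t first() {
        current_ = 0;
        return next();
    }

    // Next match at or after the current position (so matches never overlap).
    std::size_t next() { return settle(text_.find(pattern_, current_)); }

    // Last match; the current position becomes the offset just past it.
    std::size_t last() { return settle(text_.rfind(pattern_)); }

    std::size_t current() const { return current_; }

private:
    // Updates the current position for a match starting at `start`.
    std::size_t settle(std::size_t start) {
        if (start == std::string::npos) {
            current_ = text_.size();  // no match: park at the end of the text
            return std::string::npos;
        }
        current_ = start + pattern_.size();
        return start;
    }

    std::string pattern_;
    const std::string& text_;  // reference only, as with RWUStringSearch
    std::size_t current_;
};
```

Searching "thethe" for "the", first() returns 0 and leaves the current position at 3; the following next() returns 3 and leaves it at 6.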
- Example
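#include <rw/i18n/RWUCollator.h>
#include <rw/i18n/RWUConversionContext.h>
#include <rw/i18n/RWUStringSearch.h>
#include <iostream>
using std::cout;
using std::endl;
int main() {
  // Interpret narrow string literals as UTF-8 when constructing RWUStrings.
  RWUConversionContext context("UTF-8");
  RWUString text(
    "Utf8 serializes a Unicode code point "
    "as a sequence of one to four bytes. Table 3-1 of "
    "The Unicode Standard shows the bit distribution used "
    "in utf-8.");
  RWUString pattern("UTF8");
  // Primary strength ignores case differences; punctuation shifting
  // lets "utf-8" match despite the hyphen.
  RWUCollator collator(RWULocale("en"));
  collator.setStrength(RWUCollator::Primary);
  collator.enablePunctuationShifting(true);
  // Configure the collator before constructing the searcher.
  // Only a reference to text is stored, so text must outlive searcher.
  RWUStringSearch searcher(pattern, text, collator);
  int count = 0;
  while (searcher.next() != text.endCodePointIterator()) {
    ++count;
  }
  cout << "Pattern was found " << count << " times." << endl;
  return 0;
}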
Program output:
Pattern was found 2 times.
- See also
- RWUString, RWURegularExpression, RWUBreakSearch
Constructs an RWUStringSearch that searches for occurrences of pattern in text, using the string comparison rules encapsulated by collator. A substring is considered a match only if it falls on boundaries returned by breakSearch. This makes it possible, for example, to search for entire words or entire sentences.
A distinct (deep) copy is made of the pattern string, collator, and breakSearch, but only a reference to the text string is stored.
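The effect of a break search can be approximated by accepting a match only when both its start and its end fall on a boundary. The sketch below is a hypothetical stand-in for word-level RWUBreakSearch boundaries: isWordBoundary() and findWholeWords() are invented names, and the boundary rule is a crude ASCII alphanumeric test rather than the locale-sensitive rules RWUBreakSearch applies.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Crude boundary rule: offset i is a boundary at either end of the string,
// or where an alphanumeric character meets a non-alphanumeric one.
bool isWordBoundary(const std::string& s, std::size_t i) {
    if (i == 0 || i == s.size()) return true;
    bool left = std::isalnum(static_cast<unsigned char>(s[i - 1])) != 0;
    bool right = std::isalnum(static_cast<unsigned char>(s[i])) != 0;
    return left != right;
}

// Returns the start offsets of non-overlapping matches whose start and end
// both fall on boundaries, i.e. whole-word matches under the rule above.
std::vector<std::size_t> findWholeWords(const std::string& text,
                                        const std::string& pattern) {
    assert(!pattern.empty());
    std::vector<std::size_t> starts;
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        std::size_t end = pos + pattern.size();
        if (isWordBoundary(text, pos) && isWordBoundary(text, end)) {
            starts.push_back(pos);
            pos = text.find(pattern, end);      // accepted: skip past it
        } else {
            pos = text.find(pattern, pos + 1);  // rejected: try the next offset
        }
    }
    return starts;
}
```

In "the other theme, the end", the pattern "the" occurs four times, but only the occurrences at offsets 0 and 17 are whole words.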
- Exceptions
-
Finds the position of the match that appears fully before the specified position. Sets self's new current position to that of the match, and returns the location of the match.
Note that only the nearest match that lies entirely before the specified position is returned. For example, assume that the pattern is "the", the search string is "thethe", and the specified position is 4. Although a match begins at position 3, it extends past position 4; the nearest offset before position 4 at which an entire match can be found is 0. Therefore, position 0 is returned.
If no match is found, sets self's current position to the end of the string and returns the source string's end iterator. This method is intended for backward iteration over the matches in a string.
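The "entirely before" rule amounts to requiring start + length <= position. The following byte-wise sketch models only that rule; previousMatch() is a hypothetical name, it ignores collation, and std::string::npos stands in for the end iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the start of the nearest match lying entirely before `position`
// (start + pattern length <= position), or npos if there is none.
std::size_t previousMatch(const std::string& text, const std::string& pattern,
                          std::size_t position) {
    assert(!pattern.empty());
    if (position > text.size()) position = text.size();
    if (position < pattern.size()) return std::string::npos;
    // rfind(p, k) returns the last occurrence of p that starts at or before k.
    return text.rfind(pattern, position - pattern.size());
}
```

With the example above, previousMatch("thethe", "the", 4) returns 0, because the match at 3 ends at 6, past position 4.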
- Exceptions
-
Finds the position of the match that appears fully before the current position. Sets self's new current position to that of the match, and returns the location of the match.
Note that only the nearest match that lies entirely before the current position is returned. For example, assume that the pattern is "the", the search string is "thethe", and the current position is 4. Although a match begins at position 3, it extends past position 4; the nearest offset before position 4 at which an entire match can be found is 0. Therefore, position 0 is returned.
If no match is found, sets self's current position to the end of the string and returns the source string's end iterator. This method is intended for backward iteration over the matches in a string.
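Because each call moves the current position to the beginning of the match it returns, repeated calls walk backward through non-overlapping matches. The sketch below models that loop byte-wise; matchesBackward() is a hypothetical helper, it starts from the end of the text as an assumption, and it stops where the library would instead return the end iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Models repeated previous() calls: each step finds the nearest match lying
// entirely before the current position, then moves the current position to
// that match's beginning.
std::vector<std::size_t> matchesBackward(const std::string& text,
                                         const std::string& pattern) {
    assert(!pattern.empty());
    std::vector<std::size_t> starts;
    std::size_t current = text.size();  // assumed starting point: end of text
    while (current >= pattern.size()) {
        std::size_t start = text.rfind(pattern, current - pattern.size());
        if (start == std::string::npos) break;  // library returns the end iterator
        starts.push_back(start);
        current = start;
    }
    return starts;
}
```

For "thethe" and "the" the walk yields offsets 3 and then 0; for "aaaa" and "aa" it yields 2 and then 0, so overlapping candidates such as offset 1 are never reported.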
- Exceptions
-