SourcePro® API Reference Guide

 
RWWTokenizer Class Reference

Breaks up a string into separate tokens, delimited by arbitrary whitespace. Can be used as an alternative to the C++ Standard Library function std::wcstok().

#include <rw/wtoken.h>

Public Member Functions

 RWWTokenizer (const RWWString &s)
 
 RWWTokenizer (const RWWTokenizer &rhs)
 
 RWWTokenizer (RWWTokenizer &&rhs)
 
bool done () const
 
RWWSubString nextToken (RWTRegex< wchar_t > &regex)
 
RWWSubString operator() ()
 
RWWSubString operator() (const wchar_t *s)
 
RWWSubString operator() (const wchar_t *s, size_t num)
 
RWWSubString operator() (RWTRegex< wchar_t > &regex)
 
RWWTokenizer & operator= (const RWWTokenizer &rhs)
 
RWWTokenizer & operator= (RWWTokenizer &&rhs)
 
void swap (RWWTokenizer &rhs)
 

Detailed Description

Class RWWTokenizer breaks a string into separate tokens, delimited by arbitrary whitespace. It can be thought of as an iterator for strings and as an alternative to the C++ Standard Library function std::wcstok(), which has the unfortunate side effect of modifying the string being tokenized.
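
The side effect mentioned above can be seen with the Standard Library alone. The following is a minimal sketch, not SourcePro code: the helper names splitWithWcstok and demoWcstokSideEffect are hypothetical and exist only for this illustration. It assumes the standard three-argument form of std::wcstok().

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not part of SourcePro): splits `buffer` on whitespace
// with std::wcstok(). The buffer is modified in place, because std::wcstok()
// overwrites the delimiter that ends each token with L'\0'.
std::vector<std::wstring> splitWithWcstok(std::wstring& buffer) {
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr;  // scanning position kept between calls
    for (wchar_t* tok = std::wcstok(&buffer[0], L" \t\n", &state);
         tok != nullptr;
         tok = std::wcstok(nullptr, L" \t\n", &state)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}

// Hypothetical demonstration; call it from main().
void demoWcstokSideEffect() {
    std::wstring text = L"a string of tokens";
    const std::vector<std::wstring> tokens = splitWithWcstok(text);
    assert(tokens.size() == 4);
    assert(text[1] == L'\0');  // the first space has been overwritten
}
```

After the call, the input no longer compares equal to its original value, which is why code that must preserve the input has to tokenize a copy when it uses std::wcstok().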

Synopsis
#include <rw/wtoken.h>
RWWString str("a string of tokens", RWWString::ascii);
RWWTokenizer next(str); // Lex the above string
Persistence
None
Example
#include <iostream>
#include <rw/wtoken.h>

int main() {
    RWWString a(L"Something is rotten in the state of Denmark");
    RWWTokenizer next(a);  // Tokenize the string a
    RWWString token;       // Will receive each token

    // Advance until the null string is returned:
    while (!(token = next()).isNull()) {
        std::cout << token << "\n";
    }
    return 0;
}

Program output (assuming your platform displays wide characters as US-ASCII if they are in the US-ASCII character set):

Something
is
rotten
in
the
state
of
Denmark

Constructor & Destructor Documentation

◆ RWWTokenizer() [1/3]

RWWTokenizer::RWWTokenizer ( const RWWString & s)

Constructs a tokenizer to lex the string s.

◆ RWWTokenizer() [2/3]

RWWTokenizer::RWWTokenizer ( const RWWTokenizer & rhs)
inline

Copy constructor. The created tokenizer copies the data from rhs.

◆ RWWTokenizer() [3/3]

RWWTokenizer::RWWTokenizer ( RWWTokenizer && rhs)
inline

Move constructor. The constructed instance takes ownership of the data owned by rhs.

Condition:
This method is available only on platforms with rvalue reference support.

Member Function Documentation

◆ done()

bool RWWTokenizer::done ( ) const

Returns true if the last token from the search string has been extracted, otherwise false. When using the function call operator interface, this is the same as the last non-empty token having been returned.

◆ nextToken()

RWWSubString RWWTokenizer::nextToken ( RWTRegex< wchar_t > & regex)

Returns the next token using a delimiter pattern represented by a regular expression pattern.

Unlike the other nextToken() overloads, this method allows a single occurrence of a delimiter to span multiple characters.

For example, nextToken(RWWString("ab")) treats either a or b as a delimiter character. Conversely, nextToken(RWTRegex<wchar_t>("ab")) treats the two-character pattern ab as a single delimiter.

This method may return an empty token if there are consecutive occurrences of any delimiter character in the search string.

◆ operator()() [1/4]

RWWSubString RWWTokenizer::operator() ( )

Advances to the next token and returns it as a substring. The tokens are delimited by any of the four wide characters in L" \t\n\0" (space, tab, newline, and null).

◆ operator()() [2/4]

RWWSubString RWWTokenizer::operator() ( const wchar_t * s)

Advances to the next token and returns it as a wide substring. The tokens are delimited by any wide character in s, or any embedded wide null.

◆ operator()() [3/4]

RWWSubString RWWTokenizer::operator() ( const wchar_t * s, size_t num )

Advances to the next token and returns it as a substring. The tokens are delimited by any of the first num wide characters in s. Buffer s may contain embedded nulls, and must contain at least num wide characters. Tokens will not be delimited by nulls unless s contains nulls.
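
The role of num can be illustrated with a Standard Library sketch. This is not the SourcePro implementation, and the names splitOnFirstN and demoEmbeddedNullDelimiter are hypothetical. It shows why an explicit length is needed: a null-terminated delimiter string cannot carry a null as a delimiter, whereas a (pointer, length) pair can.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of SourcePro): splits `text` on any of the
// first `num` wide characters of `delims`. Because the length is explicit,
// `delims` may contain L'\0' as an ordinary delimiter.
std::vector<std::wstring> splitOnFirstN(const std::wstring& text,
                                        const wchar_t* delims,
                                        std::size_t num) {
    std::vector<std::wstring> tokens;
    std::size_t start = text.find_first_not_of(delims, 0, num);
    while (start != std::wstring::npos) {
        const std::size_t end = text.find_first_of(delims, start, num);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end, num);
    }
    return tokens;
}

// Hypothetical demonstration; call it from main().
void demoEmbeddedNullDelimiter() {
    const std::wstring text(L"a\0b c", 5);   // explicit length keeps the null
    const wchar_t delims[] = {L' ', L'\0'};  // space, then null
    // Only the space is a delimiter: the null stays inside the first token.
    assert(splitOnFirstN(text, delims, 1).size() == 2);
    // Space and null are both delimiters.
    assert(splitOnFirstN(text, delims, 2).size() == 3);
}
```

With num set to 1 the embedded null remains part of a token. With num set to 2 it separates tokens, which matches the statement that tokens are not delimited by nulls unless s contains them.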

◆ operator()() [4/4]

RWWSubString RWWTokenizer::operator() ( RWTRegex< wchar_t > & regex)

Returns the next token using a delimiter pattern represented by the regular expression pattern regex.

This method, unlike the other operator() overloads, allows a single occurrence of a delimiter to span multiple characters.

For example, consider the RWWTokenizer instance tok. The statement tok(RWWString("ab")) treats either a or b as a delimiter character. On the other hand, tok(RWTRegex<wchar_t>("ab")) treats the two-character pattern, ab, as a single delimiter.

This method consumes consecutive occurrences of delimiters and skips over any empty fields that may be present in the string. To obtain empty fields as well as non-empty fields, use the nextToken(RWTRegex<wchar_t>&) method.
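
The two distinctions drawn above (character-set versus pattern delimiters, and skipped versus returned empty fields) can be sketched with the Standard Library. This illustration uses std::wregex rather than RWTRegex<wchar_t>, and the names splitOnAnyOf, splitOnPattern, and demoDelimiterKinds are hypothetical. It mimics the documented behavior and makes no claim about how SourcePro implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: every character in `delims` is a delimiter on its own,
// and runs of delimiters are skipped.
std::vector<std::wstring> splitOnAnyOf(const std::wstring& text,
                                       const std::wstring& delims) {
    std::vector<std::wstring> tokens;
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::wstring::npos) {
        const std::size_t end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}

// Hypothetical helper: each match of `pattern` is one delimiter, so a
// delimiter may span several characters. When `keepEmpty` is false, empty
// fields between consecutive delimiters are dropped.
std::vector<std::wstring> splitOnPattern(const std::wstring& text,
                                         const std::wregex& pattern,
                                         bool keepEmpty) {
    std::vector<std::wstring> tokens;
    // Submatch index -1 selects the text between matches.
    std::wsregex_token_iterator it(text.begin(), text.end(), pattern, -1);
    const std::wsregex_token_iterator end;
    for (; it != end; ++it) {
        if (keepEmpty || it->length() != 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}

// Hypothetical demonstration; call it from main().
void demoDelimiterKinds() {
    const std::wregex ab(L"ab");
    // Character set: 'a' and 'b' each delimit, giving 1, 2, 3.
    assert(splitOnAnyOf(L"1ab2a3", L"ab").size() == 3);
    // Pattern: only the sequence "ab" delimits, giving 1 and 2a3.
    assert(splitOnPattern(L"1ab2a3", ab, false).size() == 2);
    // Consecutive delimiters: the empty field is either kept or skipped.
    assert(splitOnPattern(L"1abab2", ab, true).size() == 3);
    assert(splitOnPattern(L"1abab2", ab, false).size() == 2);
}
```

In this sketch, keepEmpty set to false corresponds to the skipping behavior described for operator()(RWTRegex<wchar_t>&), and keepEmpty set to true corresponds to the empty tokens that nextToken(RWTRegex<wchar_t>&) may return.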

◆ operator=() [1/2]

RWWTokenizer & RWWTokenizer::operator= ( const RWWTokenizer & rhs)
inline

Assignment operator. The tokenizer copies the data from rhs. Returns a reference to self.

◆ operator=() [2/2]

RWWTokenizer & RWWTokenizer::operator= ( RWWTokenizer && rhs)
inline

Move assignment. Self takes ownership of the data owned by rhs.

Condition:
This method is available only on platforms with rvalue reference support.

◆ swap()

void RWWTokenizer::swap ( RWWTokenizer & rhs)
inline

Swaps the data owned by self with the data owned by rhs.

Copyright © 2024 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.