SourcePro® API Reference Guide

 
RWTRegex< T > Class Template Reference

Supports regular expression matching based on the POSIX.2 standard and supports both narrow and wide characters.

#include <rw/tools/regex.h>

Public Types

typedef RWTRegexMatchIterator< T > iterator
 
typedef RWTRegexMatchIterator< T > match_iterator
 
typedef RWTRegexTraits< T >::Char RChar
 
typedef std::basic_string< RChar > RString
 
enum  RWTRegexStatus {
  Ok, MissingEscapeSequence, InvalidHexNibble, InsufficientHex8Data,
  InsufficientHex16Data, MissingClosingBracket, MissingClosingCurlyBrace, MissingClosingParen,
  UnmatchedClosingParen, InvalidSubexpression, InvalidDataAfterOr, InvalidDataBeforeOr,
  ConsecutiveCardinalities, InvalidCardinalityRange, LeadingCardinality, InvalidDecimalDigit,
  UnmatchedClosingCurly, NumberOfStatusCodes
}
 

Public Member Functions

 RWTRegex ()
 
 RWTRegex (const RChar *str, size_t length=size_t(-1))
 
 RWTRegex (const RString &str, size_t length=size_t(-1))
 
 RWTRegex (const RWTRegex &source)
 
 RWTRegex (RWTRegex &&rhs)
 
virtual ~RWTRegex ()
 
const RWRegexErr & getStatus () const
 
size_t index (const RChar *str, size_t *mLen=0, size_t start=size_t(0), size_t length=size_t(-1))
 
size_t index (const RString &str, size_t *mLen=0, size_t start=size_t(0), size_t length=size_t(-1))
 
RWTRegexResult< T > matchAt (const RChar *str, size_t start=size_t(0), size_t length=size_t(-1))
 
RWTRegexResult< T > matchAt (const RString &str, size_t start=size_t(0), size_t length=size_t(-1))
 
bool operator< (const RWTRegex &rhs) const
 
RWTRegex & operator= (const RWTRegex &rhs)
 
RWTRegex & operator= (RWTRegex &&rhs)
 
bool operator== (const RWTRegex &rhs) const
 
size_t replace (RString &str, const RString &replacement, size_t count=1, size_t matchID=0, size_t start=size_t(0), size_t length=size_t(-1), bool replaceEmptyMatches=true)
 
RWTRegexResult< T > search (const RChar *str, size_t start=size_t(0), size_t length=size_t(-1))
 
RWTRegexResult< T > search (const RString &str, size_t start=size_t(0), size_t length=size_t(-1))
 
size_t subCount () const
 
void swap (RWTRegex< T > &rhs)
 

Detailed Description

template<class T>
class RWTRegex< T >

RWTRegex is the primary template for the regular expression interface. It provides most of the POSIX.2 standard for regular expression pattern matching and may be used for both narrow (8-bit) and wide (wchar_t) character strings.

RWTRegex can represent both a simple and an extended regular expression such as those found in lex and awk. The constructor "compiles" the expression into a form that can be used more efficiently. The results can then be used for string searches using class RWCString. Regular expressions (REs) can be of arbitrary size, limited by memory. The extended regular expression features found here are a subset of those found in the POSIX.2 standard (ANSI/IEEE Std. 1003.2, ISO/IEC 9945-2).

RWTRegex differs from the POSIX.2 standard in the following ways:

Constructing a regular expression

To match a single character RE

Any character that is not a special character matches itself.

  1. A backslash (\) followed by any special character matches the literal character itself; that is, its use "escapes" the special character. For example, \* matches "*" without applying the syntax of the * special character.
  2. The "special characters" are:
    + * ? . [ ] ^ $ ( ) { } | \
  3. The period (.) matches any character. For example, ".umpty" matches either "Humpty" or "Dumpty."
  4. A set of characters enclosed in brackets ([ ]) is a one-character RE that matches any of the characters in that set. This means that [akm] matches either an "a", "k", or "m". A range of characters can be indicated with a dash, as in [a-z], which matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. For example: [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
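
The single-character rules above can be tried out without SourcePro. The following is a minimal sketch that uses std::regex in POSIX extended mode as a stand-in for RWTRegex; the helper name fullMatch is hypothetical, and RWTRegex may differ from std::regex in details not covered here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): returns true if the whole of
// `text` matches `pattern`, read as a POSIX extended regular expression.
bool fullMatch(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

With this helper, fullMatch(".umpty", "Humpty") and fullMatch("[akm]", "k") are true, while fullMatch("[^akm]", "k") and fullMatch("[^akm]", "") are false. The pattern \* is written "\\*" in C++ source and matches only the literal "*".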

To match a multicharacter RE

  1. Parentheses (( )) group parts of regular expressions together into subexpressions that can be treated as a single unit. For example, (ha)+ matches one or more "ha"s.
  2. An asterisk (*) following a one-character RE or a parenthesized subexpression matches zero or more occurrences of the RE. Hence, [a-z]* and (ha)* match zero or more lower-case characters.
  3. A plus (+) following a one-character RE or a parenthesized subexpression matches one or more occurrences of the RE. Hence, [a-z]+ and (ha)+ match one or more lower-case characters.
  4. A question mark (?) is an optional element. The preceding RE can occur zero or once in the string – no more. For example, xy?z matches either xyz or xz.
  5. The concatenation of REs is a RE that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.
  6. The OR character ( | ) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly" or "jellies".
  7. Braces ({ }) following a one-character RE match the preceding element the number of times indicated. For example, a{2,3} matches either "aa" or "aaa."
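
The multicharacter operators listed above can be illustrated by extracting the first match from a longer string. This is a sketch only: it uses std::regex in POSIX extended mode rather than RWTRegex, and firstMatch is a hypothetical helper, not a SourcePro function.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): returns the text of the first
// match of `pattern` in `text`, or an empty string if there is no match.
std::string firstMatch(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

For example, firstMatch("(ha)+", "She said hahaha!") yields "hahaha", firstMatch("a{2,3}", "caaaat") yields "aaa", and firstMatch("jell(y|ies)", "two jellies") yields "jellies". An empty return value is ambiguous for patterns that can match the empty string, so the helper is only suitable for patterns that always consume at least one character.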

All or part of the regular expression can be "anchored" to either the beginning or end of the string being searched.

  1. If the caret (^) is at the beginning of the (sub)expression, then the matched string must be at the beginning of the string being searched. For example, "^hat" matches the "hat" at the start of "hat trick" but does not match the "hat" inside "that".
  2. If the dollar sign ($) is at the end of the (sub)expression, then the matched string must be at the end of the string being searched. For example, "know$" would match "I know what I know" but not "He knows what he knows."
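
The effect of the two anchors can be checked with a short sketch. As before, std::regex in POSIX extended mode stands in for RWTRegex, and the helper name foundIn is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): returns true if `pattern`
// matches anywhere in `text`, subject to any anchors in the pattern.
bool foundIn(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}
```

Here foundIn("^hat", "hat trick") is true and foundIn("^hat", "that") is false. Likewise, foundIn("know$", "I know what I know") is true, while foundIn("know$", "He knows what he knows.") is false because the string does not end in "know".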

Overriding the backslash special character

A common pitfall with regular expression classes is overriding the backslash special character (\). The C++ compiler and the regular expression constructor will both assume that any backslashes they see are intended to escape the following character. Thus, to specify a regular expression that exactly matches "a\a", create the regular expression using four backslashes as follows: the regular expression needs to see "a\\a", and for that to happen, the compiler would have to see "a\\\\a".

RWTRegex<char> reg("a\\\\a");
                     ^|^|
                      1 2

The backslashes marked with a ^ are an escape for the compiler, and the ones marked with | will thus be seen by the regular expression parser. At that point, the backslash marked 1 is an escape, and the one marked 2 will actually be put into the regular expression.

Similarly, if you really need to escape a character, such as a '.', you will have to pass two backslashes to the compiler:

RWTRegex<char> regDot("\\.");
                       ^|

Once again, the backslash marked ^ is an escape for the compiler, and the one marked with | will be seen by the regular expression constructor as an escape for the following '.' .
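
The two levels of escaping can be verified independently of SourcePro. The sketch below is an illustration under assumptions: it uses std::regex in POSIX extended mode in place of RWTRegex, the helper names are hypothetical, and the raw string literal requires C++11 or later.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Number of characters the regular expression parser receives once the
// compiler has processed the escapes in the literal "a\\\\a": a, \, \, a.
std::size_t patternLength()
{
    return std::string("a\\\\a").size();
}

// Hypothetical helper: true if `text` is exactly the three characters a\a.
bool matchesBackslashText(const std::string& text)
{
    // A raw string literal removes the compiler-level escaping, so only the
    // regex-level escape (two backslashes) has to be written.
    const std::regex re(R"(a\\a)", std::regex::extended);
    return std::regex_match(text, re);
}
```

patternLength() returns 4, confirming that the parser sees two backslashes. matchesBackslashText("a\\a") is true: the C++ literal "a\\a" holds the three characters a, \ and a.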

Synopsis
#include <rw/tools/regex.h>
RWTRegex<char> re0(".*\\.doc");
// Matches filenames with suffix ".doc"
RWCRegularExpression re1("a+");
// Matches one or more 'a'
RWWRegularExpression re2(L"b+");
// Matches one or more wide characters 'b'
See also

Related classes include:

  • RWTRegexMatchIterator which iterates over matches of a pattern in a given string.
  • RWTRegexResult which encapsulates the results of a pattern matching operation.
  • RWTRegexTraits which defines the character traits for a specific type of regular expression character and includes methods for returning these values.
  • RWRegexErr which reports errors from within RWTRegex.
Persistence
None
Example
#include <rw/tools/regex.h>
#include <rw/cstring.h>
#include <iostream>
using std::cout;
using std::endl;
int main()
{
    RWCString aString("Hark! Hark! The lark");
    // This regular expression matches any lowercase word
    // or end of a word starting with "l"
    RWTRegex<char> re("l[a-z]*");
    // search() returns an RWTRegexResult, which tests true on a match
    RWTRegexResult<char> result = re.search(aString);
    if (result)
        cout << result.subString(aString) << endl; // Prints "lark"
    return 0;
}

Program output:

lark

Member Typedef Documentation

template<class T>
typedef RWTRegexMatchIterator<T> RWTRegex< T >::iterator

Typedef based on the character type used to instantiate RWTRegex. For example, RWTRegex<char>::iterator is a typedef for RWTRegexMatchIterator<char>.

Note
RWTRegex::match_iterator and RWTRegex::iterator are provided. RWTRegex::iterator is a match iterator. If you need to add new iterator types, you must give them a descriptive prefix, as in RWTRegex::match_iterator.
template<class T>
typedef RWTRegexMatchIterator<T> RWTRegex< T >::match_iterator

Typedef based on the character type used to instantiate RWTRegex.

template<class T>
typedef RWTRegexTraits<T>::Char RWTRegex< T >::RChar

Typedef for the character type.

template<class T>
typedef std::basic_string<RChar> RWTRegex< T >::RString

Typedef for a string type to be used with RWTRegex.

Member Enumeration Documentation

template<class T>
enum RWTRegex::RWTRegexStatus

Defines allowable status codes. These codes are accessed by RWRegexErr.

Enumerator
Ok
MissingEscapeSequence
InvalidHexNibble
InsufficientHex8Data
InsufficientHex16Data
MissingClosingBracket
MissingClosingCurlyBrace
MissingClosingParen
UnmatchedClosingParen
InvalidSubexpression
InvalidDataAfterOr
InvalidDataBeforeOr
ConsecutiveCardinalities
InvalidCardinalityRange
LeadingCardinality
InvalidDecimalDigit
UnmatchedClosingCurly
NumberOfStatusCodes

Constructor & Destructor Documentation

template<class T>
RWTRegex< T >::RWTRegex ( )

Default constructor. Objects initialized with this constructor represent uninitialized patterns. These objects should be assigned a valid pattern before use.

template<class T>
RWTRegex< T >::RWTRegex ( const RChar *  str,
size_t  length = size_t(-1) 
)

Initializes an RWTRegex object to represent the pattern specified in the str parameter.

The parameter str specifies the pattern string for the regular expression.

The parameter length specifies the length, in characters, of the pattern string. If the length is not specified, then it is calculated as the number of characters preceding the first occurrence of a NULL character, according to its character traits. (The traits for each type of character are defined in RWTRegexTraits.)

Exceptions
RWRegexErr  Thrown if a pattern error is encountered.
template<class T>
RWTRegex< T >::RWTRegex ( const RString &  str,
size_t  length = size_t(-1) 
)

Initializes an RWTRegex object to represent the pattern specified in str.

Parameters
str     The pattern string for the RE.
length  The length, in characters, of the pattern string. If length is not specified, the length of str is used.
Exceptions
RWRegexErr  Thrown if a pattern error is encountered.
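
The constructors above report malformed patterns by throwing. The general shape of guarding against a pattern error is sketched below, with std::regex and std::regex_error standing in for RWTRegex and RWRegexErr (an assumption made so the example runs without SourcePro). The helper name compiles is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): returns true if `pattern`
// compiles as a POSIX extended regular expression, false if the regex
// constructor rejects it by throwing.
bool compiles(const std::string& pattern)
{
    try {
        const std::regex re(pattern, std::regex::extended);
        (void)re;   // only the success of construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

compiles("(ab)+") is true, while compiles("(ab") and compiles("[ab") are false. These two failures correspond in spirit to the MissingClosingParen and MissingClosingBracket status codes, although the std::regex error codes are named differently.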
template<class T>
RWTRegex< T >::RWTRegex ( const RWTRegex< T > &  source)

Copy constructor. The pattern represented by the source RWTRegex object is copied to this RWTRegex object. This copying operation is performed without recompiling the original pattern.

template<class T>
RWTRegex< T >::RWTRegex ( RWTRegex< T > &&  rhs)

Move constructor. The constructed instance takes ownership of the data owned by rhs.

Condition:
This method is available only on platforms with rvalue reference support.
template<class T>
virtual RWTRegex< T >::~RWTRegex ( )

Destructor. Releases any allocated memory.

Member Function Documentation

template<class T>
const RWRegexErr& RWTRegex< T >::getStatus ( ) const

Returns the status of the most recent pattern compilation. This method is useful primarily in exception-disabled environments in which the default error handler for the Essential Tools Module error framework has been replaced with a function that does not abort. Otherwise, the regular expression object will not be available for this query.

template<class T>
size_t RWTRegex< T >::index ( const RChar *  str,
size_t *  mLen = 0,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for the first occurrence of a match for this regular expression pattern. The search begins with the character at a specified start character position in the supplied input string. It continues, one character at a time, until either a match is found, or the end of the string is reached. Use length to specify the length of the input string.

  • If a match is found, returns the index into the string at which the first match was found, starting from the beginning of the string. The length of the match is returned in the mLen argument.
  • If no match is found, returns RW_NPOS.
Parameters
str     The string to be searched for a match.
mLen    A return parameter representing the length of any match found during this operation. If not supplied (NULL), the length is not returned, but is available through RWTRegexResult<T>::getLength().
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, then it is calculated as the number of characters preceding the first occurrence of a NULL character, as defined by the traits specific to this type of character.

Returns the starting character position, from the beginning of the string, of a match. If no match is found, RW_NPOS is returned.

template<class T>
size_t RWTRegex< T >::index ( const RString &  str,
size_t *  mLen = 0,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for the first occurrence of a match for this regular expression pattern. The search begins with the character at a specified start character position in the supplied input string, and continues, one character at a time, until either a match is found, or the end of the string is reached. Use length to specify the length of the input string.

  • If a match is found, the method returns the index into the string at which the first match was found, starting from the beginning of the string. The length of the match is returned in the mLen argument.
  • If no match is found, the method returns RW_NPOS.
Parameters
str     The string to be searched for a match.
mLen    A return parameter representing the length of any match found during this operation. If not supplied (NULL), the length is not returned, but is available through RWTRegexResult<T>::getLength().
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, it is assigned the length of the input string object.
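
The contract described for index() (position as the return value, match length through an optional out-parameter, and a sentinel when nothing matches) can be mimicked as follows. This is a sketch under assumptions, not the RWTRegex implementation: it uses std::regex, returns std::string::npos where index() returns RW_NPOS, and omits the length parameter. The name indexOf is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the contract described for index(): returns the
// position of the first match at or after `start`, measured from the
// beginning of `str`, or std::string::npos if there is none. The match
// length is written to *mLen only when mLen is non-null and a match exists.
std::size_t indexOf(const std::regex& re, const std::string& str,
                    std::size_t* mLen = nullptr, std::size_t start = 0)
{
    if (start > str.size())
        return std::string::npos;
    const auto first = str.begin() + static_cast<std::ptrdiff_t>(start);
    std::smatch m;
    if (!std::regex_search(first, str.end(), m, re))
        return std::string::npos;
    if (mLen)
        *mLen = static_cast<std::size_t>(m.length(0));
    // position(0) is relative to `first`, so add `start` back on.
    return start + static_cast<std::size_t>(m.position(0));
}
```

With the pattern l[a-z]* and the input "Hark! Hark! The lark", indexOf reports position 16 and length 4. Starting the search at position 17 finds nothing and yields std::string::npos.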
template<class T>
RWTRegexResult<T> RWTRegex< T >::matchAt ( const RChar *  str,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for a match against the pattern string represented by this RWTRegex object. The match must start at the specified character in the input string. (This is similar to anchoring the pattern at the beginning of the string using the circumflex character ^.)

Parameters
str     The string to be searched for a match.
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, then it is calculated as the number of characters preceding the first occurrence of a NULL character, as defined by the traits specific to this type of character.
template<class T>
RWTRegexResult<T> RWTRegex< T >::matchAt ( const RString &  str,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for a match against the pattern string represented by this RWTRegex object. The match must start at the specified character in the input string. (This is similar to anchoring the pattern at the beginning of the string using the circumflex character ^.)

Parameters
str     The string to be searched for a match.
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, it is assigned the length of the input string object.
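
The difference between matchAt() and a general search is that the match may not slide forward from the start position. A comparable effect can be sketched with std::regex and the match_continuous flag; this is an analogy under assumptions, not the RWTRegex implementation, and the helper name matchesAt is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the behaviour described for matchAt(): returns
// true only if a match begins exactly at position `start` of `str`.
bool matchesAt(const std::regex& re, const std::string& str,
               std::size_t start = 0)
{
    if (start > str.size())
        return false;
    const auto first = str.begin() + static_cast<std::ptrdiff_t>(start);
    // match_continuous restricts the search to matches that begin at `first`.
    return std::regex_search(first, str.end(), re,
                             std::regex_constants::match_continuous);
}
```

For the pattern l[a-z]* and the input "Hark! Hark! The lark", matchesAt is true at position 16 and false at positions 0 and 15, even though a plain search started at either of those positions would find "lark".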
template<class T>
bool RWTRegex< T >::operator< ( const RWTRegex< T > &  rhs) const

Compares this RWTRegex object to the rhs RWTRegex object by performing an element-by-element comparison of the characters in each object's pattern string. Character comparisons are performed as defined by the lt method on the RWTRegexTraits class implemented for the type of character in use.

This object is considered less than rhs if it contains the lesser of the first two unequal characters, from left to right, or if there are no unequal characters, but this pattern string is shorter than rhs, i.e. this pattern string has fewer characters.

Returns true if this RWTRegex is less than rhs.
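
The ordering rule described for operator< is an ordinary lexicographic comparison of the pattern strings. The sketch below spells the rule out for narrow characters, using std::char_traits<char>::lt in place of the lt method of RWTRegexTraits (an assumption); the function name patternLess is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical illustration of the described ordering: the first unequal
// character decides; if one pattern is a prefix of the other, the shorter
// pattern is the lesser.
bool patternLess(const std::string& lhs, const std::string& rhs)
{
    const std::size_t n = std::min(lhs.size(), rhs.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (std::char_traits<char>::lt(lhs[i], rhs[i]))
            return true;
        if (std::char_traits<char>::lt(rhs[i], lhs[i]))
            return false;
    }
    return lhs.size() < rhs.size();
}
```

Under this rule the pattern "abc" is less than "abd", and "a+" is less than "a+b" because it is a proper prefix. Two identical patterns are not less than one another.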

template<class T>
RWTRegex& RWTRegex< T >::operator= ( const RWTRegex< T > &  rhs)

Assignment operator. Copies the RWTRegex object specified by rhs into this RWTRegex object. The copy is performed without recompiling the original pattern. Returns a reference to this newly assigned RWTRegex object.

template<class T>
RWTRegex& RWTRegex< T >::operator= ( RWTRegex< T > &&  rhs)

Move assignment. Self takes ownership of the data owned by rhs.

Condition:
This method is available only on platforms with rvalue reference support.
template<class T>
bool RWTRegex< T >::operator== ( const RWTRegex< T > &  rhs) const

Compares this RWTRegex object to the rhs RWTRegex object by performing an element-by-element comparison of the characters in each object's pattern string. Character comparisons are performed as defined by the eq method on the RWTRegexTraits class implemented for the type of character in use.

This object is considered equal to rhs if it contains the same number of characters, and each corresponding pair of characters in the patterns are equal to one another.

Returns true if this RWTRegex is equal to rhs.

template<class T>
size_t RWTRegex< T >::replace ( RString &  str,
const RString &  replacement,
size_t  count = 1,
size_t  matchID = 0,
size_t  start = size_t(0),
size_t  length = size_t(-1),
bool  replaceEmptyMatches = true 
)

Replaces occurrences of the regular expression pattern in str with a replacement string, replacement. The number of replacements is identified by count. The default value for count is 1, which replaces only the first occurrence of the pattern.

Zero-length matches are replaced only if replaceEmptyMatches is true. The search begins at the start character position. The length, in characters, of the original string is identified by length. The input str is updated as part of this operation.

Returns the total number of occurrences replaced.

Parameters
str                  The string to be searched for a match.
replacement          The string to replace all occurrences of the pattern in str.
count                The number of matches to replace. If 0 is specified, all matches are replaced.
matchID              The match identifier of the sub-expression to be replaced. The default value of 0 replaces the overall match with specified replacement text.
start                The character position where the search for a match will start.
length               The length, in characters, of the entire input string. If the length is not specified, it is assigned the length of the input string object.
replaceEmptyMatches  Boolean. If true, zero-length matches are replaced, as well as all other matches. Otherwise, only matches with length greater than zero are replaced.
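
The meaning of count (a fixed number of replacements, or all of them when count is 0) can be sketched with std::regex. This is an illustration under assumptions, not the RWTRegex implementation: it does not model matchID, start, length, or replaceEmptyMatches, it inserts the replacement text literally, and the name replaceCount is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the described `count` behaviour: replaces the
// first `count` matches of `re` in `str` (all matches when count == 0),
// updates `str` in place, and returns the number of matches replaced.
std::size_t replaceCount(std::string& str, const std::regex& re,
                         const std::string& replacement,
                         std::size_t count = 1)
{
    std::string out;
    std::size_t replaced = 0;
    auto pos = str.cbegin();
    const std::sregex_iterator end;
    for (std::sregex_iterator it(str.cbegin(), str.cend(), re);
         it != end && (count == 0 || replaced < count); ++it) {
        out.append(pos, (*it)[0].first);  // text before this match
        out += replacement;               // replacement, taken literally
        pos = (*it)[0].second;            // resume after the match
        ++replaced;
    }
    out.append(pos, str.cend());          // remaining unmatched tail
    str = out;
    return replaced;
}
```

With the pattern a+ and the input "baaad banana", the default count of 1 produces "b-d banana" and returns 1, while a count of 0 produces "b-d b-n-n-" and returns 4.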

template<class T>
RWTRegexResult<T> RWTRegex< T >::search ( const RChar *  str,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for the first occurrence of a match for this RE pattern. The search begins with the character at a specified start character position in the supplied input string, and continues, one character at a time, until either a match is found, or until the end of the string is reached.

Parameters
str     The string to be searched for a match.
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, it is calculated as the number of characters preceding the first occurrence of a NULL character, as defined by this character's traits.
template<class T>
RWTRegexResult<T> RWTRegex< T >::search ( const RString &  str,
size_t  start = size_t(0),
size_t  length = size_t(-1) 
)

Searches an input string for the first occurrence of a match for this regular expression pattern. The search begins with the character at a specified start character position in the supplied input string, and continues, one character at a time, until either a match is found, or until the end of the string is reached.

Parameters
str     The string to be searched for a match.
start   The character position where the search for a match will start.
length  The length, in characters, of the entire input string. If the length is not specified, then it is assigned the length of the input string object.
template<class T>
size_t RWTRegex< T >::subCount ( ) const

Returns the number of parenthesized subexpressions in this regular expression.

template<class T>
void RWTRegex< T >::swap ( RWTRegex< T > &  rhs)

Swaps the data owned by self with the data owned by rhs.

Copyright © 2022 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.