SourcePro® API Reference Guide

 
RWTRegularExpression< charT > Class Template Reference

Deprecated. Provides extended regular expression matching similar to that found in lex and awk.

#include <rw/re.h>

Inherits RWREBaseClass.

Public Member Functions

 RWTRegularExpression ()
 
 RWTRegularExpression (const RWTRegularExpression< charT > &other)
 
virtual ~RWTRegularExpression ()
 
size_t index (const stringT &str, size_t *len=NULL, size_t start=0)
 
RWTRegularExpression< charT > & operator= (const RWTRegularExpression< charT > &other)
 
statusType status () const
 
 RWTRegularExpression (const charT *pat)
 
 RWTRegularExpression (const stringT &pat)
 
RWTRegularExpression< charT > & operator= (const charT *pat)
 
RWTRegularExpression< charT > & operator= (const stringT &pat)
 

Related Functions

(Note that these are not member functions.)

typedef RWTRegularExpression< char > RWCRExpr
 

Detailed Description

template<class charT>
class RWTRegularExpression< charT >

Deprecated:
As of SourcePro 4, use RWTRegex instead.

Class RWTRegularExpression represents an extended regular expression such as those found in lex and awk. The constructor "compiles" the expression into a form that can be used more efficiently. The results can then be used for string searches using class RWCString. Regular expressions can be of arbitrary size, limited by memory. The extended regular expression features found here are a subset of those found in the POSIX.2 standard (ANSI/IEEE Std 1003.2, ISO/IEC 9945-2).

The regular expression (RE) is constructed as follows:

The following rules determine one-character REs that match a single character:

  1. Any character that is not a special character (to be defined) matches itself.
  2. A backslash (\) followed by any special character matches the literal character itself; that is, its use "escapes" the special character. For example, \* matches "*" without applying the syntax of the * special character.
  3. The "special characters" are:
    + * ? . [ ] ^ $ ( ) { } | \
  4. The period (.) matches any character. For example, ".umpty" matches either "Humpty" or "Dumpty".
  5. A set of characters enclosed in brackets ([ ]) is a one-character RE that matches any of the characters in that set. This means that [akm] matches either an "a", "k", or "m". A range of characters can be indicated with a dash, as in [a-z], which matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. For example: [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.

The following rules can be used to build a multicharacter RE:

  1. Parentheses (( )) group parts of regular expressions together into subexpressions that can be treated as a single unit. For
  2. A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero
  3. A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more
  4. A question mark (?) is an optional element. The preceding RE can occur zero or once in the string – no more. For example,
  5. The concatenation of REs is a RE that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches
  6. The OR character ( | ) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly"
  7. Braces ({ }) are reserved for future use.

All or part of the regular expression can be "anchored" to either the beginning or end of the string being searched:

  1. If the caret (^) is at the beginning of the (sub)expression, then the matched string must be at the beginning of the string
  2. If the dollar sign ($) is at the end of the (sub)expression, then the matched string must be at the end of the string being searched.

The most frequent problem when using this class is specifying a backslash in the pattern. Both the C++ compiler and the regular expression constructor treat a backslash as an escape for the character that follows it. Thus, to specify a regular expression that exactly matches "a\a", you must write four backslashes: the regular expression needs to see "a\\a", and for that to happen, the compiler must see "a\\\\a".

    "a\\\\a"
      ^|^|
       1 2

The backslashes marked with a ^ are an escape for the compiler, and the ones marked with | will thus be seen by the regular expression parser. At that point, the backslash marked 1 is an escape, and the one marked 2 will actually be put into the regular expression.

Similarly, if you really need to escape a character, such as a '.', you will have to pass two backslashes to the compiler:

    "\\."
     ^|

Once again, the backslash marked ^ is an escape for the compiler, and the one marked with | will be seen by the regular expression constructor as an escape for the following '.'.

Synopsis
#include <rw/re.h>
RWTRegularExpression<char> re(".*\\.doc$"); // Matches filename with suffix ".doc"
Persistence
None
Example
#include <iostream>
#include <rw/re.h>
#include <rw/cstring.h>

int main ()
{
    RWCString s ("Hark! Hark! the lark");
    std::cout << "Searching for an expression beginning with \"l\" in \""
              << s << "\".\n";

    // A regular expression matching any lower-case word
    // starting with 'l':
    RWTRegularExpression<char> reg ("l[a-z]*");

    // Prints 'lark'
    std::cout << "Found \"" << s.match(reg) << "\"." << std::endl;
    return 0;
}

Constructor & Destructor Documentation

template<class charT>
RWTRegularExpression< charT >::RWTRegularExpression ( )
inline

Default constructor. You must assign a pattern to the regular expression before you use it.

template<class charT>
RWTRegularExpression< charT >::RWTRegularExpression ( const charT *  pat)
inline

Construct a regular expression from the pattern given by pat. The status of the results can be found by using member function status().

template<class charT>
RWTRegularExpression< charT >::RWTRegularExpression ( const stringT &  pat)
inline

Construct a regular expression from the pattern given by pat. The status of the results can be found by using member function status().

template<class charT>
RWTRegularExpression< charT >::RWTRegularExpression ( const RWTRegularExpression< charT > &  other)
inline

Copy constructor. Uses value semantics – self will be a copy of other.

template<class charT>
virtual RWTRegularExpression< charT >::~RWTRegularExpression ( )
inline virtual

Destructor. Releases any allocated memory.

Member Function Documentation

template<class charT>
size_t RWTRegularExpression< charT >::index ( const stringT &  str,
size_t *  len = NULL,
size_t  start = 0 
)
inline

Returns the index of the first instance in the string str that matches the regular expression compiled in self, or RW_NPOS if there is no such match. The search starts at index start. The length of the matching pattern is returned in the variable pointed to by len. If an invalid regular expression is used for the search, an exception of type RWInternalErr will be thrown. Note that this member function is relatively clumsy to use – class RWCString offers a better interface to regular expression searches.

template<class charT>
RWTRegularExpression<charT>& RWTRegularExpression< charT >::operator= ( const RWTRegularExpression< charT > &  other)
inline

Recompiles self to the pattern found in other.

template<class charT>
RWTRegularExpression<charT>& RWTRegularExpression< charT >::operator= ( const charT *  pat)
inline

Recompiles self to the pattern given by pat. The status of the results can be found by using member function status().

template<class charT>
RWTRegularExpression<charT>& RWTRegularExpression< charT >::operator= ( const stringT &  pat)
inline

Recompiles self to the pattern given by pat. The status of the results can be found by using member function status().

template<class charT>
statusType RWTRegularExpression< charT >::status ( ) const
inline

Returns the status of the regular expression:

statusType                                   Meaning
RWTRegularExpression::OK                     No errors
RWTRegularExpression::NOT_SUPPORTED          POSIX.2 feature not yet supported
RWTRegularExpression::NO_MATCH               Tried to find a match but failed
RWTRegularExpression::BAD_PATTERN            Pattern was illegal
RWTRegularExpression::BAD_COLLATING_ELEMENT  Invalid collating element referenced
RWTRegularExpression::BAD_CHAR_CLASS_TYPE    Invalid character class type referenced
RWTRegularExpression::TRAILING_BACKSLASH     Trailing \ in pattern
RWTRegularExpression::UNMATCHED_BRACKET      [] imbalance
RWTRegularExpression::UNMATCHED_PARENTHESIS  () imbalance
RWTRegularExpression::UNMATCHED_BRACE        {} imbalance
RWTRegularExpression::BAD_BRACE              Content of {} invalid
RWTRegularExpression::BAD_CHAR_RANGE         Invalid endpoint in [a-z] expression
RWTRegularExpression::OUT_OF_MEMORY          Out of memory
RWTRegularExpression::BAD_REPEAT             ?, * or + not preceded by valid regular expression

Copyright © 2023 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.