Deprecated. Represents a regular expression.

#include <rw/regexp.h>

Public Types
enum	statVal { OK , ILLEGAL , NOMEM , TOOLONG }

Public Member Functions
	RWCRegexp (const char *pat)

	RWCRegexp (const RWCRegexp &r)

	RWCRegexp (const RWCString &pat)

	~RWCRegexp ()

size_t	index (const RWCString &str, size_t *len, size_t start=0) const

RWCRegexp &	operator= (const char *)

RWCRegexp &	operator= (const RWCRegexp &r)

RWCRegexp &	operator= (const RWCString &pat)

statVal	status ()

Detailed Description

Deprecated: As of SourcePro 4, use RWTRegex instead.

Class RWCRegexp represents a regular expression. The constructor "compiles" the expression into a form that can be used more efficiently. The results can then be used for string searches using class RWCString.

The regular expression (RE) is constructed as follows:

The following rules determine one-character REs that match a single character:

Any character that is not a special character (defined below) matches itself.
A backslash (\) followed by any special character matches the literal character itself; that is, the backslash "escapes" the special character.

Note
There is one exception to this rule. \^char is interpreted as a control character: thus \^R is control-R. To match the circumflex ^ itself, use \x5e in US-ASCII environments.
The "special characters" are:
+ * ? . [ ] ^ $

The period (.) matches any character except the newline.
A set of characters enclosed in brackets ([]) is a one-character RE that matches any of the characters in that set. Example: "[akm]" matches either an "a", "k", or "m". A range of characters can be indicated with a dash. Example: "[a-z]" matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. Example: [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
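
As a rough illustration of the bracket rules above, the following sketch tests single characters against a set. It uses std::regex from the C++ standard library as a stand-in, on the assumption that its default grammar treats these particular bracket expressions the same way as RWCRegexp. The helper matchesSet is hypothetical and is not part of the RWCRegexp interface.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the single character c matches the
// bracket expression `set`, for example "[akm]" or "[^akm]".
// std::regex is a stand-in here, not RWCRegexp.
bool matchesSet(const std::string& set, char c) {
    const std::regex re(set);
    // regex_match requires the whole one-character string to match.
    return std::regex_match(std::string(1, c), re);
}
```

With this helper, matchesSet("[akm]", 'k') and matchesSet("[^akm]", 'b') are true, while matchesSet("[akm]", 'b') and matchesSet("[^akm]", 'a') are false.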

The following rules can be used to build a multi-character RE.

A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lower-case characters.
A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lower-case characters.
A one-character RE followed by a question mark (?) is optional: it matches zero or one occurrence of the RE, no more. For example, xy?z matches either xyz or xz.
The concatenation of REs is an RE that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.

Finally, the entire regular expression can be anchored to match only the beginning or end of a line:

If the caret (^) is at the beginning of the RE, then the matched string must be at the beginning of the line.
If the dollar sign ($) is at the end of the RE, then the matched string must be at the end of the line.
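
The repetition and anchoring rules above can be tried out with a short sketch. It again uses std::regex as a stand-in for RWCRegexp, assuming both agree on these simple patterns. The helper firstMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` that matches
// `pattern`, or an empty string when nothing matches.
// std::regex is a stand-in here, not RWCRegexp.
std::string firstMatch(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

For instance, firstMatch("[A-Z][a-z]*", "the Lark sings") yields "Lark", firstMatch("xy?z", "axzb") yields "xz", and firstMatch("^lark", "Hark! the lark") yields an empty string because "lark" is not at the beginning of the text.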

The following escape codes can be used to match control characters:

`\b`	backspace
`\e`	`ESC` (escape)
`\f`	formfeed
`\n`	newline
`\r`	carriage return
`\t`	tab
`\xddd`	the literal hex number `0xddd`
`\ddd`	the literal octal number `ddd`
`\^C`	Control code. For example, `\^D` is "control-D"

The most frequent problem with using this class is specifying a literal backslash in a pattern. Both the C++ compiler and the regular expression constructor treat a backslash as an escape for the following character. Thus, to create a regular expression that exactly matches "a\a", the regular expression constructor needs to see "a\\a", and for that to happen, the compiler has to see "a\\\\a", with four backslashes:

RWCRegexp reg("a\\\\a");
                ^|^|
                 1 2

The backslashes marked with a ^ are an escape for the compiler, and the ones marked with | are seen by the regular expression parser. At that point, the backslash marked 1 is an escape, and the one marked 2 is actually put into the regular expression.
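
The first layer of escaping can be checked without the library at all, because it is performed by the C++ compiler. The sketch below, whose function names are hypothetical, counts the backslashes that survive compilation. It also shows a C++11 raw string literal, which yields the same characters without the compiler-level escapes.

```cpp
#include <cstddef>
#include <string>

// Counts the backslash characters actually stored in s.
std::size_t countBackslashes(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == '\\') ++n;
    }
    return n;
}

// Four backslashes in the source; the compiler reduces each pair to one,
// so the stored characters are: a \ \ a
std::string patternAsCompiled() { return "a\\\\a"; }

// A raw string literal is passed through unchanged, so two backslashes in
// the source give the same stored characters.
std::string patternAsRaw() { return R"(a\\a)"; }
```

Both functions return the same four-character string containing two backslashes, which is what the regular expression constructor receives.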

Similarly, if you really need to escape a character, such as a '.', you have to pass two backslashes to the compiler:

RWCRegexp regDot("\\.");
                  ^|

Once again, the backslash marked ^ is an escape for the compiler, and the one marked with | is seen by the regular expression constructor as an escape for the following '.'.

Synopsis:

#include <rw/regexp.h>
// Matches filename with suffix ".doc"
RWCRegexp re(".*\\.doc");


Persistence: None

Example:

#include <iostream>
#include <rw/cstring.h>
#include <rw/regexp.h>

int main() {
    RWCString s("Hark! Hark! the lark");

    std::cout << "Searching for expressions beginning with \"l\" in \""
              << s << "\".\n";

    // A regular expression matching any lower-case word
    // starting with 'l':
    RWCRegexp reg("l[a-z]*");

    std::cout << "Found \"" << s(reg) << "\"." << std::endl;

    return 0;
}


Program output:

Searching for expressions beginning with "l" in "Hark! Hark! the lark".

Found "lark".

Member Enumeration Documentation

◆ statVal

enum RWCRegexp::statVal

This enumeration represents the status of the regular expression encapsulated by an RWCRegexp instance.

Enumerator
OK	No errors.
ILLEGAL	Pattern was illegal.
NOMEM	Memory could not be allocated.
TOOLONG	Pattern exceeded maximum length. (To change the amount of space allocated for a pattern, edit `regexp.cpp` to change the value of `RWCRegexp::maxpat_`, then recompile and insert the changed object into the appropriate library.)

Constructor & Destructor Documentation

◆ RWCRegexp() [1/3]

RWCRegexp::RWCRegexp ( const char * pat )

Constructs a regular expression from the pattern given by pat. The status of the results can be found by using member function status().

◆ RWCRegexp() [2/3]

RWCRegexp::RWCRegexp ( const RWCString & pat )

Constructs a regular expression from the pattern given by pat. The status of the results can be found by using member function status().

◆ RWCRegexp() [3/3]

RWCRegexp::RWCRegexp ( const RWCRegexp & r )

Copy constructor. Uses value semantics – self is a copy of r.

◆ ~RWCRegexp()

RWCRegexp::~RWCRegexp ( )

Destructor. Releases any allocated memory.

Member Function Documentation

◆ index()

size_t RWCRegexp::index	(	const RWCString &	str,
		size_t *	len,
		size_t	start = 0 ) const

Returns the index of the first instance in the string str that matches the regular expression compiled in self, or RW_NPOS if there is no such match. The search starts at index start. The length of the matching pattern is returned in the variable pointed to by len. Using an invalid regular expression for the search throws an exception of type RWInternalErr.

Note: This member function is relatively clumsy to use – class RWCString offers a better interface to regular expression searches.
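
The calling convention described above (a returned offset, a length written through a pointer, and a start position) can be sketched as follows. This is not the library's implementation: indexOf and countMatches are hypothetical helpers built on std::regex, with std::string::npos playing the role of RW_NPOS. With SourcePro itself, the equivalent call would use the signature shown above, passing the string, a pointer to the length variable, and the start index.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in mirroring the documented contract of index():
// returns the offset of the first match at or after `start` and stores the
// match length in *len, or returns npos when there is no match.
// In this sketch *len is left untouched when nothing matches.
std::size_t indexOf(const std::regex& re, const std::string& str,
                    std::size_t* len, std::size_t start = 0) {
    if (start > str.size()) return std::string::npos;
    std::smatch m;
    if (!std::regex_search(str.begin() + start, str.end(), m, re)) {
        return std::string::npos;
    }
    *len = static_cast<std::size_t>(m.length(0));
    // m.position(0) is relative to the search start, so add `start` back.
    return start + static_cast<std::size_t>(m.position(0));
}

// Counts non-overlapping matches by restarting each search just past the
// previous match.
std::size_t countMatches(const std::regex& re, const std::string& str) {
    std::size_t count = 0, pos = 0, len = 0;
    while ((pos = indexOf(re, str, &len, pos)) != std::string::npos) {
        ++count;
        pos += (len > 0 ? len : 1);  // always advance, even on an empty match
    }
    return count;
}
```

The start parameter is what makes the loop in countMatches possible: each call resumes where the previous match ended.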

◆ operator=() [1/3]

RWCRegexp & RWCRegexp::operator= ( const char * )

Recompiles self to the pattern given by the string argument. The status of the results can be found by using member function status().

◆ operator=() [2/3]

RWCRegexp & RWCRegexp::operator= ( const RWCRegexp & r )

Uses value semantics – sets self to a copy of r.

◆ operator=() [3/3]

RWCRegexp & RWCRegexp::operator= ( const RWCString & pat )