Searching Operations

The next category of algorithms we will describe are those that are used to locate elements within a sequence that satisfy certain properties. Most commonly the result of a search is then used as an argument to a further operation, such as a copy (Chapter 13: Partition a Sequence into Two Groups), a partition (Chapter 13: Copy One Sequence into Another Sequence) or an in-place merge (Chapter 13: Merge two Adjacent Sequences into One)

Obtaining the Source

The searching routines described in this section return an iterator that identifies the first element that satisfies the search condition. It is common to store this value in an iterator variable, as follows:

   list<int>::iterator where;
    where = find(aList.begin(), aList.end(), 7);

If you want to locate all the elements that satisfy the search conditions you must write a loop. In that loop, the value yielded by a previous search is first advanced (since otherwise the value yielded by the previous search would once again be returned), and the resulting value is used as a starting point for the new search. For example, the following loop from the adjacent_find() example program (Chapter 13: Find Consecutive Duplicate Elements) will print the value of all repeated characters in a string argument.

Check Search Results

   while ((where = adjacent_find(where, stop)) != stop) {
       cout << "double " << *where << " in position " 
          << where - start << endl;
       ++where;
       }

Many of the searching algorithms have an optional argument that can specify a function to be used to compare elements, in place of the equality operator for the container element type (operator ==). In the descriptions of the algorithms we write these optional arguments inside a square bracket, to indicate they need not be specified if the standard equality operator is acceptable.

Find an Element Satisfying a Condition

There are two algorithms, find() and find_if(), that are used to find the first element that satisfies a condition. The declarations of these two algorithms are as follows:

InputIterator find_if (InputIterator first, InputIterator last, 
       Predicate);
 
 InputIterator find (InputIterator first, InputIterator last, 
       const T&);

The algorithm find_if() takes as argument a predicate function, which can be any function that returns a boolean value (see Chapter 3). The find_if() algorithm returns a new iterator that designates the first element in the sequence that satisfies the predicate. The second argument, the past-the-end iterator, is returned if no element is found that matches the requirement. Because the resulting value is an iterator, the dereference operator (the * operator) must be used to obtain the matching value. This is illustrated in the example program.

The second form of the algorithm, find(), replaces the predicate function with a specific value, and returns the first element in the sequence that tests equal to this value, using the appropriate equality operator (the == operator) for the given data type.

Searching Sets and Maps

The following example program illustrates the use of these algorithms:

void find_test ()
    // illustrate the use of the find algorithm
 {
    int vintageYears[] = {1967, 1972, 1974, 1980, 1995};
    int * start = vintageYears;
    int * stop = start + 5;
    int * where = find_if (start, stop, isLeapYear);
 
    if (where != stop)
       cout << "first vintage leap year is " << *where << endl;
    else
       cout << "no vintage leap years" << endl;
 
    where = find(start, stop, 1995);
 
    if (where != stop)
       cout << "1995 is position " << where - start 
          << " in sequence" << endl;
    else
       cout "1995 does not occur in sequence" << endl;
 }

Find Consecutive Duplicate Elements

The adjacent_find() algorithm is used to discover the first element in a sequence equal to the next immediately following element. For example, if a sequence contained the values 1 4 2 5 6 6 7 5, the algorithm would return an iterator corresponding to the first 6 value. If no value satisfying the condition is found, then the end-of-sequence iterator is returned. The declaration of the algorithm is as follows:

ForwardIterator adjacent_find (ForwardIterator first, 
    ForwardIterator last [, BinaryPredicate ] );

The first two arguments specify the sequence to be examined. The optional third argument must be a binary predicate (a binary function returning a boolean value). If present, the binary function is used to test adjacent elements, otherwise the equality operator (operator ==) is used.

The example program searches a text string for adjacent letters. In the example text these are found in positions 5, 7, 9, 21 and 37. The increment is necessary inside the loop in order to avoid the same position being discovered repeatedly.

void adjacent_find_example ()
    // illustrate the use of the adjacent_find instruction
 {
    char * text = "The bookkeeper carefully opened the door.";

   char * start = text;
    char * stop = text + strlen(text);
    char * where = start;
 
    cout << "In the text: " << text << endl;
    while ((where = adjacent_find(where, stop)) != stop) {
       cout << "double " << *where 
          << " in position " << where - start << endl;
       ++where;
    }
 }

Find a Subsequence within a Sequence

The algorithm search() is used to locate the beginning of a particular subsequence within a larger sequence. The easiest example to understand is the problem of looking for a particular substring within a larger string, although the algorithm can be generalized to other uses. The arguments are assumed to have at least the capabilities of forward iterators.

ForwardIterator search 
    (ForwardIterator first1, ForwardIterator last1, 
    ForwardIterator first2, ForwardIterator last2 
    [, BinaryPredicate ]);

Speed of Search

Suppose, for example, that we wish to discover the location of the string "ration" in the string "dreams and aspirations". The solution to this problem is shown in the example program. If no appropriate match is found, the value returned is the past-the-end iterator for the first sequence.

void search_example ()
       // illustrate the use of the search algorithm
 {
    char * base = "dreams and aspirations";
    char * text = "ration";
 
    char * where = search(base, base + strlen(base), 
          text, text + strlen(text));
 
    if (*where != '\0')
       cout << "substring position: " << where - base << endl;
    else
       cout << "substring does not occur in text" << endl;
 }

Note that this algorithm, unlike many that manipulate two sequences, uses a starting and ending iterator pair for both sequences, not just the first sequence.

Like the algorithms equal() and mismatch(), an alternative version of search() takes an optional binary predicate that is used to compare elements from the two sequences.

Locate Maximum or Minimum Element

The functions max() and min() can be used to find the maximum and minimum of a pair of values. These can optionally take a third argument that defines the comparison function to use in place of the less-than operator (operator <). The arguments are values, not iterators:

template <class T> 
    const T& max(const T& a, const T& b [, Compare ] );
 template <class T> 
    const T& min(const T& a, const T& b [, Compare ] );

The maximum and minimum functions are generalized to entire sequences by the generic algorithms max_element() and min_element(). For these functions the arguments are input iterators.

ForwardIterator max_element (ForwardIterator first, 
       ForwardIterator last [, Compare ] );
 ForwardIterator min_element (ForwardIterator first, 
       ForwardIterator last [, Compare ] );

Largest and Smallest Elements of a Set

These algorithms return an iterator that denotes the largest or smallest of the values in a sequence, respectively. Should more than one value satisfy the requirement, the result yielded is the first satisfactory value. Both algorithms can optionally take a third argument, which is the function to be used as the comparison operator in place of the default operator.

The example program illustrates several uses of these algorithms. The function named split() used to divide a string into words in the string example is described in Chapter 19 (Example Program of auto_ptr). The function randomInteger() is described in Chapter 2 (Random Access Iterators).

void max_min_example ()
    // illustrate use of max_element and min_element algorithms
 {
    // make a vector of random numbers between 0 and 99
    vector<int> numbers(25);
    for (int i = 0; i < 25; i++)
       numbers[i] = randomInteger(100);
 
    // print the maximum
    vector<int>::iterator max = 
       max_element(numbers.begin(), numbers.end());
    cout << "largest value was " << * max << endl;
 
    // example using strings
    string text = 
       "It was the best of times, it was the worst of times.";
    list<string> words;
    split (text, " .,!:;", words);
    cout << "The smallest word is " 
          << * min_element(words.begin(), words.end())
          << " and the largest word is "
          << * max_element(words.begin(), words.end())
          << endl;
 }

Locate the First Mismatched Elements in Parallel Sequences

The name mismatch() might lead you to think this algorithm was the inverse of the equal() algorithm, which determines if two sequences are equal (see Chapter 13: Generalized Inner Product). Instead, the mismatch() algorithm returns a pair of iterators that together indicate the first positions where two parallel sequences have differing elements. (The structure pair is described in Chapter 9: The Map Data Abstraction). The second sequence is denoted only by a starting position, without an ending position. It is assumed (but not checked) that the second sequence contains at least as many elements as the first. The arguments and return type for mismatch() can be described as follows:

pair<InputIterator, InputIterator> mismatch 
    (InputIterator first1, InputIterator last1, 
       InputIterator first2 [, BinaryPredicate ] );

The elements of the two sequences are examined in parallel, element by element. When a mismatch is found, that is, a point where the two sequences differ, then a pair containing iterators denoting the locations of the two differing elements is constructed and returned. If the first sequence becomes exhausted before discovering any mismatched elements, then the resulting pair contains the ending value for the first sequence, and the last value examined in the second sequence. (The second sequence need not yet be exhausted).

The example program illustrates the use of this procedure. The function mismatch_test() takes as arguments two string values. These are lexicographically compared and a message printed indicating their relative ordering. (This is similar to the analysis performed by the lexicographic_compare() algorithm, although that function simply returns a boolean value.) Because the mismatch() algorithm assumes the second sequence is at least as long as the first, a comparison of the two string lengths is performed first, and the arguments are reversed if the second string is shorter than the first. After the call on mismatch() the elements of the resulting pair are separated into their component parts. These parts are then tested to determine the appropriate ordering.

   void mismatch_test (char * a, char * b)
           // illustrate the use of the mismatch algorithm
    {
       pair<char *, char *> differPositions(0, 0);
       char * aDiffPosition;
       char * bDiffPosition;

      if (strlen(a) < strlen(b)) {
          // make sure longer string is second
          differPositions = mismatch(a, a + strlen(a), b);
          aDiffPosition = differPositions.first;
          bDiffPosition = differPositions.second;
          }
       else {
          differPositions = mismatch(b, b + strlen(b), a);
          // note following reverse ordering
          aDiffPosition = differPositions.second;
          bDiffPosition = differPositions.first;
          }

      // compare resulting values
       cout << "string " << a;
       if (*aDiffPosition == *bDiffPosition)
          cout << " is equal to ";
       else if (*aDiffPosition < *bDiffPosition)
          cout << " is less than ";
       else
          cout << " is greater than ";
       cout << b << endl;
    }

A second form of the mismatch() algorithm is similar to the one illustrated, except it accepts a binary predicate as a fourth argument. This binary function is used to compare elements, in place of the == operator.