
5.6 Using the Model Selection Classes

As described in Section 4.2, the model selection tools available in Analytics.h++ include forward, backward, stepwise, and exhaustive selection for both linear and logistic regression models. The class RWLinRegModelSelector<F> provides the four model selection techniques for linear regression models, and the class RWLogRegModelSelector<F> provides them for logistic regression models. The interfaces to both classes are nearly identical; the only difference is that one class is specialized for linear regression models, and the other is specialized for logistic regression models.

5.6.1 Selection Evaluation Criteria: Function Objects

If we interpret model selection as search, we can interpret the template parameter F as the search evaluation criterion. A choice for F should denote a function object that returns a numerical value whenever it is given a subset of predictor variables.

For the linear model selection class RWLinRegModelSelector<F>, Analytics.h++ supplies the function object class RWLinRegressFStatistic. It returns the F statistic value for the given predictor variable subset. (See Section 3.2.4.) If a different evaluation criterion is desired, refer to the discussion on writing your own function objects in Section 5.6.3.

For the class RWLogRegModelSelector<F>, Analytics.h++ supplies the logistic regression function object class RWGPValueFunctor. It returns the p-value of the predictor variable subset based on the G statistic. (See Section 3.3.3.1.) Again, if a different evaluation criterion is desired, refer to the discussion on writing your own function objects in Section 5.6.3.
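
To make the search interpretation concrete, the following is a minimal, self-contained sketch of greedy forward selection driven by a function object. It does not use the Analytics.h++ classes: the names Subset, ToyCriterion, and forwardSelect are hypothetical, the criterion here scores a bit vector directly rather than regression data, and the sketch assumes that a larger criterion value is better.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the Analytics.h++ interface.
// A subset is a bit vector: subset[i] is true when predictor i is included.
using Subset = std::vector<bool>;

// Toy evaluation criterion: rewards predictors 0 and 2 and penalizes
// every other predictor, so the best subset is {0, 2}.
struct ToyCriterion {
    double operator()(const Subset& s) const {
        double score = 0.0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (!s[i]) continue;
            score += (i == 0 || i == 2) ? 1.0 : -0.25;
        }
        return score;
    }
};

// Greedy forward selection: start from the empty subset and repeatedly add
// the single predictor that most increases the criterion value. Stop when
// no single addition improves on the current subset.
template <class F>
Subset forwardSelect(std::size_t numPredictors, F criterion = F()) {
    Subset current(numPredictors, false);
    double best = criterion(current);
    for (;;) {
        std::size_t bestIndex = numPredictors;  // sentinel: no improvement
        for (std::size_t i = 0; i < numPredictors; ++i) {
            if (current[i]) continue;
            current[i] = true;                  // tentatively add predictor i
            const double score = criterion(current);
            current[i] = false;                 // undo the tentative addition
            if (score > best) {
                best = score;
                bestIndex = i;
            }
        }
        if (bestIndex == numPredictors) break;
        current[bestIndex] = true;
    }
    return current;
}
```

Calling forwardSelect<ToyCriterion>(4) selects predictors 0 and 2 and leaves 1 and 3 out. Changing the evaluation criterion only requires passing a different function object type as F, which is the role the template parameter plays in the model selection classes.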

5.6.2 A Detailed Example

The following example shows how to use the model selection class RWLinRegModelSelector<F> on a linear regression problem. We begin with a double-precision predictor matrix called predictorData and a double-precision observation vector called observationData. The next three lines create a linear regression model and a model selector object set to use forward selection with the F statistic as the subset evaluation criterion.

The next few lines examine the results of forward selection and print out key diagnostics. These diagnostics include a bit vector showing which predictor variables were selected, the parameter values associated with the selected predictor variables, and the evaluation criterion given to the best subset found using forward selection.

Now we switch to stepwise selection and see if a better subset is found.

5.6.3 Writing Your Own Function Objects

As noted in Section 5.6.1, the template parameter F acts as the search evaluation criterion: when we instantiate the model selection classes, the choice for F should denote a function object that returns a numerical value whenever it is given a subset of predictor variables. For the class RWLinRegModelSelector<F>, the function object F should define an operator() method taking a matrix composed of some number of columns from a regression matrix, an observation vector, and a vector of calculated parameters. It should also define a default constructor from which valid function objects can be created.

The implementation of operator() should expect:

For the class RWLogRegModelSelector<F>, the function object F should have the same interface as above, except that operator() takes an observation vector consisting of Boolean elements, rather than double-precision elements.
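
As an illustration of a function object that takes Boolean observations, the sketch below computes the deviance (-2 times the log-likelihood) of a logistic regression fit. It is only a sketch under stated assumptions: Matrix, BoolVector, and Vector are std::vector stand-ins for the library's own types, the class name Deviance is hypothetical, and the parameter vector is assumed to hold one coefficient per column of the subset matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-ins for the library's matrix and vector types (assumption).
using Matrix = std::vector<std::vector<double>>;  // one row per observation
using BoolVector = std::vector<bool>;
using Vector = std::vector<double>;

// Hypothetical function object: deviance of a logistic regression fit.
struct Deviance {
    Deviance() {}  // default constructor, so instances can be created freely

    double operator()(const Matrix& subset,
                      const BoolVector& obs,
                      const Vector& params) const {
        double logLik = 0.0;
        for (std::size_t i = 0; i < obs.size(); ++i) {
            // Linear predictor for observation i.
            double eta = 0.0;
            for (std::size_t j = 0; j < params.size(); ++j) {
                eta += subset[i][j] * params[j];
            }
            // Logistic link: predicted probability of a true outcome.
            const double prob = 1.0 / (1.0 + std::exp(-eta));
            logLik += obs[i] ? std::log(prob) : std::log(1.0 - prob);
        }
        return -2.0 * logLik;
    }
};
```

With all parameters set to zero, every predicted probability is 0.5, so for n observations the function object returns 2n ln 2. A smaller deviance indicates a better fit, so a selector that maximizes its criterion would need the negated value.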

The following example shows how you might implement a function object to produce Mallows' C_p statistic for subsets in linear regression. Note that you must take extra steps to run the example in a multithreaded environment, since the static variable fullModelMSE is not thread-safe.
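
A minimal sketch of such a function object follows. It is not the original listing: Matrix and Vector are std::vector stand-ins for the library's types, the class name MallowsCp is hypothetical, and the parameter vector is assumed to hold one coefficient per column of the subset matrix (including any intercept column). The statistic is computed as C_p = SSE_p / MSE_full - n + 2p, where SSE_p is the residual sum of squares of the subset model, MSE_full is the mean squared error of the full model, n is the number of observations, and p is the number of parameters in the subset model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-ins for the library's matrix and vector types (assumption).
using Matrix = std::vector<std::vector<double>>;  // one row per observation
using Vector = std::vector<double>;

// Hypothetical function object computing Mallows' C_p for a subset model.
struct MallowsCp {
    // Mean squared error of the full model. It must be set before the
    // function object is used. Because it is shared static state, it is
    // not thread-safe without external synchronization.
    static double fullModelMSE;

    MallowsCp() {}  // default constructor, so instances can be created freely

    double operator()(const Matrix& subset,
                      const Vector& obs,
                      const Vector& params) const {
        const std::size_t n = obs.size();
        const std::size_t p = params.size();
        // Residual sum of squares of the subset model.
        double sse = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double fitted = 0.0;
            for (std::size_t j = 0; j < p; ++j) {
                fitted += subset[i][j] * params[j];
            }
            const double residual = obs[i] - fitted;
            sse += residual * residual;
        }
        return sse / fullModelMSE
               - static_cast<double>(n)
               + 2.0 * static_cast<double>(p);
    }
};

// Placeholder value; set this to the full model's MSE before selection.
double MallowsCp::fullModelMSE = 1.0;
```

Subsets whose C_p value is close to p are generally preferred, so a selector that simply maximizes its criterion would need a transformed value rather than C_p itself.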




©Copyright 1999, Rogue Wave Software, Inc.