As described in Section 4.2, the model selection tools available in Analytics.h++ include forward, backward, stepwise, and exhaustive selection for both linear and logistic regression models. The class RWLinRegModelSelector<F> provides the four model selection techniques for linear regression models, and the class RWLogRegModelSelector<F> provides them for logistic regression models. The two classes have nearly identical interfaces; they differ only in the kind of regression model they operate on.
If we interpret model selection as search, we can interpret the template parameter F as the search evaluation criterion. A choice for F should denote a function object that returns a numerical value whenever it is given a subset of predictor variables.
For the linear model selection class RWLinRegModelSelector<F>, Analytics.h++ supplies the function object class RWLinRegressFStatistic. It returns the F statistic value for the given predictor variable subset. (See Section 3.2.4.) If a different evaluation criterion is desired, refer to the discussion on writing your own function objects in Section 5.6.3.
For the class RWLogRegModelSelector<F>, Analytics.h++ supplies the logistic regression function object class RWGPValueFunctor. It returns the p-value of the predictor variable subset based on the G statistic. (See Section 3.3.3.1.) Again, if a different evaluation criterion is desired, refer to the discussion below on writing your own function objects.
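To make the "selection as search" view concrete, the following is a minimal, library-independent sketch of greedy forward selection driven by a criterion type F. Everything in it is hypothetical and for illustration only: Subset, ToyCriterion, and forwardSelect are not part of Analytics.h++, the criterion here takes only a list of column indices, and we assume that a larger criterion value is better.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a predictor subset: the indices of the
// columns currently included in the model.
typedef std::vector<std::size_t> Subset;

// Hypothetical criterion with made-up scores. Any default-constructible
// type whose operator() maps a subset to a number could be used instead.
struct ToyCriterion {
    double operator()(const Subset& s) const {
        // Each predictor contributes a fixed gain and costs 1.0.
        static const double gain[] = {5.0, 0.5, 3.0, 0.2};
        double score = 0.0;
        for (std::size_t i = 0; i < s.size(); ++i)
            score += gain[s[i]] - 1.0;
        return score;
    }
};

// Greedy forward selection: in each round, add the single predictor that
// raises the criterion the most; stop when no addition raises it.
template <class F>
Subset forwardSelect(std::size_t numPredictors) {
    F eval;                       // criterion is default-constructed
    Subset chosen;
    double best = eval(chosen);
    for (;;) {
        bool improved = false;
        std::size_t bestIdx = 0;
        for (std::size_t j = 0; j < numPredictors; ++j) {
            bool already = false;
            for (std::size_t k = 0; k < chosen.size(); ++k)
                if (chosen[k] == j) already = true;
            if (already) continue;
            Subset trial = chosen;
            trial.push_back(j);
            const double v = eval(trial);
            if (v > best) { best = v; bestIdx = j; improved = true; }
        }
        if (!improved) break;
        chosen.push_back(bestIdx);
    }
    return chosen;
}
```

Calling forwardSelect<ToyCriterion>(4) returns the chosen column indices in the order they were added. Swapping in a different criterion type changes which subset the search prefers without touching the search itself, which is the role the template parameter F plays in the selector classes.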
The following example shows how to use the model selection class RWLinRegModelSelector<F> on a linear regression problem. We begin with a double-precision predictor matrix called predictorData and a double-precision observation vector called observationData. The next three lines create a linear regression model and a model selector object set to use forward selection with the F statistic as the subset evaluation criterion.
    // Save some typing.
    typedef RWLinRegModelSelector<RWLinRegressFStatistic> FStatModelSelector;
    RWLinearRegression lr(predictorData, observationData);
    FStatModelSelector selector(lr, rwForwardSelection);
The next few lines examine the results of forward selection and print out key diagnostics. These diagnostics include a bit vector showing which predictor variables were selected, the parameter values associated with the selected predictor variables, and the evaluation criterion given to the best subset found using forward selection.
    if ( !selector.fail() ) {
        cout << "Selected variables: " << selector.selectedParamIndices() << endl;
        cout << "Selected parameter values: " << selector.selectedParamValues() << endl;
        cout << "Subset evaluation value: " << selector.evalFunctionForSelected() << endl;
    }
Now we switch to stepwise selection and see if a better subset is found.
    selector.setSearchMethod( rwStepwiseSelection );
    cout << "New subset evaluation value: " << selector.evalFunctionForSelected() << endl;
If we interpret model selection as search, we can interpret the template parameter F as the search evaluation criterion. When we instantiate the model selection classes, the choice for F should denote a function object that returns a numerical value whenever it is given a subset of predictor variables. For the class RWLinRegModelSelector<F>, the function object F should define an operator() method that takes a matrix composed of some number of columns from a regression matrix, an observation vector, and a vector of calculated parameters. It should also define a default constructor from which valid function objects can be created.
    class F {
    public:
        F();
        double operator()( const RWGenMat<double>& regressionMatrixColumns,
                           const RWMathVec<double>& observationVector,
                           const RWMathVec<double>& parameters );
    };
The implementation of operator() should expect:
the number of rows in the matrix to equal the length of the observation vector
the number of columns in the matrix to equal the length of the parameter vector
the order of parameter values to correspond with the matrix column ordering.
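The expectations above can be checked explicitly inside a criterion. The sketch below is library-independent: Matrix, shapesAgree, and SumSquaredError are hypothetical stand-ins written for illustration, with std::vector taking the place of the Rogue Wave vector type. It asserts the two shape conditions and then pairs parameter j with column j when forming predictions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix, stored row by row.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// True when rows match the observations and columns match the parameters.
bool shapesAgree(const Matrix& x,
                 const std::vector<double>& y,
                 const std::vector<double>& params) {
    return x.rows == y.size() && x.cols == params.size();
}

// Hypothetical criterion: residual sum of squares of the subset model.
struct SumSquaredError {
    double operator()(const Matrix& x,
                      const std::vector<double>& y,
                      const std::vector<double>& params) const {
        assert(shapesAgree(x, y, params));
        double sse = 0.0;
        for (std::size_t i = 0; i < x.rows; ++i) {
            double prediction = 0.0;
            for (std::size_t j = 0; j < x.cols; ++j)
                prediction += x(i, j) * params[j];   // parameter j goes with column j
            const double error = y[i] - prediction;
            sse += error * error;
        }
        return sse;
    }
};
```

Failing fast on a shape mismatch makes a mis-wired criterion much easier to diagnose than a silently wrong evaluation value.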
For the class RWLogRegModelSelector<F>, the function object F should have the same interface as above, except that operator() takes an observation vector consisting of Boolean elements, rather than double-precision elements.
    class F {
    public:
        F();
        double operator()( const RWGenMat<double>& regressionMatrixColumns,
                           const RWMathVec<RWBoolean>& observationVector,
                           const RWMathVec<double>& parameters );
    };
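As an illustration of a criterion that consumes Boolean observations, here is a library-independent sketch that computes the deviance (minus twice the log-likelihood) of a logistic model. LogisticDeviance is a hypothetical name, and std::vector stands in for the Rogue Wave matrix and vector types, so this shows the shape of such a function object rather than code that plugs directly into RWLogRegModelSelector<F>. Whether a selector should prefer larger or smaller values of a criterion is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical criterion: deviance of a logistic model on Boolean outcomes.
struct LogisticDeviance {
    LogisticDeviance() {}

    double operator()(const std::vector<std::vector<double> >& x,  // one inner vector per row
                      const std::vector<bool>& y,                  // Boolean observations
                      const std::vector<double>& params) const {
        assert(x.size() == y.size());
        double logLik = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            assert(x[i].size() == params.size());
            // Linear predictor for row i.
            double eta = 0.0;
            for (std::size_t j = 0; j < params.size(); ++j)
                eta += x[i][j] * params[j];
            // Predicted probability that observation i is true.
            const double p = 1.0 / (1.0 + std::exp(-eta));
            logLik += y[i] ? std::log(p) : std::log(1.0 - p);
        }
        return -2.0 * logLik;
    }
};
```

With all parameters at zero, every predicted probability is 0.5, so the deviance is 2n ln 2 for n observations; parameters that fit the data better give a smaller value.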
The following example shows how you might implement a function object that produces Mallows' Cp statistic for subsets in linear regression. Note that you must take extra steps to run the example in a multithreaded environment, since the static variable fullModelMSE is not thread-safe as written.
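For reference, the quantity returned by operator() in the example corresponds to the usual form of Mallows' Cp. The symbols are our own notation: SSE_p is the residual sum of squares of the subset model, p is its number of coefficients (numCoeffs), n is the number of observations (numObs), and MSE_full is the mean squared residual of the full model (fullModelMSE).

```latex
C_p = \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\mathrm{full}}} + 2p - n
```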
    #include <iostream>
    #include <rw/math/genmat.h>
    #include <rw/math/mathvec.h>

    using std::cout;
    using std::endl;

    class RWLinMallowsCpEval {
    public:
        RWLinMallowsCpEval() {}

        // Returns Mallows' Cp for the subset model described by the arguments.
        double operator()( const RWGenMat<double>& xdata,
                           const RWMathVec<double>& ydata,
                           const RWMathVec<double>& params ) const
        {
            const size_t numObs = ydata.length();
            const size_t numCoeffs = params.length();
            RWMathVec<double> predictions = product(xdata, params);
            RWMathVec<double> errors = ydata - predictions;

            // Residual sum of squares for the subset model.
            double SSE = 0.0;
            for ( size_t i = 0; i < numObs; i++ )
                SSE += errors(i)*errors(i);

            return SSE / fullModelMSE + 2.0*numCoeffs - double(numObs);
        }

        // Mean squared residual of the full model; set this before selecting.
        static double fullModelMSE;
    };

    // Static initialization is needed for some linkers.
    double RWLinMallowsCpEval::fullModelMSE = 0.0;

    #include <rw/analytics/linregress.h>
    #include <rw/analytics/lranova.h>
    #include <rw/analytics/lnrmodsel.h>

    // This function (implementation not provided) reads in some data.
    // The outputs are passed by reference so the caller receives the data.
    extern void getDataFromFile(const char* fileName,
                                RWGenMat<double>& predMat,
                                RWMathVec<double>& obsVec);

    int main()
    {
        RWGenMat<double> predictorMatrix;
        RWMathVec<double> observationVector;
        getDataFromFile("regdata", predictorMatrix, observationVector);

        RWLinearRegression lr(predictorMatrix, observationVector);
        RWLinearRegressionANOVA anova(lr);

        // The criterion needs the full-model MSE before selection runs.
        RWLinMallowsCpEval::fullModelMSE = anova.meanSquareResidual();

        RWLinRegModelSelector<RWLinMallowsCpEval> cpsel(lr, rwForwardSelection);
        cout << "Selected variable subset according to Mallows Cp: "
             << cpsel.selectedParamIndices() << endl;
        return 0;
    }
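One possible way to address the thread-safety caveat, assuming a C++11 compiler, is to give each thread its own copy of the full-model MSE by declaring the static member thread_local. The sketch below is library-independent and hypothetical: ThreadLocalCpEval is not an Analytics.h++ class, and its operator() takes the already-computed sum of squared errors and the two counts rather than the matrix and vectors, so that only the storage pattern is shown. It also assumes that selection runs on the same thread that set the value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of the Cp evaluator with per-thread state.
class ThreadLocalCpEval {
public:
    ThreadLocalCpEval() {}

    // sse: residual sum of squares of the subset model
    // numCoeffs: number of coefficients in the subset model
    // numObs: number of observations
    double operator()(double sse, std::size_t numCoeffs, std::size_t numObs) const {
        assert(fullModelMSE > 0.0);   // must be set on this thread first
        return sse / fullModelMSE + 2.0 * double(numCoeffs) - double(numObs);
    }

    // Each thread sees, and must set, its own copy.
    static thread_local double fullModelMSE;
};

thread_local double ThreadLocalCpEval::fullModelMSE = 0.0;
```

Because the member stays static, the class is still default-constructible, as the selector classes require of F. If every thread works on the same full model, a single shared value guarded by other means may be simpler.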
©Copyright 1999, Rogue Wave Software, Inc.