Writing Your Own Function Objects
If we interpret model selection as search, we can interpret the template parameter F as the search evaluation criterion. When we instantiate the model selection classes, the choice for F should denote a function object that returns a numerical value whenever it is given a subset of predictor variables.
For the class RWLinRegModelSelector<F>, the function object F should define the operator() method taking a matrix composed of some number of columns from a regression matrix, an observation vector, and a vector of calculated parameters. It should also define a default constructor from which valid function objects can be created.
class F {
public:
    F();
    double operator()( const RWGenMat<double>& regressionMatrixColumns,
                       const RWMathVec<double>& observationVector,
                       const RWMathVec<double>& parameters );
};
The implementation of operator() should expect:
the number of rows in the matrix to equal the length of the observation vector
the number of columns in the matrix to equal the length of the parameter vector
the order of parameter values to correspond with the matrix column ordering.
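The three expectations above can be sketched as explicit checks inside operator(). This is a minimal illustration, not library code: Mat and Vec are hypothetical stand-ins for RWGenMat<double> and RWMathVec<double> so that the sketch compiles without the library headers, and SumSquaredErrorEval is an invented name for a criterion that simply returns the residual sum of squares.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for RWGenMat<double> and RWMathVec<double>,
// used only so this sketch is self-contained.
using Vec = std::vector<double>;
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;  // row-major storage
    double operator()(std::size_t i, std::size_t j) const {
        return data[i * cols + j];
    }
};

// Evaluation criterion: residual sum of squares for one column subset.
class SumSquaredErrorEval {
public:
    SumSquaredErrorEval() {}  // default constructor, as the interface requires

    double operator()(const Mat& x, const Vec& y, const Vec& params) const {
        // Expectation 1: one matrix row per observation.
        if (x.rows != y.size())
            throw std::invalid_argument("rows != observations");
        // Expectation 2: one matrix column per parameter.
        if (x.cols != params.size())
            throw std::invalid_argument("columns != parameters");

        double sse = 0.0;
        for (std::size_t i = 0; i < x.rows; ++i) {
            double prediction = 0.0;
            // Expectation 3: params[j] is taken as the coefficient of column j.
            for (std::size_t j = 0; j < x.cols; ++j)
                prediction += x(i, j) * params[j];
            const double error = y[i] - prediction;
            sse += error * error;
        }
        return sse;
    }
};
```

Whether a real function object should throw, assert, or silently trust its inputs is a design choice; the checks are shown here only to make the expectations concrete.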
For the class RWLogRegModelSelector<F>, the function object F should have the same interface as above, except that operator() takes an observation vector consisting of Boolean elements rather than double-precision elements.
class F {
public:
    F();
    double operator()( const RWGenMat<double>& regressionMatrixColumns,
                       const RWMathVec<bool>& observationVector,
                       const RWMathVec<double>& parameters );
};
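As an illustration of the Boolean-observation interface, the sketch below computes a deviance-style criterion (minus twice the log-likelihood). Several things here are assumptions rather than facts from this guide: Mat is a hypothetical stand-in for RWGenMat<double>, std::vector<bool> and std::vector<double> stand in for RWMathVec<bool> and RWMathVec<double>, DevianceEval is an invented name, and the model is assumed to have the common logistic form P(y = true) = 1 / (1 + exp(-x·b)).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RWGenMat<double>, for a self-contained sketch.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;  // row-major storage
    double operator()(std::size_t i, std::size_t j) const {
        return data[i * cols + j];
    }
};

// Evaluation criterion: -2 * log-likelihood of the Boolean observations.
class DevianceEval {
public:
    DevianceEval() {}  // default constructor, as the interface requires

    double operator()(const Mat& x,
                      const std::vector<bool>& y,
                      const std::vector<double>& params) const {
        double logLikelihood = 0.0;
        for (std::size_t i = 0; i < x.rows; ++i) {
            // Linear predictor for observation i.
            double eta = 0.0;
            for (std::size_t j = 0; j < x.cols; ++j)
                eta += x(i, j) * params[j];
            // Assumed logistic form: P(y = true) = 1 / (1 + exp(-eta)).
            const double p = 1.0 / (1.0 + std::exp(-eta));
            logLikelihood += y[i] ? std::log(p) : std::log(1.0 - p);
        }
        return -2.0 * logLikelihood;
    }
};
```

The only difference from the linear case that matters for the interface is the element type of the observation vector; the matrix and parameter arguments are unchanged.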
The following example shows how you might implement a function object to produce Mallows’ Cp statistic for subsets in linear regression. Note that you must take extra steps to run the example in a multithreaded environment, because the static variable fullModelMSE is shared by every instance of the class and is not protected against concurrent access.
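Before the listing, it may help to state the statistic that its operator() returns. In its usual textbook form, with symbol names chosen here for illustration, Mallows’ Cp for a subset with p coefficients fitted to n observations is:

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2_{\text{full}}} + 2p - n,
\qquad
\mathrm{SSE}_p = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Here \hat{\sigma}^2_{\text{full}} is the mean squared residual of the full model, which the listing stores in fullModelMSE; p corresponds to numCoeffs and n to numObs.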
#include <rw/math/genmat.h>
#include <rw/math/mathvec.h>

class RWLinMallowsCpEval {
public:
    RWLinMallowsCpEval() {}

    double operator()( const RWGenMat<double>& xdata,
                       const RWMathVec<double>& ydata,
                       const RWMathVec<double>& params ) const
    {
        const size_t numObs = ydata.length();
        const size_t numCoeffs = params.length();

        // Residuals of the subset model.
        RWMathVec<double> predictions = product(xdata, params);
        RWMathVec<double> errors = ydata - predictions;

        // Sum of squared errors for the subset model.
        double SSE = 0.0;
        for ( size_t i = 0; i < numObs; i++ )
            SSE += errors(i)*errors(i);

        return SSE / fullModelMSE + 2*numCoeffs - numObs;
    }

    // Mean squared residual of the full model; set before selection runs.
    static double fullModelMSE;
};

// Static initialization is needed for some linkers.
double RWLinMallowsCpEval::fullModelMSE = 0.0;
#include <iostream>
#include <rw/analytics/linregress.h>
#include <rw/analytics/lranova.h>
#include <rw/analytics/lnrmodsel.h>

// This function (implementation not provided) reads in some data.
// The matrix and vector are passed by reference so the caller sees the result.
extern void getDataFromFile(const char* fileName,
                            RWGenMat<double>& predMat,
                            RWMathVec<double>& obsVec);

int main() {
    RWGenMat<double> predictorMatrix;
    RWMathVec<double> observationVector;
    getDataFromFile("regdata", predictorMatrix, observationVector);

    // Fit the full model and record its mean squared residual.
    RWLinearRegression lr(predictorMatrix, observationVector);
    RWLinearRegressionANOVA anova(lr);
    RWLinMallowsCpEval::fullModelMSE = anova.meanSquareResidual();

    // Search for the subset using Mallows' Cp as the criterion.
    RWLinRegModelSelector<RWLinMallowsCpEval> cpsel(lr,
                                                    rwForwardSelection);
    std::cout << "Selected variable subset according to Mallows Cp: "
              << cpsel.selectedParamIndices() << std::endl;
    return 0;
}
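One possible extra step for multithreaded use is to make the static member atomic, so that concurrent reads and writes of fullModelMSE are no longer a data race. The sketch below is an assumption-laden illustration, not the library's recommended approach: it requires a C++11 compiler, MallowsCpEvalSketch is an invented name, and operator() takes a precomputed sum of squared errors instead of the library's matrix and vector types so that the block is self-contained.

```cpp
#include <atomic>
#include <cstddef>

// Simplified stand-in for the evaluation class in the example above.
// Note: this operator() signature is NOT the interface the selector expects;
// it is shortened to keep the focus on the static member.
class MallowsCpEvalSketch {
public:
    MallowsCpEvalSketch() {}

    double operator()(double sse, std::size_t numCoeffs,
                      std::size_t numObs) const {
        // Atomic read: safe even if another thread is storing a new value.
        const double mse = fullModelMSE.load();
        return sse / mse + 2.0 * static_cast<double>(numCoeffs)
                         - static_cast<double>(numObs);
    }

    // Shared by all instances, but reads and writes are now atomic.
    static std::atomic<double> fullModelMSE;
};

// Definition and initialization of the static member.
std::atomic<double> MallowsCpEvalSketch::fullModelMSE(0.0);
```

A caller would set the value with MallowsCpEvalSketch::fullModelMSE.store(value) before starting the selection. This removes the data race only; if different threads need different full-model values at the same time, a single shared variable is still the wrong tool, and a thread_local static member may be a better fit.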