Calculation Methods for Linear Regression
Given the linear regression model Y = Xb + e, finding the least squares solution is equivalent to solving the normal equations

    X^T X b = X^T Y

Thus the solution for b is given by:

    b = (X^T X)^(-1) X^T Y
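As a concrete illustration, the following standalone C++ sketch (independent of the Business Analysis Module, with made-up data) fits the two-parameter model y = b0 + b1*x by solving the 2 x 2 normal equations directly with Cramer's rule:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        // Illustrative data only: five observations of (x, y).
        std::vector<double> x = {1, 2, 3, 4, 5};
        std::vector<double> y = {2.1, 3.9, 6.2, 7.8, 10.1};
        const std::size_t n = x.size();

        // Accumulate the entries of X^T X and X^T Y for X = [1 x].
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (std::size_t i = 0; i < n; ++i) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }

        // Normal equations:  [ n   sx  ] [b0]   [ sy  ]
        //                    [ sx  sxx ] [b1] = [ sxy ]
        const double det = n * sxx - sx * sx;
        const double b0 = (sy * sxx - sx * sxy) / det;
        const double b1 = (n * sxy - sx * sy) / det;

        std::cout << "b0 = " << b0 << ", b1 = " << b1 << '\n';  // b0 = 0.05, b1 = 1.99
    }

Solving the normal equations directly is simple, but it squares the condition number of X; the calculation classes below instead work from factorizations of X itself, which is why they differ in speed and robustness.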
The Business Analysis Module includes three classes for calculating multiple linear regression parameters: RWLeastSqQRCalc, RWLeastSqQRPvtCalc, and RWLeastSqSVDCalc. The following three sections provide a brief description of the method encapsulated by each class, and its pros and cons.
RWLeastSqQRCalc
Class RWLeastSqQRCalc encapsulates the QR method. This method begins by decomposing the regression matrix X into the product of an orthogonal matrix Q and an upper triangular matrix R. The QR representation is then substituted into the equation in Calculation Methods for Linear Regression to obtain the solution

    b = R^(-1) Q^T Y
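The following standalone sketch illustrates the technique, not the module's internal implementation: it forms a thin QR factorization with modified Gram-Schmidt (production codes typically use Householder reflections) and then back-substitutes R b = Q^T Y:

    #include <cmath>
    #include <iostream>

    int main() {
        constexpr int n = 5, p = 2;
        // Same illustrative data as above; column 0 of X is the intercept.
        double X[n][p] = {{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}};
        double y[n]    = {2.1, 3.9, 6.2, 7.8, 10.1};
        double Q[n][p] = {};
        double R[p][p] = {};

        // Modified Gram-Schmidt: orthonormalize the columns of X into Q,
        // recording the coefficients in the upper triangular R.
        for (int j = 0; j < p; ++j) {
            for (int i = 0; i < n; ++i) Q[i][j] = X[i][j];
            for (int k = 0; k < j; ++k) {
                double dot = 0;
                for (int i = 0; i < n; ++i) dot += Q[i][k] * Q[i][j];
                R[k][j] = dot;
                for (int i = 0; i < n; ++i) Q[i][j] -= dot * Q[i][k];
            }
            double nrm = 0;
            for (int i = 0; i < n; ++i) nrm += Q[i][j] * Q[i][j];
            R[j][j] = std::sqrt(nrm);
            for (int i = 0; i < n; ++i) Q[i][j] /= R[j][j];
        }

        // b = R^(-1) Q^T y: form Q^T y, then back-substitute.
        double b[p];
        for (int j = 0; j < p; ++j) {
            b[j] = 0;
            for (int i = 0; i < n; ++i) b[j] += Q[i][j] * y[i];
        }
        for (int j = p - 1; j >= 0; --j) {
            for (int k = j + 1; k < p; ++k) b[j] -= R[j][k] * b[k];
            b[j] /= R[j][j];
        }
        std::cout << "b0 = " << b[0] << ", b1 = " << b[1] << '\n';  // same fit: 0.05, 1.99
    }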
Pros: Good performance. Parameter values are recalculated very quickly when adding or removing predictor variables. Model selection performance is best with this calculation method.

Cons: Calculation fails when the regression matrix X has less than full rank. (A matrix has less than full rank if the columns of X are linearly dependent.) Results may not be accurate if X is extremely ill-conditioned.
RWLeastSqQRPvtCalc
Class RWLeastSqQRPvtCalc uses essentially the same QR method described in RWLeastSqQRCalc, except that the QR decomposition is formed using pivoting.
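A minimal sketch of the pivoting idea, again independent of the module's implementation: at each step the remaining column with the largest residual norm becomes the next pivot, so linear dependence surfaces as a negligible pivot norm (and a detected numerical rank) rather than a division by zero:

    #include <cmath>
    #include <iostream>
    #include <utility>

    int main() {
        constexpr int n = 4, p = 3;
        // Deliberately rank deficient: column 2 = column 0 + column 1.
        double X[n][p] = {{1, 1, 2}, {1, 2, 3}, {1, 3, 4}, {1, 4, 5}};

        int rank = 0;
        for (int step = 0; step < p; ++step) {
            // Pick the remaining column with the largest (squared) norm.
            int best = step;
            double bestNrm = 0;
            for (int j = step; j < p; ++j) {
                double nrm = 0;
                for (int i = 0; i < n; ++i) nrm += X[i][j] * X[i][j];
                if (nrm > bestNrm) { bestNrm = nrm; best = j; }
            }
            if (bestNrm < 1e-12) break;  // remaining columns are numerically dependent
            for (int i = 0; i < n; ++i) std::swap(X[i][step], X[i][best]);

            // Normalize the pivot column and remove its component from the rest.
            const double s = std::sqrt(bestNrm);
            for (int i = 0; i < n; ++i) X[i][step] /= s;
            for (int j = step + 1; j < p; ++j) {
                double dot = 0;
                for (int i = 0; i < n; ++i) dot += X[i][step] * X[i][j];
                for (int i = 0; i < n; ++i) X[i][j] -= dot * X[i][step];
            }
            ++rank;
        }
        std::cout << "numerical rank = " << rank << '\n';  // prints 2
    }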
Pros: Calculation succeeds for regression matrices of less than full rank. (Calculations still fail, however, if the regression matrix contains a column of all zeros.)

Cons: Slower than the straight QR technique described in RWLeastSqQRCalc.
RWLeastSqSVDCalc
Class RWLeastSqSVDCalc employs singular value decomposition (SVD). The method solves the least squares problem by decomposing the n x p regression matrix X into the form

    X = P S Q^T

where P is an n x p matrix consisting of the p orthonormalized eigenvectors associated with the p largest eigenvalues of X X^T, Q is a p x p orthogonal matrix consisting of the orthonormalized eigenvectors of X^T X, and S = diag(s_1, s_2, ..., s_p) is a p x p diagonal matrix of the singular values of X. This singular value decomposition of X is used to solve the equation in Calculation Methods for Linear Regression.
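Given the factors, the minimum-norm least squares solution is b = Q S^+ P^T Y, where S^+ inverts only the singular values above a small tolerance; discarding the near-zero singular values is what lets the method tolerate rank deficiency and severe ill-conditioning. The sketch below (not the module's code) applies this formula to a matrix whose SVD is known exactly, with P, Q, and s hard-coded for the example:

    #include <iostream>

    int main() {
        constexpr int n = 3, p = 3;
        // Hard-coded SVD of X = diag(2, 1, 0): P = Q = I, s = (2, 1, 0).
        // A real implementation would compute these factors from X.
        double P[n][p] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double Q[p][p] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double s[p]    = {2, 1, 0};
        double y[n]    = {4, 3, 5};

        // z = S^+ P^T y: directions with a near-zero singular value are
        // dropped -- this is how SVD copes with rank deficiency.
        const double tol = 1e-10;
        double z[p];
        for (int j = 0; j < p; ++j) {
            double dot = 0;
            for (int i = 0; i < n; ++i) dot += P[i][j] * y[i];
            z[j] = (s[j] > tol) ? dot / s[j] : 0.0;
        }

        // b = Q z is the minimum-norm least squares solution.
        double b[p];
        for (int i = 0; i < p; ++i) {
            b[i] = 0;
            for (int j = 0; j < p; ++j) b[i] += Q[i][j] * z[j];
        }
        std::cout << b[0] << ' ' << b[1] << ' ' << b[2] << '\n';  // prints 2 3 0
    }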
Pros: Works on matrices of less than full rank. Produces accurate results even when X has full rank but is highly ill-conditioned.

Cons: Slower than the straight QR technique described in RWLeastSqQRCalc.