Backward Selection
Backward selection has computational properties similar to those of forward selection. The starting subset in backward selection contains all N possible predictor variables. Predictor variables are then deleted one at a time, as long as each deletion produces a subset with a higher evaluation criterion. Again, in the worst case, at most N(N + 1)/2 subsets must be evaluated before the search ends. Like forward selection, backward selection is not guaranteed to find the subset with the highest evaluation criterion.
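As a concrete illustration, here is a minimal sketch of greedy backward selection in Python. The adjusted-R² criterion, the ordinary-least-squares fit, the synthetic data, and all function names are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit y on the columns of X by ordinary least squares and
    score the fit with adjusted R^2 (assumes n > p + 1)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def backward_select(X, y, criterion=adjusted_r2):
    """Greedy backward selection: start from all columns of X and
    delete one column at a time while deletion raises the criterion."""
    subset = list(range(X.shape[1]))
    best = criterion(X[:, subset], y)
    improved = True
    while improved and len(subset) > 1:
        improved = False
        for j in list(subset):
            trial = [k for k in subset if k != j]
            score = criterion(X[:, trial], y)
            if score > best:            # deleting column j helps
                subset, best = trial, score
                improved = True
                break                   # restart the scan on the smaller subset
    return subset, best

# Illustrative usage on made-up data: y depends only on columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(200)
print(backward_select(X, y))            # typically keeps columns 0 and 3
```

Each pass of the while loop scores at most one candidate subset per remaining column, so across all passes at most N + (N − 1) + ⋯ + 1 = N(N + 1)/2 subsets are evaluated, which matches the worst-case bound above.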
Some researchers prefer backward selection to forward selection when the predictor variables are far from statistically independent. In this case, starting the search with all predictor variables included allows the model to account for interactions among the predictor variables. Forward selection will never add a pair of predictor variables that together explain the variation in the observation variable when neither variable is helpful individually, because each one would be rejected on its own. Backward selection, on the other hand, begins with both of these variables already included and discovers that deleting either one hurts the evaluation criterion.
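The following small demo, with made-up numbers, sketches the kind of case described above: two highly correlated predictors that are nearly useless individually but jointly explain the observation variable exactly. Forward selection would see almost no benefit from either variable at its first step, while backward selection starts with both included.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 500, 0.99
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = x1 - x2                     # y depends only on the *difference*

def r2(X, y):
    """Plain R^2 of an ordinary-least-squares fit (no intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(r2(x1[:, None], y))                 # ~0.005: x1 alone looks useless
print(r2(x2[:, None], y))                 # ~0.005: so does x2 alone
print(r2(np.column_stack([x1, x2]), y))   # ~1.0: together they are perfect
```

With correlation ρ between the two predictors, each variable alone explains only about (1 − ρ)/2 of the variance in y, so for ρ near 1 a greedy forward step has almost nothing to gain from either one.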
The disadvantage of backward selection is that one’s confidence in the subset evaluation criterion values tends to be lower than with forward selection. This is especially true when the number of rows in the predictor matrix is close to the number of possible predictor variables. In that case, there are very few observations left over for the regression model to estimate its parameter values, and the evaluation criterion becomes sensitive to small changes in the predictor matrix data. When the ratio of predictor matrix rows to predictor variables is small, it is usually better to use forward selection than backward selection.
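A rough demonstration of this sensitivity, again under assumed synthetic data and an assumed adjusted-R² criterion: deleting a single row from a nearly square predictor matrix (rows ≈ predictors) swings the criterion far more than the same deletion applied to a tall matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

def adj_r2(X, y):
    """Adjusted R^2 of an ordinary-least-squares fit (assumes n > p + 1)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

p = 20
for n in (25, 500):             # nearly square vs. tall predictor matrix
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + rng.standard_normal(n)
    # Criterion value after deleting each row in turn (leave-one-out).
    scores = [adj_r2(np.delete(X, i, axis=0), np.delete(y, i))
              for i in range(n)]
    print(n, np.ptp(scores))    # spread of the criterion under row deletion
```

At n = 25 with p = 20 there are only a handful of residual degrees of freedom, so the criterion's spread under single-row deletion is orders of magnitude larger than at n = 500, which is the instability the paragraph above warns about.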