Statistical tests for variable selection

I received an email today with the following comment:

I’m using ARIMA with Intervention detection and was planning to use your package to identify my initial ARIMA model for later iteration, however I found that sometimes the auto.arima function returns a model where AR/MA coefficients are not significant. So my question is: Is there a way to filter the search for ARIMA models that only have significant coefficients. I can remove the non-significant coefficients but I think it would be better to search for those models that only have significant coefficients.

Statistical significance is not usually a good basis for determining whether a variable should be included in a model, despite the fact that many people who should know better use it for exactly this purpose. Even some textbooks discuss variable selection using statistical tests, thus perpetuating bad statistical practice.

Statistical tests were designed to test hypotheses, not select variables. Tests on coefficients answer a different question from whether the variable is useful in forecasting. It is possible to have an insignificant coefficient associated with a variable that is useful for forecasting. It is also possible to have a significant coefficient associated with a variable that is better omitted when forecasting.

To see why the first situation occurs, think about two highly correlated predictor variables. It may be that the model that includes them both gives the best forecasts, but statistical tests on their coefficients can still come out insignificant because it is hard to distinguish their separate contributions (which makes the standard errors of the coefficients large). This is almost always a problem with AR coefficients because the corresponding predictors are lagged versions of the same series and often highly correlated.
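The effect of correlation on the standard errors can be made concrete with the standard variance inflation factor from ordinary least squares. The following is a minimal Python sketch, not code from the forecast package: it assumes two unit-variance predictors with sample correlation r, and the coefficient, error standard deviation and sample size are hypothetical values chosen only for illustration.

```python
import math

def variance_inflation_factor(r):
    """Factor by which the sampling variance of each coefficient grows
    when two predictors have sample correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

def t_stat_two_predictors(beta, sigma, n, r):
    """Approximate t-statistic for a coefficient beta on a unit-variance
    predictor, given error s.d. sigma, n observations, and correlation r
    with the other predictor (large-sample approximation)."""
    se = sigma * math.sqrt(variance_inflation_factor(r) / n)
    return beta / se

# Hypothetical values: the true coefficient is the same in every row;
# only the correlation between the two predictors changes.
for r in (0.0, 0.9, 0.99):
    vif = variance_inflation_factor(r)
    t = t_stat_two_predictors(beta=0.5, sigma=1.0, n=50, r=r)
    print(f"r={r:<5} VIF={vif:8.2f}  t={t:5.2f}")
```

With the same true coefficient, the t-statistic drops from clearly significant to clearly insignificant as r approaches 1, even though nothing about the usefulness of the predictor has changed.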

The second situation occurs, for example, when a predictor has high variability and a small coefficient. When the sample size is large enough, the estimated coefficient may be statistically significant. But for forecasting purposes, including the predictor increases the variance of the forecast without contributing much additional information.
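The sample-size part of this argument can be sketched numerically. The Python snippet below is only an illustration under simplifying assumptions (a single predictor, a large-sample approximation to the t-statistic, and hypothetical values for the coefficient, predictor standard deviation and error standard deviation). It shows significance growing with n while the share of variance the predictor explains stays fixed and tiny; it does not simulate the forecast variance itself.

```python
import math

def slope_t_statistic(beta, sd_x, sigma, n):
    """Approximate t-statistic for the slope in a one-predictor regression
    (large-sample approximation: se = sigma / (sd_x * sqrt(n)))."""
    return beta * sd_x * math.sqrt(n) / sigma

def share_of_variance_explained(beta, sd_x, sigma):
    """Population share of the response variance due to the predictor."""
    signal = (beta * sd_x) ** 2
    return signal / (signal + sigma ** 2)

# Hypothetical values: small coefficient, highly variable predictor.
beta, sd_x, sigma = 0.01, 5.0, 1.0

for n in (100, 1000, 10000):
    t = slope_t_statistic(beta, sd_x, sigma, n)
    share = share_of_variance_explained(beta, sd_x, sigma)
    print(f"n={n:<6} t={t:5.2f}  share explained={share:.4f}")
```

The explained share does not depend on n at all, whereas the t-statistic grows in proportion to the square root of n, so a large enough sample will eventually declare the coefficient significant.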

See Harrell’s book Regression Modeling Strategies for further discussion on the misuse of statistical tests for variable selection.

A much more reliable guide to selecting terms in any model, including ARIMA models, is to use cross-validation or an approximation to it such as the AIC. The auto.arima() function from the forecast package in R uses the AIC by default and usually chooses a reasonably good model for forecasting. Users who wish to experiment with other models should compare them using the AIC, not significance tests of the coefficients.
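As a rough illustration of comparing candidate models by information criterion, here is a minimal Python sketch. It is not how auto.arima() is implemented: it assumes Gaussian errors, uses the form of the AIC that drops an additive constant, and the residual sums of squares and parameter counts for the three candidate models are made up for the example. The comparison is only meaningful when all candidates are fitted to the same data with the same order of differencing.

```python
import math

def aic_gaussian(rss, n, k):
    """AIC, up to an additive constant, for a Gaussian model with
    residual sum of squares rss, n observations and k estimated
    parameters (including the error variance)."""
    return n * math.log(rss / n) + 2 * k

def aicc_gaussian(rss, n, k):
    """Small-sample corrected AIC (AICc)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_gaussian(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical candidates fitted to the same 60 observations:
# name -> (residual sum of squares, number of estimated parameters)
n = 60
candidates = {
    "ARIMA(1,0,0)": (52.0, 2),
    "ARIMA(2,0,0)": (45.0, 3),
    "ARIMA(2,0,2)": (44.5, 5),
}

scores = {name: aic_gaussian(rss, n, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, (rss, k) in candidates.items():
    print(f"{name}: AIC={scores[name]:7.2f}  AICc={aicc_gaussian(rss, n, k):7.2f}")
print("Lowest AIC:", best)
```

The largest model has the smallest residual sum of squares, but the penalty on its extra parameters outweighs the gain, so it is not the one selected. No significance test on any individual coefficient is involved.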

  • I couldn’t agree more! Rarely do I come by a post that I agree with completely.

  • Peter Cahusac

    Excellent post, sensible comments, thanks.

  • Ken

    Nice post. Could you perhaps talk more about:

    1. AIC vs AICc. When should one use AICc? When n/k < 30, where n is the sample size and k is the number of estimated parameters?

    2. Is AICc valid only for linear models with exogenous regressors?

    3. Does the CV suffer from small sample issues (especially in the case of the linear regression model)?

    4. At what sample size, I guess relative to the number of estimated parameters, does the AIC more or less become equivalent to the CV?

    5. When you have a lot of regressors, you can't estimate your model. Does it matter if you add one variable at a time or go from h regressors and work your way down one regressor at a time until you have the min CV or AIC model, and then begin to test one variable at a time of the remaining variables to see if any can reduce the CV or AIC further?

    I believe that many undergrad and grad students would benefit greatly from hearing your response to the above questions.

    Thanks much and keep up your excellent work.

  • Tarmo Leinonen

    “If users wish to experiment with other [sub] models, use the AIC [or something] for comparison ”

    This might do the trick. Note the link to the code on the page.

    (A related bug still seems to be present in R version 2.13.1.)

    “a problem with AR coefficients … the corresponding predictors are lagged variations of each other and often highly correlated”

    And MA coefficients and AR coefficients are correlated too. The estimation frequently produces AR and MA coefficients which cancel each other out even if the confidence interval estimate claims that the coefficients are very significantly different from zero. To detect these mirage coefficients, re-estimate with either max.p=0 or max.q=0.

    cov2cor(vcov( is supposed to reveal (not-wanted) correlations between coefficients.

  • mike

    Excellent post!
    Rob, how do we know whether a time series does not depend on t? I mean, for example, how do we know whether demand for a product is correlated or not?

  • Adrian

    Thanks for the excellent post. Would your answer change if the p-value is constructed using Newey and West’s HAC estimator?