Why doesn't auto.arima() return the model with the lowest AICc value?

Hyndsight

This question seems to come up frequently on crossvalidated.com or in my inbox.

I have this time series, however it yields different results when I use the auto.arima and Arima functions.

library(forecast)
xd <- ts(c(23786, 25955, 54373, 21561, 14552, 13284, 12714, 11821, 15445, 21307, 17228, 20007, 23065, 32811, 43147, 15127, 13497, 12224, 11412, 11888, 14210,18978, 15782, 17216, 16417, 22861, 42616, 17057,  9741, 10503,  7170, 10686,  9762, 15773, 15280, 13212, 14784, 26104, 29947), frequency = 12, start=c(2014,1), end=c(2017,3))

fit1 <- auto.arima(xd, trace=TRUE, allowmean=FALSE, allowdrift = FALSE, max.p=1, max.q=0, max.P=1, max.Q=0, xreg=seq_along(xd))
## 
##  ARIMA(1,0,0)(1,0,0)[12] with zero mean     : Inf
##  ARIMA(0,0,0)            with zero mean     : 864.3436
##  ARIMA(1,0,0)(1,0,0)[12] with zero mean     : Inf
##  ARIMA(0,0,0)(1,0,0)[12] with zero mean     : Inf
##  ARIMA(1,0,0)            with zero mean     : 837.0974
## 
##  Best model: Regression with ARIMA(1,0,0)            errors
(fit2 <- Arima(xd, order=c(1,0,0), seasonal=c(1,0,0), xreg=seq_along(xd)))
## Series: xd 
## Regression with ARIMA(1,0,0)(1,0,0)[12] errors 
## 
## Coefficients:
##          ar1    sar1  intercept  seq_along(xd)
##       0.0236  0.9010  22580.155      -221.3270
## s.e.  0.1796  0.0461   2864.285        64.5285
## 
## sigma^2 estimated as 17536975:  log likelihood=-388.51
## AIC=787.01   AICc=788.83   BIC=795.33

auto.arima() shows an AICc value of Inf for an ARIMA(1,0,0)(1,0,0) model, while the same model has a finite value using Arima().

The issue here is to do with the checks carried out by auto.arima() in an effort to return a good model. The auto.arima() function does not simply find the model with the lowest AICc value. It also carries out several checks to ensure the model is numerically well-behaved.

While the Arima() function will never return a model with roots inside the unit circle, the auto.arima() function is even stricter and will not select a model with roots close to the unit circle either. The ARIMA(1,0,0)(1,0,0)\(_{12}\) model fitted above has roots almost on the unit circle. This is easily seen by plotting them.

Arima(xd, order=c(1,0,0), seasonal=c(1,0,0), xreg=seq_along(xd)) %>%
  autoplot()

In fact, there are 12 roots with absolute value 1.009, just outside the unit circle (so the inverse roots that are plotted are just inside the circle). Consequently, this model is rejected by auto.arima() because the forecasts will be numerically unstable, and the AICc value is set to Inf to prevent it being selected.

In general, unless you know exactly what you’re doing, it is better to leave auto.arima() to select a model for you. In this case, it comes up with the following model which should forecast well.

(fit <- auto.arima(xd))
## Series: xd 
## ARIMA(0,0,0)(1,1,0)[12] with drift 
## 
## Coefficients:
##          sar1      drift
##       -0.6830  -240.1478
## s.e.   0.1389    35.2611
## 
## sigma^2 estimated as 9366701:  log likelihood=-257.75
## AIC=521.5   AICc=522.55   BIC=525.39
fit %>% forecast() %>% autoplot()

comments powered by Disqus