ARIMA models with long lags

Today’s email question:

I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. auto.arima() works great and I often get something along the lines of: ARIMA(0,0,1)(1,1,0)[12] with drift as the lowest AICc.

However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models include AR(10) MA(12)s etc. rather frequently. I argue that’s overfitting. I don’t see a great deal of discussion in textbooks about this, and I’ve never seen such higher-order models in a textbook setting. But are they fairly common in practice? What concerns could I raise with him about higher-order models? Any advice you could give would be appreciated.

My response:

If you mean ARIMA models of that size with all coefficients estimated, then yes, that is definitely overfitting for quarterly data. An AR(10) MA(12) model has 22 coefficients to estimate before any constant or seasonal terms, while ten years of quarterly data provides only 40 observations. I do not believe such models are common in practice, as most people in business now use automated algorithms for ARIMA modelling, and the automated algorithms in any reputable software are not going to give a model like that.

I don’t understand why a low-order ARIMA model is “using forecasted data to make your forecast” any more than a high-order model is. Almost all time series models compute forecasts recursively, so the h-step forecast uses the preceding 1-step to (h-1)-step forecasts. An AR(10) model is no exception: beyond one step ahead, it too feeds its own forecasts back in.
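To make the recursion concrete, here is a minimal Python sketch (not R, and not the forecast package's code) that iterates an AR(p) model with already-estimated coefficients. The function name `recursive_ar_forecast` is hypothetical. The same loop runs whether p is 1 or 10; in both cases every forecast after the first is computed partly from earlier forecasts.

```python
def recursive_ar_forecast(history, coefs, intercept, h):
    """Iterate a fitted AR(p) model forward h steps.

    coefs[0] multiplies lag 1, coefs[1] multiplies lag 2, and so on.
    Each forecast is appended to the working history, so the k-step
    forecast is computed from the 1- to (k-1)-step forecasts.
    """
    values = list(history)
    p = len(coefs)
    forecasts = []
    for _ in range(h):
        # Most recent value first: y_T, y_{T-1}, ..., y_{T-p+1}
        lags = values[-1:-p - 1:-1]
        yhat = intercept + sum(c * v for c, v in zip(coefs, lags))
        forecasts.append(yhat)
        values.append(yhat)  # the forecast becomes an input to the next step
    return forecasts


# AR(1) with phi = 0.5 and last observation 8
print(recursive_ar_forecast([8.0], [0.5], 0.0, 3))  # [4.0, 2.0, 1.0]
# AR(2): the second forecast uses the first forecast as its lag-1 value
print(recursive_ar_forecast([1.0, 2.0], [0.5, 0.25], 1.0, 2))  # [2.25, 2.625]
```

In the AR(1) case the h-step forecast is simply 0.5^h times the last observation, which is exactly what repeatedly plugging forecasts back in produces.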

But perhaps you mean high-order ARIMA models with most coefficients set to zero. These are usually called subset ARIMA models. In that case, there is not necessarily overfitting, although some data are lost from estimation because of the long lags required: with a lag of 12, for example, the first 12 observations have no complete set of lagged values to regress on.
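A subset model can be sketched in Python as an ordinary least-squares regression on only the chosen lags. This is a simplification under stated assumptions: it covers subset AR models only (MA terms would need maximum likelihood, as in a full ARIMA fit), and the helper `fit_subset_ar` and the simulated series are hypothetical. It shows that only the listed coefficients are estimated, and that the number of usable responses falls by the largest lag.

```python
import numpy as np


def fit_subset_ar(y, lags):
    """OLS fit of y_t = c + sum over k in lags of phi_k * y_{t-k} + e_t.

    Only the listed lags get coefficients; every other lag is fixed at zero.
    The first max(lags) observations lack a full set of lagged values, so
    they cannot serve as responses and are dropped from the regression.
    """
    y = np.asarray(y, dtype=float)
    n, m = len(y), max(lags)
    columns = [np.ones(n - m)] + [y[m - k:n - k] for k in lags]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return beta[0], dict(zip(lags, beta[1:])), n - m


# Hypothetical simulated series with nonzero coefficients only at lags 1 and 4
rng = np.random.default_rng(42)
n = 2000
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(4, n):
    y[t] = 0.3 * y[t - 1] + 0.5 * y[t - 4] + e[t]

c, phi, n_used = fit_subset_ar(y, lags=[1, 4])
print(phi, n_used)  # n_used is 1996: three parameters, four observations lost

# Adding a lag-12 term instead costs twelve observations, not four
print(fit_subset_ar(y, lags=[1, 12])[2])  # 1988
```

The model has a maximum lag of 4 but only three estimated parameters, which is why a subset model is not automatically overfitted; the price is the shorter effective sample.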

Your boss might be thinking of using only lags of h or more, so that the h-step forecasts are based solely on observations and not on intermediate forecasts. That strategy is sometimes used, but then you need a different model for every forecast horizon, with no terms for lags 1,...,(h-1). It is called “direct forecasting” rather than “recursive forecasting”. But unless the model is highly nonlinear, it will not generally give better forecasts. The loss of efficiency due to using fewer observations generally does more harm than the potential reduction in bias from forecasting directly. This is discussed a little in my recent boosting paper and in the references we cite.
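Here is a minimal Python sketch of direct forecasting under simple assumptions: a linear model fitted by least squares, and a hypothetical simulated AR(1) series. The helpers `fit_direct` and `direct_forecast` are illustrative names, not functions from any library. For horizon h, the target y_{t+h} is regressed on y_t, ..., y_{t-p+1}, so lags 1 to h-1 never appear.

```python
import numpy as np


def fit_direct(y, h, p):
    """OLS fit of a direct h-step model.

    Regresses y_{t+h} on (1, y_t, y_{t-1}, ..., y_{t-p+1}). Relative to the
    target y_{t+h} these are lags h, ..., h+p-1, so lags 1, ..., h-1 are
    absent and no intermediate forecasts are needed. A separate model must
    be fitted for every horizon h.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    rows = n - h - p + 1  # usable (origin, target) pairs; shrinks as h grows
    columns = [np.ones(rows)] + [y[p - 1 - j:n - h - j] for j in range(p)]
    beta, *_ = np.linalg.lstsq(np.column_stack(columns), y[p - 1 + h:], rcond=None)
    return beta, rows


def direct_forecast(y, beta):
    """Forecast h steps past the end of y using observed values only."""
    p = len(beta) - 1
    recent = np.asarray(y, dtype=float)[::-1][:p]  # y_T, y_{T-1}, ...
    return beta[0] + float(np.dot(beta[1:], recent))


# Hypothetical AR(1) series with phi = 0.7
rng = np.random.default_rng(0)
n = 20000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

beta3, rows3 = fit_direct(y, h=3, p=1)  # direct 3-step model
beta1, _ = fit_direct(y, h=1, p=1)      # ordinary one-step AR(1) fit
print(beta3[1], beta1[1] ** 3, 0.7 ** 3, rows3)
print(direct_forecast(y, beta3))
```

In this linear, correctly specified example both routes estimate roughly the same 3-step coefficient (the direct slope and the cube of the one-step slope should both be near 0.343), so the direct approach buys little while requiring its own regression for each horizon.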

One strategy you could take is to fit both models and compare their forecast accuracy on a rolling forecasting origin (also known as time series cross-validation), in which each model is repeatedly re-estimated using only the data up to a forecast origin and its forecasts are compared with the observations that follow. The more parsimonious models will almost always forecast better, and the empirical evidence of the forecast accuracy numbers may be enough to convince your boss.
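The rolling-origin comparison can be sketched in Python as follows (in R, the forecast package provides tsCV() for the same purpose). This is a simplified sketch under several assumptions: pure AR models fitted by least squares stand in for the full ARIMA models, an AR(10) stands in for the boss's much larger AR(10) MA(12), only one-step errors are scored, and the 100-point series is simulated. The helpers `fit_ar`, `one_step_forecast` and `rolling_origin_mse` are hypothetical names.

```python
import numpy as np


def fit_ar(y, p):
    """OLS fit of y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    columns = [np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)]
    beta, *_ = np.linalg.lstsq(np.column_stack(columns), y[p:], rcond=None)
    return beta


def one_step_forecast(y, beta):
    """One-step forecast from the last p observations of y."""
    p = len(beta) - 1
    recent = np.asarray(y, dtype=float)[::-1][:p]  # y_T, y_{T-1}, ...
    return beta[0] + float(np.dot(beta[1:], recent))


def rolling_origin_mse(y, p, min_train):
    """Time series cross-validation with an expanding training window.

    For each origin T = min_train, ..., n-1 the model is re-estimated on
    y[:T] only, then used to forecast y[T]. Returns the mean squared
    one-step error and the number of forecasts evaluated.
    """
    y = np.asarray(y, dtype=float)
    errors = [y[T] - one_step_forecast(y[:T], fit_ar(y[:T], p))
              for T in range(min_train, len(y))]
    return float(np.mean(np.square(errors))), len(errors)


# Hypothetical series: 100 simulated "quarters" from an AR(1)
rng = np.random.default_rng(1)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.6 * y[t - 1] + rng.normal()

for p in (1, 10):
    mse, k = rolling_origin_mse(y, p, min_train=40)
    print(f"AR({p}): MSE over {k} forecasts = {mse:.3f}")
```

With real data, you would replace the simulated series with the budget series and score forecasts at the horizons that matter (several quarters ahead), iterating each model recursively to reach those horizons.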


  • Letian Zheng

    Could we use Lasso or Ridge Regression to shrink the coefficients of an AR(p) model? Then the choice of the order p might not be so important. Not sure if any research has been done on this.