Major changes to the forecast package

The fore­cast pack­age for R has under­gone a major upgrade, and I’ve given it ver­sion num­ber 3 as a result. Some of these changes were sug­ges­tions from the fore­cast­ing work­shop I ran in Switzer­land a cou­ple of months ago, and some have been on the draw­ing board for a long time. Here are the main changes in ver­sion 3, plus a few ear­lier addi­tions that I thought deserved a men­tion.

Box-​​Cox transformations

A Box-​​Cox trans­for­ma­tion can now be incor­po­rated into the fore­cast­ing model when call­ing Arima(), auto.arima(), ets(), arfima(), tslm(), stlf(), rwf(), meanf(), splinef(). For example:

fit <- Arima(lynx, order=c(2,0,0), lambda=0.5)
plot(forecast(fit,h=20))

The model is based on the trans­formed data, fore­casts are cal­cu­lated, and then the fore­casts and pre­dic­tion inter­vals are back-​​transformed. The point fore­casts can be inter­preted as medi­ans after back-​​transforming.

If the trans­for­ma­tion is done out­side the fit­ting func­tion, the fore­casts can still be back-​​transformed. For example:

fit <- ar(BoxCox(lynx,0.5))
plot(forecast(fit,lambda=0.5))

Back-​​transforming fore­casts like this is now avail­able in forecast.Arima(), forecast.ets(), forecast.fracdiff(), forecast.ar(), forecast.StructTS, forecast.HoltWinters().

I have also added a func­tion for auto­mat­i­cally choos­ing the Box-​​Cox para­me­ter using either Guerrero’s (1993) method or the pro­file log like­li­hood method. For example:

fit <- Arima(lynx, order=c(2,0,0), lambda=BoxCox.lambda(lynx))
plot(forecast(fit))

Note that pre­vi­ously there was a lambda argu­ment in the plot.forecast() func­tion. This is no longer avail­able (and so some old code may break). Instead, back-​​transform the fore­casts within the forecast() function.

Improved auto.arima()

The auto.arima() func­tion is widely used for auto­mat­i­cally select­ing ARIMA mod­els. It works quite well, except that selec­tion of D, the order of sea­sonal dif­fer­enc­ing, has always been poor. Up until now, the default has been to use the Canova-​​Hansen test to select D. Because the CH test has a null hypoth­e­sis of deter­min­is­tic sea­son­al­ity based on dummy vari­ables, the func­tion will often select D=0. So I’ve now switched to using the OCSB test for select­ing D which has a null hypoth­e­sis involv­ing a sea­sonal dif­fer­ence, so it is much more likely to choose D=1 than pre­vi­ously. I’ve done exten­sive test­ing of the fore­casts obtained under the two meth­ods, and the OCSB test leads to bet­ter fore­casts. Hence it is now the default. This means that the func­tion may return a dif­fer­ent ARIMA model than pre­vi­ously when the data are seasonal.

A sep­a­rate func­tion for select­ing the sea­sonal order has also been made vis­i­ble. So you can now call nsdiffs() to find the rec­om­mended num­ber of sea­sonal dif­fer­ences with­out call­ing auto.arima(). There is also a ndiffs() func­tion for select­ing the num­ber of first dif­fer­ences. Within auto.arima(), nsdiffs() is called first to select D, and then ndiffs() is applied to diff(x,D) if D>0 or to x if D=0.

Double-​​seasonal Holt-​​Winters

The new dshw() func­tion imple­ments Taylor’s (2003) double-​​seasonal Holt-​​Winters method. This allows for two lev­els of sea­son­al­ity. For exam­ple, with hourly data, there is often a daily period of 24 and a weekly period of 168. These are mod­elled sep­a­rately in the dshw() function.

I am plan­ning some major new func­tion­al­ity to extend this to the var­i­ous types of com­plex sea­son­al­ity dis­cussed in my recent JASA paper. Hope­fully that will be ready in the next few weeks — I have a research assis­tant work­ing on the new code.

Sub-​​setting time series

Occa­sion­ally you want to extract all the Novem­bers in a monthly time series (or some­thing sim­i­lar), but this has been fid­dly to do up to now. So I’ve included a new func­tion subset.ts. For example:

plot(subset(gas,month="November"))
subset(woolyrnq,quarter=3)

Acf() and Pacf()

These were actu­ally added in v2.19 but I’ve not men­tioned them any­where so I thought it would be use­ful to say some­thing here. The acf() func­tion always includes a spike of length 1 at lag 0. This is point­less because the ACF at lag 0 is 1 by def­i­n­i­tion. It is also annoy­ing because it forces the scale of the y-​​axis to include 1 which can obscure smaller cor­re­la­tions that might be of inter­est. The Acf() func­tion works in the same way as the acf() func­tion except that it omits lag 0. The Pacf() func­tion is included for con­sis­tency only — it returns the same object and pro­duces the same plot as pacf().

Time series lin­ear models

Another recent addi­tion, but not new in v3, is the tslm() func­tion for han­dling lin­ear mod­els for time series. This works in the same way as lm() except that the time series char­ac­ter­is­tics of the data are pre­served in the resid­u­als and fit­ted val­ues. Also, the vari­ables trend and seasonal can be used with­out need­ing to be defined. For example:

y <- ts(rnorm(120,0,3) + 20*sin(2*pi*(1:120)/12), frequency=12)
fit1 <- tslm(y ~ trend + season)
plot(forecast(fit1))

trend takes val­ues 1,2,\dots,n where n is the length of y, and season is a matrix of sea­sonal dummy variables.

Cross-​​validation

The CV() func­tion is another addi­tion from ear­lier in the year. It imple­ments the cross-​​validation sta­tis­tic, AIC, cor­rected AIC, BIC and adjusted R2 val­ues for a lin­ear model. For example:

y <- ts(rnorm(120,0,3) + 20*sin(2*pi*(1:120)/12), frequency=12)
fit1 <- tslm(y ~ trend + season)
fit2 <- tslm(y ~ season)
CV(fit1)
CV(fit2)

CV works with any lm objects includ­ing those pro­duced by tslm() and lm().

Fore­cast­ing with STL

This func­tion­al­ity was added ear­lier in the year, but it is so cool I wanted to men­tion it here. STL is a great method for decom­pos­ing time series into trend, sea­sonal and irreg­u­lar com­po­nents. It is robust and han­dles time series of any frequency.

To fore­cast with STL, you sea­son­ally adjust the data by sub­tract­ing the sea­sonal com­po­nent, then fore­cast the sea­son­ally adjusted data using a non-​​seasonal ARIMA or ETS model, then re-​​seasonalize the fore­casts by adding back in the most recent val­ues of the sea­sonal com­po­nent (effec­tively using a sea­sonal naïve fore­cast for the sea­sonal com­po­nent). The whole pro­ce­dure is han­dled effort­lessly as follows

fit <- stl(USAccDeaths,s.window="periodic")
plot(forecast(fit))

There is also a new func­tion stlf() which does the STL decom­po­si­tion as well as the fore­cast­ing in one step.

plot(stlf(AirPassengers, lambda=BoxCox.lambda(AirPassengers)))

STL decom­po­si­tions are always addi­tive, but the inclu­sion of the Box-​​Cox para­me­ter as shown here allows non-​​additive decom­po­si­tions as well.

For data with high sea­sonal period (such as weekly data, hourly data, etc.), fore­cast­ing with STL is often the sim­plest approach. It also works amaz­ingly well on a wide range of series. If you apply the forecast() func­tion to a time series (rather than a time series model), the fore­casts returned will be from stlf() when the sea­sonal period is 13 or more.

Other changes

A list of all changes to the fore­cast pack­age is main­tained in the ChangeLog.

With so many changes and new func­tions, I’ve prob­a­bly intro­duced new bugs. Please let me know if you find any prob­lems and I’ll endeav­our to fix any­thing ASAP. (Make sure you get v3.02 or later as v3.00 had some bugs.)


Related Posts:


  • gior­gio

    Hi, could you pro­vide us a ref­er­ence to the algo­rithm you adopted to imple­ment the OCSB test? Thanks

    • http://robjhyndman.com Rob J Hyndman

      See the help files. We give the fol­low­ing ref­er­ences in the help for nsdiffs():

      Osborn DR, Chui APL, Smith J, and Birchen­hall CR (1988) “Sea­son­al­ity and the order of inte­gra­tion for consumption”,Oxford Bul­letin of Eco­nom­ics and Sta­tis­tics 50(4):361–377.

      Osborn, D.R. (1990) “Sea­son­al­ity and the order of inte­gra­tion in con­sump­tion”, Inter­na­tional Jour­nal of Forecasting,6:327–336.

      • Gior­gio

        Thank you. Best, Giorgio

  • Pingback: Research tips - forecast package v4.0

  • afs­dafs­dfs­dadf­safa

    Hello, many thanks for your effort. I have a ques­tion about ocsb and ch tests. How is it pos­si­ble to get p val­ues and sta­tis­tics? What is the rea­son that these two tests give dif­fer­ent answers?

    • http://robjhyndman.com Rob J Hyndman

      They have dif­fer­ent test sta­tis­tics and dif­fer­ent null hypthoses. Why should they give the same answer? You can get the details within the func­tions forecast:::CHtest and forecast:::OCSBtest.

      • afs­dafs­dfs­dadf­safa

        Thank you for your answer.
        Another ques­tion of mine is about test sta­tis­tics. Is it any­how pos­si­ble to get their val­ues and/​or p values?

  • Joe Park

    Hi,
    Thank you for mak­ing your book open access. With­out any back­ground in sta­tis­tics, I read the book in four days and I am now mak­ing forecasts.

    Could you please tell me why auto.arima() is giv­ing a non-​​seasonal arima for data that seems to clearly have a sea­son­al­ity of 12 in my monthly data as demon­strated in the attached pic­ture? Also, is there a bet­ter place for me to find answers to ques­tions of this sort with­out using up your time?
    Thank you,

    • http://robjhyndman.com/ Rob J Hyndman

      You have not spec­i­fied the fre­quency attribute in the ts object. Please ask ques­tions on cross​val​i​dated​.com.

      • Joe Park

        Thank you

  • Pingback: R Forecasting with STL