A blog by Rob J Hyndman 


Time series cross-validation: an R example

Published on 26 August 2011

I was recently asked how to implement time series cross-validation in R. Time series people would normally call this “forecast evaluation with a rolling origin” or something similar, but it is the natural and obvious analogue to leave-one-out cross-validation for cross-sectional data, so I prefer to call it “time series cross-validation”.

Here is some example code applying time series CV and comparing 1-step, 2-step, …, 12-step forecasts using the Mean Absolute Error (MAE). Here I compare (1) a linear model containing trend and seasonal dummies applied to the log data; (2) an ARIMA model applied to the log data; and (3) an ETS model applied to the original data. The code is slow because I am estimating an ARIMA and ETS model for each iteration. (I’m also estimating a linear model, but that doesn’t take long.)

library(fpp) # To load the data set a10
plot(a10, ylab="$ million", xlab="Year", main="Antidiabetic drug sales")
plot(log(a10), ylab="", xlab="Year", main="Log Antidiabetic drug sales")
 
k <- 60 # minimum data length for fitting a model
n <- length(a10)
mae1 <- mae2 <- mae3 <- matrix(NA,n-k,12)
st <- tsp(a10)[1]+(k-2)/12 # time of observation k-1, so the first training set (i=1) ends at observation k
 
for(i in 1:(n-k))
{
  xshort <- window(a10, end=st + i/12)
  xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
      include.drift=TRUE, lambda=0, method="ML")
  fcast2 <- forecast(fit2, h=12)
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fcast3 <- forecast(fit3, h=12)
  mae1[i,1:length(xnext)] <- abs(fcast1[['mean']]-xnext)
  mae2[i,1:length(xnext)] <- abs(fcast2[['mean']]-xnext)
  mae3[i,1:length(xnext)] <- abs(fcast3[['mean']]-xnext)
}
 
plot(1:12, colMeans(mae1,na.rm=TRUE), type="l", col=2, xlab="horizon", ylab="MAE",
     ylim=c(0.65,1.05))
lines(1:12, colMeans(mae2,na.rm=TRUE), type="l",col=3)
lines(1:12, colMeans(mae3,na.rm=TRUE), type="l",col=4)
legend("topleft",legend=c("LM","ARIMA","ETS"),col=2:4,lty=1)

This yields the following figure, comparing the MAE at each forecast horizon for the three models.

A useful variation on this procedure is to keep the training window of fixed length. In that case, replace the first line in the for loop with

xshort <- window(a10, start=st+(i-k+1)/12, end=st+i/12)

Then the training set always consists of k observations.
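
For concreteness, here is a sketch of the full loop with a fixed window, assembled from the code above; only the first model is shown, and the other two fits and MAE updates are unchanged.

for(i in 1:(n-k))
{
  xshort <- window(a10, start=st+(i-k+1)/12, end=st+i/12) # always exactly k observations
  xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  mae1[i,1:length(xnext)] <- abs(fcast1[['mean']]-xnext)
  # fit2, fit3 and the mae2, mae3 updates as in the original loop
}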

Another variation is to compute one-step forecasts in the test set. Then the body of the for loop should be replaced with the following.

  xshort <- window(a10, end=st + i/12)
  xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
  xlong <- window(a10, end=st + (i+12)/12)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
        include.drift=TRUE, lambda=0, method="ML")
  fit2a <- Arima(xlong, model=fit2, lambda=0) # re-apply fit2 to the longer series without re-estimation
  fcast2 <- fitted(fit2a)[-(1:length(xshort))] # fitted values on the test part are one-step forecasts
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fit3a <- ets(xlong, model=fit3) # same idea for the ETS model
  fcast3 <- fitted(fit3a)[-(1:length(xshort))]
  mae1[i,1:length(xnext)] <- abs(fcast1[['mean']]-xnext)
  mae2[i,1:length(xnext)] <- abs(fcast2-xnext)
  mae3[i,1:length(xnext)] <- abs(fcast3-xnext)

Here the models are fitted to the training set (xshort), and then applied to the longer data set (xlong) without re-estimating the parameters. So the fitted values from the latter are one-step forecasts on the whole data set. Therefore, the last part of the fitted values vector gives the one-step forecasts on the test set.
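
To see the mechanism in isolation, here is a minimal single-split sketch using the ETS model; the split point of 2005 is arbitrary and chosen only for illustration.

fit <- ets(window(a10, end=2005), model="MMM", damped=TRUE) # estimate on the training part only
fita <- ets(a10, model=fit) # apply the same parameters to the full series, no re-estimation
onestep <- window(fitted(fita), start=2005+1/12) # one-step forecasts on the test part
mean(abs(onestep - window(a10, start=2005+1/12))) # one-step MAE on the test set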

Yet another variation, which is useful for large data sets, is to use a form of k-fold cross-validation where the training sets increment by several values at a time. For example, instead of incrementing by one observation in each iteration, we could shift the training set forward by 12 observations.

k <- 60 # minimum data length for fitting a model
n <- length(a10)
mae1 <- mae2 <- mae3 <- matrix(NA,12,12)
st <- tsp(a10)[1]+(k-1)/12 # time of observation k; training sets now end at observations k, k+12, k+24, ...
for(i in 1:12)
{
  xshort <- window(a10, end=st + (i-1))
  xnext <- window(a10, start=st + (i-1) + 1/12, end=st + i)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
      include.drift=TRUE, lambda=0, method="ML")
  fcast2 <- forecast(fit2, h=12)
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fcast3 <- forecast(fit3, h=12)
  mae1[i,] <- abs(fcast1[['mean']]-xnext)
  mae2[i,] <- abs(fcast2[['mean']]-xnext)
  mae3[i,] <- abs(fcast3[['mean']]-xnext)
}
plot(1:12, colMeans(mae1), type="l", col=2, xlab="horizon", ylab="MAE",
     ylim=c(0.35,1.5))
lines(1:12, colMeans(mae2), type="l",col=3)
lines(1:12, colMeans(mae3), type="l",col=4)
legend("topleft",legend=c("LM","ARIMA","ETS"),col=2:4,lty=1)

However, because this is based on fewer estimation steps, the results are much more volatile. It may be best to average over the forecast horizon as well:

> mean(mae1)
[1] 0.8053712
> mean(mae2)
[1] 0.7118831
> mean(mae3)
[1] 0.792813

The above code assumes you are using v3.02 or later of the forecast package.


Comments
  • Rob J Hyndman

    Udo Sglavo shows how to do this in SAS here: Part 1 and Part 2.

  • Zachary Mayer
    • Rob J Hyndman

      Nice work! How fast is it compared to my direct approach?

      • Zachary Mayer

        This was all run on my laptop, under Windows. I used the doRedis backend and all 4 cores for the parallel runs. Times are in seconds.
        Linear model (non-parallel): Rob: 3.51, Zach: 2.06
        Arima model: Rob: 184, Zach: 100
        ETS model: Rob: 535, Zach: 410
        It looks like I’m getting a slightly different answer than you for the error at the 1st forecast horizon. Other than that, the results appear identical.

      • Zachary Mayer

        I also ran the Arima model sequentially under my code, and it took 188 seconds.
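
        For anyone wanting to try a parallel run, below is a rough sketch of one way to parallelise the ARIMA part of the loop. It uses the doParallel backend rather than the doRedis backend mentioned above, and is a sketch rather than Zachary’s code; a10, k, n and st are as defined in the post.

        library(doParallel) # assumption: doParallel in place of doRedis
        registerDoParallel(cores=4)
        # Each iteration is independent of the others, so the loop parallelises directly
        mae2 <- foreach(i = 1:(n-k), .combine=rbind, .packages="forecast") %dopar% {
          xshort <- window(a10, end=st + i/12)
          xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
          fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12),
              include.drift=TRUE, lambda=0, method="ML")
          fcast2 <- forecast(fit2, h=12)
          err <- rep(NA, 12) # one row of absolute errors per iteration
          err[1:length(xnext)] <- abs(fcast2[['mean']]-xnext)
          err # rows are rbind-ed into an (n-k) x 12 matrix, as before
        }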

  • Zachary Mayer

    In the first block of code, you set up the for loop as follows: ‘for(i in 1:(n-k-1))’. This seems to skip the last possible value of xnext, which is a vector of length 1, with a value of 19.43174, for June 2008. Why not use ‘for(i in 1:(n-k))’ to set up your for loop, and include this last value in your test set? (If you change the MAE matrices to matrix(NA,n-k,12) the code runs fine.)

    • Rob J Hyndman

      Thanks Zach. I’m not sure why I left that one out. In any case, I’ve now updated the code as you suggest.


  • Anusha

    Dear Sir, thank you for a useful post. Would you please give a reference for the above method of cross-validating time series? Also, I have not found many papers discussing time series cross-validation. Is the method suggested above a widely accepted or commonly used method for carrying out cross-validation for time series, and are there other alternatives that should be considered alongside it?

    Thanking you,
    Anusha

  • Rob J Hyndman

    auto.arima does not use cross-validation. You would have to write something similar to auto.arima which uses CV to select the model.
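
    As a rough illustration of what such a routine might look like, the sketch below compares a handful of ARIMA specifications by their cross-validated one-step MAE and keeps the winner. The candidate orders are made up for the example; a10, k, n and st are as defined in the post.

    candidates <- list(c(1,0,0), c(2,0,1), c(3,0,1)) # hypothetical candidate orders
    cvmae <- sapply(candidates, function(ord) {
      err <- rep(NA, n-k)
      for(i in 1:(n-k)) {
        xshort <- window(a10, end=st + i/12)
        xnext <- window(a10, start=st + (i+1)/12, end=st + (i+1)/12) # one step ahead only, for speed
        fit <- Arima(xshort, order=ord, seasonal=list(order=c(0,1,1), period=12), lambda=0)
        err[i] <- abs(forecast(fit, h=1)[['mean']] - xnext)
      }
      mean(err, na.rm=TRUE)
    })
    candidates[[which.min(cvmae)]] # the order with the best CV accuracy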

  • Roy Sanderson

    Dear Rob

    Thanks for the excellent description. I’ve been wondering whether this will help me crack an apparently simple problem: I have a seasonal Arima model, (2,1,1)x(1,0,1) with period 24 hours, with hourly records over 78 days and 5 significant xreg variables. I’m trying to determine the “effect size” of each xreg variable, and have been setting 4 of them to their means whilst leaving 1 at its original values, calculating predicted values, and doing this in turn for each xreg. Is this correct (whilst it’s the usual way for lm, perhaps not for ts)? Even if it is correct, I’ve no idea how to calculate 95% prediction intervals for xreg effect sizes from an Arima. So I was wondering whether to calculate lots of Arimas, in a ‘leave-k-out’ approach for all but a subset of the data, and forecast values for each small subset. As this will take a very long time to run on my slow PC I thought I’d better check with you first!
    Many thanks for any hints.

    Roy

    • Rob J Hyndman

      Your approach will give you estimated effect sizes. However, I don’t know what you mean by prediction intervals for effect sizes. A prediction interval is for an unobserved random variable, whereas an effect size is not random. Perhaps you mean a confidence interval? In that case, the CI can be estimated as x times the CI for beta.

      • Roy Sanderson

        Dear Rob
        Many thanks for your reply — apologies, I should have said 95% CI. What I’m after is plots similar to those obtained via the allEffects function, except for the xreg variables used in an arima rather than a lm. Many thanks if you think the method I’ve suggested should be OK.
        Roy
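
        Following Rob’s suggestion, a sketch of the computation for one regressor is below; the fitted object fit and the regressor name "x1" are hypothetical.

        b <- coef(fit)["x1"] # estimated coefficient beta for regressor x1
        se <- sqrt(fit$var.coef["x1","x1"]) # its standard error
        x <- 10 # a chosen regressor value (hypothetical)
        x * (b + c(-1.96, 1.96) * se) # approximate 95% CI for the effect x*beta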

  • Isabella Ghement

    Dear Prof. Hyndman,

    In practice, would the entire time series be used to guess that the ARIMA model should be specified as ARIMA(p,d,q)(P,D,Q)s, where p=3, d=0, q=1, P=0, D=1, Q=1 and s=12?

    Once the structure of the ARIMA model is identified from the entire time series, is this same structure used when producing rolling-window forecasts?

    Thank you in advance for considering my question.
    Isabella

    • Rob J Hyndman

      You would normally be trying several models and selecting the best based on a cross-validation procedure.

      • Isabella Ghement

        Thank you so much for your prompt response. I am still unclear on how this is supposed to work in practice, so I will elaborate below on my confusion.

        Let’s say that I would like to predict the unknown value of a future observation y_{n+1} on the basis of the observations y_{1}, …, y_{n}.

        Assuming that y_{1}, …, y_{n} follow an ARIMA(p,d,q) process, I would proceed to identify sensible values for p, d, q from the data y_{1}, …, y_{n}. For simplicity, let’s denote these values by p*, d*, q*.

        Given the values p*, d*, q*, I would estimate the parameters of the ARIMA(p*,d*,q*) process and use them to produce a forecast of the future observation y_{n+1}.

        Next, I would try to produce a measure of accuracy to accompany this forecast using the “rolling forecast origin” procedure described at http://otexts.com/fpp/2/5/.

        To this end, I would fit “forecast” ARIMA models to the following rolling training sets (where m < n):

        1) y_{1}, …, y_{m};
        2) y_{1}, …, y_{m+1};
        3) y_{1}, …, y_{m+2};
        etc.

        My confusion is this: what is the connection between these “forecast” ARIMA models and the ARIMA(p*,d*,q*) model identified from the entire series?

        Do we assume that the “forecast” ARIMA models have the same order structure as the ARIMA(p*,d*,q*) model?

        • Rob J Hyndman

          Cross-validation is both a method of measuring accuracy *and* a method for choosing a model. The model you finally use for forecasting is the one that gives the best cross-validation accuracy.

          • qoheleth

            This might or might not be Isabella’s question. You don’t normally start the modelling with CV. You would have first made a rough guess at (p,d,q), maybe using your auto.arima function or some other procedure that uses all the data. Then you feed this into a CV routine, possibly with some variants of this model for comparison. The question is, in this case, the test set is no longer unseen, so maybe the test error will still be biased downwards? (This is a question.)

  • Maria Froes

    Dear Prof. Hyndman, one thing I don’t understand is, in the following command:

    st <- tsp(a10)[1]+(k-2)/12

    why do you subtract k-2?
    Thank you

  • Anon

    How would this approach be applied in a nested fashion (i.e. what is the time series equivalent of nested cross-validation)?

  • brian

    I have recently tried something similar. I’ve set up a loop to fit a model on a rolling subset (fixed number of obs). I then compute a 1-step-ahead forecast. Collecting the individual forecasts into a data frame, I then regress the actual results against the 1-step-ahead forecasts. How should I interpret the parameters of this second regression? If I want to make a 1-step-ahead forecast, am I better off using the raw model predictions (from the latest rolling regression) or plugging the model predictions into my second regression?

    • Rob J Hyndman

      The coefficients measure bias in the forecasts. Unless something is very wrong, the intercept should be insignificantly different from zero and the slope should be insignificantly different from one. Assuming that is true, just use the original predictions.
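
      This kind of regression of actuals on forecasts is often called a Mincer-Zarnowitz regression. A sketch of the check described above, where actual and fc1 are hypothetical vectors holding the test-set actuals and the collected 1-step forecasts:

      reg <- lm(actual ~ fc1)
      summary(reg) # intercept should be insignificantly different from 0, slope from 1
      confint(reg) # the confidence intervals should cover 0 and 1 respectively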

  • qoheleth

    If the sample size is n = 100 but the frequency is 35, would it make sense to align h with the frequency, i.e. use the first 30 observations to test 31…65, then the first 65 to test 66…100? The problem is of course that the error becomes very variable. Or would you ignore frequency when doing CV?