A blog by Rob J Hyndman 

Twitter Gplus RSS

Time series cross-​​validation: an R example

Published on 26 August 2011

I was recently asked how to imple­ment time series cross-​​validation in R. Time series peo­ple would nor­mally call this “fore­cast eval­u­a­tion with a rolling ori­gin” or some­thing sim­i­lar, but it is the nat­ural and obvi­ous ana­logue to leave-​​one-​​out cross-​​validation for cross-​​sectional data, so I pre­fer to call it “time series cross-​​validation”.

Here is some exam­ple code apply­ing time series CV and com­par­ing 1-​​step, 2-​​step, …, 12-​​step fore­casts using the Mean Absolute Error (MAE). Here I com­pare (1) a lin­ear model con­tain­ing trend and sea­sonal dum­mies applied to the log data; (2) an ARIMA model applied to the log data; and (3) an ETS model applied to the orig­i­nal data. The code is slow because I am esti­mat­ing an ARIMA and ETS model for each iter­a­tion. (I’m also esti­mat­ing a lin­ear model, but that doesn’t take long.)

library(fpp) # To load the data set a10
plot(a10, ylab="$ million", xlab="Year", main="Antidiabetic drug sales")
plot(log(a10), ylab="", xlab="Year", main="Log Antidiabetic drug sales")
 
k <- 60 # minimum data length for fitting a model
n <- length(a10)
mae1 <- mae2 <- mae3 <- matrix(NA,n-k,12)
st <- tsp(a10)[1]+(k-2)/12
 
for(i in 1:(n-k))
{
  xshort <- window(a10, end=st + i/12)
  xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
      include.drift=TRUE, lambda=0, method="ML")
  fcast2 <- forecast(fit2, h=12)
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fcast3 <- forecast(fit3, h=12)
  mae1[i,1:length(xnext)] <- abs(fcast1[['mean']]-xnext)
  mae2[i,1:length(xnext)] <- abs(fcast2[['mean']]-xnext)
  mae3[i,1:length(xnext)] <- abs(fcast3[['mean']]-xnext)
}
 
plot(1:12, colMeans(mae1,na.rm=TRUE), type="l", col=2, xlab="horizon", ylab="MAE",
     ylim=c(0.65,1.05))
lines(1:12, colMeans(mae2,na.rm=TRUE), type="l",col=3)
lines(1:12, colMeans(mae3,na.rm=TRUE), type="l",col=4)
legend("topleft",legend=c("LM","ARIMA","ETS"),col=2:4,lty=1)

This yields the fol­low­ing fig­ure.

A use­ful vari­a­tion on this pro­ce­dure is to keep the train­ing win­dow of fixed length. In that case, replace the first line in the for loop with

xshort <- window(a10, start=st+(i-k+1)/12, end=st+i/12)

Then the train­ing set always con­sists of k observations.

Another vari­a­tion is to com­pute one-​​step fore­casts in the test set. Then the body of the for loop should be replaced with the following.

  xshort <- window(a10, end=st + i/12)
  xnext <- window(a10, start=st + (i+1)/12, end=st + (i+12)/12)
  xlong <- window(a10, end=st + (i+12)/12)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
        include.drift=TRUE, lambda=0, method="ML")
  fit2a <- Arima(xlong, model=fit2, lambda=0)
  fcast2 <- fitted(fit2a)[-(1:length(xshort))]
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fit3a <- ets(xlong, model=fit3)
  fcast3 <- forecast(fit3, h=12)[-(1:length(xshort))]
  mae1[i,1:length(xnext)] <- abs(fcast1[['mean']]-xnext)
  mae2[i,1:length(xnext)] <- abs(fcast2[['mean']]-xnext)
  mae3[i,1:length(xnext)] <- abs(fcast3[['mean']]-xnext)

Here the mod­els are fit­ted to the train­ing set (xshort), and then applied to the longer data set (xlong) with­out re-​​estimating the para­me­ters. So the fit­ted val­ues from the lat­ter are one-​​step fore­casts on the whole data set. There­fore, the last part of the fit­ted val­ues vec­tor are one-​​step fore­casts on the test set.

Yet another vari­a­tion which is use­ful for large data sets is to use a form of k-​​fold cross-​​validation where the train­ing sets incre­ment by sev­eral val­ues at a time. For exam­ple, instead of incre­ment­ing by one obser­va­tion in each iter­a­tion, we could shift the train­ing set for­ward by 12 observations.

k <- 60 # minimum data length for fitting a model
n <- length(a10)
mae1 <- mae2 <- mae3 <- matrix(NA,12,12)
st <- tsp(a10)[1]+(k-1)/12
for(i in 1:12)
{
  xshort <- window(a10, end=st + (i-1))
  xnext <- window(a10, start=st + (i-1) + 1/12, end=st + i)
  fit1 <- tslm(xshort ~ trend + season, lambda=0)
  fcast1 <- forecast(fit1, h=12)
  fit2 <- Arima(xshort, order=c(3,0,1), seasonal=list(order=c(0,1,1), period=12), 
      include.drift=TRUE, lambda=0, method="ML")
  fcast2 <- forecast(fit2, h=12)
  fit3 <- ets(xshort,model="MMM",damped=TRUE)
  fcast3 <- forecast(fit3, h=12)
  mae1[i,] <- abs(fcast1[['mean']]-xnext)
  mae2[i,] <- abs(fcast2[['mean']]-xnext)
  mae3[i,] <- abs(fcast3[['mean']]-xnext)
}
plot(1:12, colMeans(mae1), type="l", col=2, xlab="horizon", ylab="MAE",
     ylim=c(0.35,1.5))
lines(1:12, colMeans(mae2), type="l",col=3)
lines(1:12, colMeans(mae3), type="l",col=4)
legend("topleft",legend=c("LM","ARIMA","ETS"),col=2:4,lty=1)

How­ever, because this is based on fewer esti­ma­tion steps, the results are much more volatile. It may be best to aver­age over the fore­cast hori­zon as well:

> mean(mae1)
[1] 0.8053712
> mean(mae2)
[1] 0.7118831
> mean(mae3)
[1] 0.792813

The above code assumes you are using v3.02 or later of the fore­cast pack­age.


Related Posts:


 
Tags: ,
14 Comments  comments 
  • http://robjhyndman.com Rob J Hyndman

    Udo Sglavo shows how to do this in SAS here: Part 1 and Part 2.

  • http://profiles.google.com/zach.mayer Zachary Mayer
    • http://robjhyndman.com Rob J Hyndman

      Nice work! How fast is it com­pared to my direct approach?

      • http://profiles.google.com/zach.mayer Zachary Mayer

        This was all run on my lap­top, under win­dows.  I used the doRe­dis back­end and all 4 cores for the par­al­lel runs.
        Lin­ear model (non-​​parallel): Rob: 3.51, Zach: 2.06
        Arima model: Rob: 184 Zach: 100
        ETS model: Rob: 535 Zach: 410
        It looks like I’m get­ting a slightly dif­fer­ent answer than you for the error at the 1st fore­cast hori­zon.  Other than that, the results appear identical.

      • http://profiles.google.com/zach.mayer Zachary Mayer

        I also ran the Arima model sequen­tially under my code, and it took 188 seconds.

  • http://profiles.google.com/zach.mayer Zachary Mayer

    In the first block of code, you setup the for loop as fol­lows: ‘for(i in 1:(n-k-1))’.  This seems to skip the last pos­si­ble value of xnext, which is a vec­tor of length 1, with a value of 19.43174, for June 2008.  Why note use ‘for(i in 1:(n-k))’ to setup your for loop, and include this last value in your test set?  (If you change the MAE matri­ces to matrix(NA,n-k,12) the code runs fine) 

    • http://robjhyndman.com Rob J Hyndman

      Thanks Zach. I’m not sure why I left that one out. In any case, I’ve now updated the code as you suggest.

  • Pingback: Time Series Cross Validation | Jeffrey Chin Tang Wong

  • Anusha Anusha

    Dear Sir, Thank you for a use­ful post. Would you please give a ref­er­ence for the above method of cross val­i­dat­ing time series? Also I have not found many papers dis­cussing time series cross val­i­da­tion. Is the method sug­gested above widely accept­able or most used method for car­ry­ing out cross val­i­da­tion for time series and are there other alter­na­tives that must be con­sid­ered along with the above method?

    Thank­ing you
    Anusha

  • http://robjhyndman.com Rob J Hyndman

    auto.arima does not use cross-​​validation. You would have to write some­thing sim­i­lar to auto.arima which uses cv to select the model.

  • Roy Sander­son

    Dear Rob

    Thanks for the excel­lent descrip­tion. I’ve been won­der­ing whether this will help me crack an appar­ently sim­ple prob­lem: I’ve a sea­sonal Arima model (2,1,1)x(1,0,1) 24hrs, with hrly records over 78 days, and 5 sig­nif­i­cant xreg vari­ables. I’m try­ing to deter­mine the “effect size” of each xreg vari­able, and have been set­ting 4 of them to their means whilst leav­ing 1 at is orig­i­nal, cal­cu­lat­ing pre­dicted val­ues, and doing this in turn for each xreg. Is this cor­rect (whilst it’s the usual way for lm, per­haps not for ts)? Even if it is cor­rect, I’ve no idea how to cal­cu­late 95% pre­dic­tion inter­vals for xreg effect sizes from an Arima. So I was won­der­ing whether to cal­cu­late lots of Ari­mas, in a ‘leave-​​k-​​out’ approach for all but a sub­set of the data, and fore­cast­ing val­ues for each small sub­set. As this will take a very long time to run on my slow PC I thought I’d bet­ter check with you first!
    Many thanks for any hints.

    Roy

    • http://robjhyndman.com Rob J Hyndman

      Your approach will give you esti­mated effect sizes. How­ever, I don’t know what you mean by pre­dic­tion inter­vals for effect sizes. A pre­dic­tion inter­val is for an unob­served ran­dom vari­able, whereas an effect size is not ran­dom. Per­haps you mean a con­fi­dence inter­val? In that case, the ci can be esti­mated as x times the ci for beta.

      • Roy Sander­son

        Dear Rob
        Many thanks for your reply — apolo­gies, I should have said 95% CI. What I’m after is plots sim­i­lar to those obtained via the all­Ef­fects func­tion, except for the xreg vari­ables used in an arima rather than a lm. Many thanks if you think the method I’ve sug­gested should be OK.
        Roy