Cross-validation for time series

I’ve added a couple of new functions to the forecast package for R which implement two types of cross-validation for time series.

K-fold cross-validation for autoregression

The first is regular k-fold cross-validation for autoregressive models. Although cross-validation is sometimes not valid for time series models, it does work for autoregressions, which include many machine learning approaches to time series. Cross-validation can therefore be applied to any model where the predictors are lagged values of the response variable. The theoretical background is provided in Bergmeir, Hyndman and Koo (2015).

This is implemented for NNAR models (neural network autoregressions) in R as follows:

modelcv <- CVar(lynx, k=5, lambda=0.15)

The output is a summary of the accuracy across folds:

5-fold cross-validation
                  Mean          SD
ME        -32.88142801  98.0725227
RMSE      931.90966858 352.8705338
MAE       608.99488205 272.1244879
MPE       -17.84710226  15.2700638
MAPE       53.99760978  12.7264054
ACF1        0.04842174   0.1480883
Theil's U   0.82984737   0.1487229

The CVar function is rather limited at this stage, and will only handle cross-validation for models computed using nnetar. If there is enough interest, I might add more functionality at a later stage.
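Beyond the printed summary, the returned object can be inspected directly. A minimal sketch (the exact element structure is an assumption — use str() to check what your version of the package returns):

```r
library(forecast)

# 5-fold cross-validation of an NNAR model fitted to the lynx data;
# lambda is a Box-Cox parameter passed through to nnetar()
modelcv <- CVar(lynx, k = 5, lambda = 0.15)

# Top-level structure of the result: per-fold fits plus summary accuracy
str(modelcv, max.level = 1)
```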

Time series cross-validation

In this procedure, there is a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. The following diagram illustrates the series of training and test sets, where the blue observations form the training sets, and the red observations form the test sets.


The forecast accuracy is computed by averaging over the test sets. This procedure is sometimes known as “evaluation on a rolling forecasting origin” because the “origin” at which the forecast is based rolls forward in time.

With time series forecasting, one-step forecasts may not be as relevant as multi-step forecasts. In this case, the cross-validation procedure based on a rolling forecasting origin can be modified to allow multi-step errors to be used. Suppose that we are interested in models that produce good 4-step-ahead forecasts. Then the corresponding diagram is shown below.


Time series cross-validation is implemented with the tsCV function. In the following example, we compare the residual RMSE with the RMSE obtained via time series cross-validation.

e <- tsCV(dj, rwf, drift=TRUE, h=1)
sqrt(mean(e^2, na.rm=TRUE))
## [1] 22.68249
sqrt(mean(residuals(rwf(dj, drift=TRUE))^2, na.rm=TRUE))
## [1] 22.49681

Here I apply a random walk with drift to the Dow-Jones index time series dj. The first calculation implements a one-step time series cross-validation where the drift parameter is re-estimated at every forecast origin. The second calculation estimates the drift parameter once for the whole data set, and then computes the RMSE from the one-step forecasts. As expected, the RMSE from the residuals is smaller, as the corresponding “forecasts” are based on a model fitted to the entire data set, rather than being true forecasts.
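The multi-step case described earlier works the same way. Passing h greater than 1 to tsCV returns a matrix of errors with one column per forecast horizon, so the accuracy can be computed separately at each horizon. A sketch, using the same dj series and rwf function as above:

```r
# 4-step-ahead time series cross-validation;
# e is a matrix with one column for each horizon h = 1, ..., 4
e <- tsCV(dj, rwf, drift = TRUE, h = 4)

# RMSE at each forecast horizon
sqrt(colMeans(e^2, na.rm = TRUE))
```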

The tsCV function is very general, and will work for any forecasting function that returns an object of class forecast. You don’t even have to specify the minimum sample size for model fitting, as it will silently fit models beginning with a single observation, and return a missing value whenever the model cannot be estimated.
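For example, a model without its own forecasting-function interface can be cross-validated by writing a small wrapper that returns a forecast object. A sketch using an ETS model (any function of the form f(y, h) returning a forecast object should work the same way):

```r
library(forecast)

# Wrapper: fit an ETS model and return h-step forecasts as a forecast object
fets <- function(y, h) forecast(ets(y), h = h)

# One-step time series cross-validation, re-fitting the model at each origin
e <- tsCV(lynx, fets, h = 1)
sqrt(mean(e^2, na.rm = TRUE))
```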

These functions are currently only in the GitHub version of the package, but will migrate to CRAN the next time I release a new stable version.

Comments:

  • juan

    Hello. How did you create those plots?

    • Just some R code:

      i <- 1
      plot(0, 0, xlim=c(0,28), ylim=c(0,1), xaxt="n", yaxt="n",
           bty="n", xlab="", ylab="", type="n")
      for(j in 1:20)
      {
        train <- 1:(5+j)
        test <- (6+j):26
        points(train, rep(1-j/20, length(train)), pch=19, col="blue")
        if(length(test) >= i) {
          points(test[i], 1-j/20, pch=19, col="red")
          points(test[-i], rep(1-j/20, length(test)-1), pch=19, col="gray")
        } else {
          points(test, rep(1-j/20, length(test)), pch=19, col="gray")
        }
      }

  • Jason

    Hi Professor – Is there any research into whether averaging the folds is the best method for summarizing the out-of-sample accuracy (i.e., would using trimmed means (depending on the number of folds) or the median work)? Do we need to worry about trends in the OOS accuracy across folds? (I’m assuming, since the folds are time dependent, that this isn’t an issue for non-TS CV.)

    • 1. “Best” depends on your optimization criterion. Normally MSE is used because we want to minimize MSE. But if you had some other objective, you would need to adapt your accuracy measure accordingly.
      2. Yes, it is worth looking at the stationarity properties of the TSCV errors. e.g., plot them over time and see if there are heteroskedastic patterns.
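      For instance, a minimal sketch of such a diagnostic (using a random walk with drift on the lynx data purely as a stand-in for your own model):

      ```r
      # One-step TSCV errors, plotted over time to check for changing spread
      e <- tsCV(lynx, rwf, drift = TRUE, h = 1)
      plot(e, main = "One-step TSCV errors")
      ```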

  • Ross

    Hi Professor
    How do I run both of the functions “CVar” and “tsCV”?

  • Tetsuya

    Dear Prof. Hyndman,
    I applied ets to some time series and found one seasonal monthly time series whose 12-months-ahead forecasts are exactly the same as the results from the snaive method. Could you please explain why this happened? Thank you!

    • snaive is a special case of an ETS model.

      • Tetsuya

        Thank you so much.
        I tried another time series but its ets 12-months-ahead forecasts are not the same as the forecasts from the snaive method. Could you please tell me under what conditions this happens?

  • Andrzej

    When I run the following code, exactly the same as you present here:
    e <- tsCV(dj, rwf, drift=TRUE, h=1)
    with the tsCV function taken from your GitHub repository, the result I get is a time series of all NAs. Do you have any idea why that might be the case?

    • That’s not what I get. Do you have a dj object full of missing values?

      • Andrzej

        Thank you for such a prompt reply.
        I found the cause of this behaviour. I had only copied the tsCV function from your GitHub, but then I realized that within this function the subset.ts function is also used, in a different form than in your recent forecast package release (there is no support for start and end parameters). After copying the new subset.ts function it worked well (I should probably learn to use GitHub properly).
        I have a question though – you wrote that the tsCV function is ‘very general, and will work for any forecasting function that returns an object of class forecast’. Does that imply it won’t work for functions returning an ARIMA object, like Arima or auto.arima? Because I was mainly interested in doing cross-validation for those models.

        • Install the package from github. There is an example in the help file for tsCV.

  • Christian Silva

    Hi, what is the best way of citing time series cross-validation? Do we cite this post or do you have the same material in an article? Thank you!

  • Xiaokun Xu

    Thanks for the post and the working paper. I’m a bit confused about the diagram above in the blog, though. In the paper, the formulation of an AR(p) model assumes a fixed input dimension p, i.e. using p blue dots to predict one red dot in each instance. But the diagram above has a variable-length input dimension. Could you please clarify?

    • The paper is about the CVar function. The diagram is about the tsCV function.

      • Xiaokun Xu

        I see. If I understood it correctly, the time series CV leaves all future values out, while using the past to train a model. Unlike typical machine learning models that require fixed-dimension input, a time series model can be trained with a flexible length of data. The “rolling” origin doesn’t mean that the first blue dot rolls together with the red dot. Rather, it is “rolling” backward with reference to the prediction target.

  • Laura Suttle

    I’m a data scientist moving into time series and find the way you write about forecasting to be the most easily digestible, so I hope you won’t mind an unrelated question on this blog post: Do you have any good resources for how to deal with mismatched time series intervals? I’m trying to predict quarterly from monthly data, but the resources I find don’t talk about very basic things, like how to structure data like that. I’ve just been averaging the monthly to quarterly, but I’d really like not to lose the information from the monthly data.

  • Mayur Hampiholi

    What would it take to implement CVar for arima, tbats and other models? I am looking to implement cross-validation. Are we better off borrowing from this link to implement effective cross-validation between models?
    Follow-up question: in a hierarchical time series (assuming computational time is of the essence), is it feasible to get away with performing CV for only the top and mid levels (because the information gain at these levels is sometimes more important)?

    • Cross-validation only works when you can write the model as an autoregression. So that rules out general ARIMA, TBATS, ETS, and so on. You should use tsCV instead, which works for any time series model.

      • Mayur Hampiholi

        Thanks, big fan of your work!

  • germ

    Hi Prof,

    Can I double check: is the tsCV() function similar to what you wrote for:

    I managed to run the function on my own data set. But I am confused about how I am supposed to make use of the output. Am I supposed to compare the RMSE with the residuals’ RMSE? How do we use tsCV() to compare different models for the same data set?
    Also, for tsCV(), how many iterations does it run, i.e. k = ?

    • It is open source. Look at the function yourself.

  • Christian Ramirez Luna

    Dear professor,
    Following your first graph, what would you recommend to do if my last point of demand is the last blue dot of the first row and I have to do forecast for “x” days in advanced using history just up to the last blue dot? each day to be forecasted in advanced may need a different step