Out-of-sample one-step forecasts

It is common to fit a model using training data, and then to evaluate its performance on a test data set. When the data are time series, it is useful to compute one-step forecasts on the test data. For some reason, this is much more commonly done by people trained in machine learning than by those trained in statistics.

If you are using the forecast package in R, it is easily done with ETS and ARIMA models. For example:

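library(forecast)
# Estimate the model on the training data
fit <- ets(trainingdata)
# Apply the estimated model to the test data without re-estimating the parameters
fit2 <- ets(testdata, model=fit)
# The fitted values of fit2 are one-step forecasts on the test data
onestep <- fitted(fit2)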

Note that the second call to ets does not involve the model being re-estimated. Instead, the model obtained in the first call is applied to the test data in the second call. This works because fitted values are one-step forecasts in a time series model.
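As a self-contained sketch of the steps above, the following uses the built-in AirPassengers series. The choice of data, the split point, and the names training, test, errors and rmse are assumptions made for illustration only. According to the ets documentation, when a fitted model is passed via model, the smoothing parameters are reused but the initial states are re-estimated on the new data unless use.initial.values=TRUE is set.

```r
library(forecast)

# Assumed example data: monthly airline passengers, 1949-1960
training <- window(AirPassengers, end = c(1957, 12))
test     <- window(AirPassengers, start = c(1958, 1))

# Estimate the ETS model on the training data only
fit <- ets(training)

# Apply the same model form and smoothing parameters to the test data
fit2 <- ets(test, model = fit)

# Fitted values on the test data are one-step forecasts
onestep <- fitted(fit2)

# One-step forecast errors and their root mean squared error
errors <- test - onestep
rmse <- sqrt(mean(errors^2))
print(rmse)
```

Because only the test data are passed in the second call, the first few one-step forecasts do not use the history in the training set.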

The same process works for ARIMA models when ets is replaced by Arima or auto.arima. Note that it does not work with the arima function from the stats package. One of the reasons I wrote Arima (in the forecast package) is to allow this sort of thing to be done.
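A minimal sketch of the ARIMA version, again on the built-in AirPassengers series. The data, the split point, and the fixed seasonal ARIMA order are assumptions chosen to keep the example short; auto.arima could be used in place of the explicit order.

```r
library(forecast)

# Assumed example data and split
training <- window(AirPassengers, end = c(1957, 12))
test     <- window(AirPassengers, start = c(1958, 1))

# Estimate a seasonal ARIMA model on the training data
# (the order below is an illustrative assumption)
fit <- Arima(training, order = c(0, 1, 1), seasonal = c(0, 1, 1))

# Apply the estimated model to the test data; coefficients are not re-estimated
fit2 <- Arima(test, model = fit)

# One-step forecasts on the test data
onestep <- fitted(fit2)

# The coefficients of the two objects should agree
print(coef(fit))
print(coef(fit2))
```

With a differenced model like this one, the earliest fitted values on a short test series depend heavily on how the filter is initialised, so they are best interpreted with care.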




  • Ricardo Bessa

    Very useful function. One suggestion for a future version of the ‘forecast’ package is to include a function to test multi-step-ahead prediction with ARIMA and ETS. For instance, fit the model on a training dataset and then conduct multi-step-ahead predictions iteratively on a test dataset.

  • Luis Juan

    Forecast is a very useful package. We use it a lot for classes and research. I want to compare ARIMA models with exponential smoothing on a data set (I’m using Arima and auto.arima). I have fitted a very complex ARIMA model (mod_A) to hourly data covering n days. Now I would like to obtain the forecast errors for the next 20 days using the model. I want to keep the model fixed (mod_A) and obtain the 24 forecasts for day n+1. Then, with the same model and the time series updated to day n+1, I want to forecast the 24 hours corresponding to day n+2, and so on. How can I do this with Arima?

    • You will need to use a loop with the following commands within it.

      fit <- Arima(x, model=mod_A)
      fcast <- forecast(fit, h=24)
      e <- y - fcast$mean

      where x is the data up to the forecast origin, and y is the data for the next 24 hours. The first line applies model mod_A without re-estimating it.
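The loop described above might look something like the following sketch. The hourly series here is simulated purely so that the example runs; the model order, the number of days, and the names series, n_train, n_days_test and errors are all assumptions. In terms of the post, x plays the role of the training data extended by whatever test observations have become available at each forecast origin, and y is the next block of test data.

```r
library(forecast)

# Simulated hourly data (an assumption, for illustration only)
set.seed(1)
h <- 24                      # forecast horizon: one day of hourly data
n_days_train <- 30
n_days_test  <- 5
n_total <- h * (n_days_train + n_days_test)
series <- ts(10 + 3 * sin(2 * pi * seq_len(n_total) / h) + rnorm(n_total),
             frequency = h)

# Estimate the model once on the first n days
n_train <- h * n_days_train
mod_A <- Arima(ts(series[1:n_train], frequency = h),
               order = c(1, 0, 0), seasonal = c(0, 1, 1))

# One row of forecast errors per test day, one column per hour ahead
errors <- matrix(NA_real_, nrow = n_days_test, ncol = h)

for (d in seq_len(n_days_test)) {
  origin <- n_train + (d - 1) * h                  # last observation used
  x <- ts(series[1:origin], frequency = h)         # data up to the origin
  y <- series[(origin + 1):(origin + h)]           # next 24 hours

  fit <- Arima(x, model = mod_A)                   # mod_A is not re-estimated
  fcast <- forecast(fit, h = h)
  errors[d, ] <- y - as.numeric(fcast$mean)
}

# Mean absolute error by forecast horizon
print(round(colMeans(abs(errors)), 3))
```

For a 48-hour horizon, the same sketch should apply with the forecast horizon set to 48 while the seasonal period stays at 24.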

      • Luis Juan

        Thank you for such a quick and effective response. It is great, thanks.

        • Rob Sir and Luis,
          I beg your pardon for asking this (it may seem obvious to you), but I need to ask for my own understanding. I am dealing with a situation similar to Luis’s. In my case, I have to forecast the next 48 hours and keep the forecast origin moving forward, with the original model unchanged. In this context, I don’t understand what the data ‘x’ (in Rob’s reply) is. How does it relate to the trainingdata and testdata shown in the blog post? Can you please elaborate? Luis, how did you deal with it?
          Thanks in advance.

  • Antonio

    Good post. Rob, I’m trying to do something similar to Luis Juan (forecast n+1 and so on) but using bats and Arima. Arima works pretty well, but using the same commands with bats in place of Arima gives a lot of NaNs after approx. 2/3 of the out-of-sample data. Is there a different treatment for bats? If not, have you experienced anything similar? I’m running out of ideas here. Thanks

    • Antonio

      I’m sorry, I meant to say in-sample data. I have 8760 observations + 672 for out of sample. I get the NaNs approx. after observation number 6717.

      • Antonio

        In fact, the values that I get before the NaNs are huge and make no sense at all, but the fitted.values extracted from the model_A bats fit are fine, so the error comes only when I do the step bats(x+y,model=mod_A), where y are the new observations.

        • There is no model argument for bats(). So there is currently no way of getting forecasts on new data without re-estimating the model. Something for a future version.

          • Antonio

            Thanks indeed and congrats for your blog.

  • Pattana Lee

    Does this trick work with arfima() in the {forecast} package as well? Thank you.

    • No, but I’ll add it to the list of feature requests.

      • Pattana Lee

        Thank you very much.

  • sana

    Kindly let me know about the accuracy measure. Is there any way to get MASE for the test dataset when I have one-step forecasts?

    • Use the accuracy() command.

      • sana

        Thanks. I found the way.
        One more thing: I am working with temperature data, which has negative values and of course no absolute zero. I want to compare Arima, ets and splinef using one-step forecasts. Is it reasonable to compare them with MAPE? Is MASE a good measure for comparing accuracy in this case?

        • MAPE makes no sense with temperatures.
          Look at the code for accuracy to see precisely what it is doing.
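For readers who want to see what such a calculation involves, here is a hand-rolled sketch of MASE for one-step forecasts on a test set. It follows my understanding of the usual definition, in which test-set absolute errors are scaled by the in-sample mean absolute error of seasonal naive forecasts on the training data; it is not a copy of what accuracy() does internally. The built-in nottem temperature series, the split point, and all variable names are assumptions for illustration.

```r
library(forecast)

# Assumed example data: monthly Nottingham temperatures, 1920-1939
training <- window(nottem, end = c(1936, 12))
test     <- window(nottem, start = c(1937, 1))

# One-step forecasts on the test data, as in the post
fit  <- ets(training)
fit2 <- ets(test, model = fit)
onestep <- fitted(fit2)
errors <- test - onestep

# Scaling factor: in-sample MAE of seasonal naive forecasts on the training data
m <- frequency(training)
scale <- mean(abs(diff(training, lag = m)))

# Scale-free and scale-dependent measures on the test set
mase <- mean(abs(errors)) / scale
mae  <- mean(abs(errors))
rmse <- sqrt(mean(errors^2))
print(c(MAE = mae, RMSE = rmse, MASE = mase))
```

Unlike MAPE, none of these quantities divides by the observed values, so they remain meaningful when the data can be zero or negative.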

          • sana

            Thanks a lot.

  • Yang-hui Chang

    Very useful function. I want to know more about the differences between out-of-sample and in-sample. If my sample has 500 observations, is the out-of-sample estimate obtained as follows?
    arma1 <- Arima(window(y, end=400))
    arima11 <- Arima(window(y,start=500),model=arma1)

    Or is it something else?

    • You want to start the second window at 401.
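A minimal sketch of that 400/100 split with the second window starting at 401. The series y is simulated here only so that the code runs, and the AR(1) order is an assumption for illustration.

```r
library(forecast)

# Simulated series of 500 observations (an assumption, for illustration only)
set.seed(123)
y <- arima.sim(model = list(ar = 0.6), n = 500)

# In-sample: estimate the model on observations 1 to 400
arma1 <- Arima(window(y, end = 400), order = c(1, 0, 0))

# Out-of-sample: apply the estimated model to observations 401 to 500
arima11 <- Arima(window(y, start = 401), model = arma1)

# One-step forecasts for the 100 out-of-sample observations
onestep <- fitted(arima11)
print(length(onestep))
```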

      • Yang-hui Chang

        Thanks for your help! I also want to know how to choose the SR and LR matrices in an SVAR or VECM. I am learning.

  • plat

    Is there a way to do out-of-sample one-step forecasts with an existing stlm model? I’m not sure how that would work, given that you’d have to seasonally decompose the new timeseries data before forecasting.

    • plat

      I have come up with the following strategy, any insight on if it’s correct?

      I use stlm on the training series and get the resulting STL object and ETS model. I use that model as input to the one-step-ahead call to ets with the test series, then I add to the mean the seasonal+trend components from the last season of the STL object. Basically the same as what stlm does internally.

      • OK, except you will need to seasonally adjust the test data too. You could probably use the last year of the seasonal component of the STL object to do that. Also, you only add back the seasonal component, not the trend component.

        • plat

          OK, yes. The trend addition was a mistake. Thanks for the input, Rob, I appreciate it. I also forgot to invert the Box-Cox transformation, in case stlm applied one.
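The strategy discussed in this thread could be sketched roughly as follows. This uses stl from the stats package directly rather than the internals of stlm, assumes no Box-Cox transformation, and assumes the test data follow the training data without a gap. The nottem series, the split point, s.window = "periodic", and all variable names are assumptions for illustration.

```r
library(forecast)

# Assumed example data: monthly Nottingham temperatures, 1920-1939
training <- window(nottem, end = c(1936, 12))
test     <- window(nottem, start = c(1937, 1))
m <- frequency(training)

# Decompose the training data and fit a non-seasonal ETS model
# to the seasonally adjusted series
dec <- stl(training, s.window = "periodic")
seasonal <- dec$time.series[, "seasonal"]
fit <- ets(training - seasonal, model = "ZZN")

# Extend the last year of the seasonal component over the test period
last_year <- tail(as.numeric(seasonal), m)
test_seasonal <- rep(last_year, length.out = length(test))

# Seasonally adjust the test data and apply the fitted model to it
test_sa <- test - test_seasonal
fit2 <- ets(test_sa, model = fit)

# Add back the seasonal component only (not the trend)
onestep <- fitted(fit2) + test_seasonal
print(sqrt(mean((test - onestep)^2)))
```

If a Box-Cox transformation had been applied before decomposing, the same transformation would be needed on the test data and the one-step forecasts would have to be back-transformed at the end.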

  • John

    Hi Rob, thanks for the post. I would like to ask you about a modification for HAR-RV equation which looks like this:

    har0 = lm(GK~volatd+volatw+volatm+lagret0+lagsign)
    summary(har0)

    It is based on realised volatility, and I need to produce forecasts in order to compare similar models with a Diebold-Mariano test (dm.test{forecast}). Do you have any idea how to forecast with it and save the forecast errors for use in dm.test?

    Thanks, John

    • Andres

      Were you able to handle this? What if you have a quantile regression? How would the forecast code go? Thanks a lot, Rob