Batch forecasting in R

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!

My response

Loops are fine in R. The are frowned upon because people use them inappropriately when there are often much more efficient vectorized versions available. But for this task, a loop is the only approach.

Reading data and exporting forecasts is standard R and does not require any additional packages to load. To generate the forecasts, use the forecast package. Either the ets() function or the auto.arima() function depending on what type of data you are modelling. If it’s high frequency data (frequency greater than 24) than you would need the tbats() function but that is very slow.

Some sample code

In the following example, there are many columns of monthly data in a csv file with the first column containing the month of observation (beginning with April 1982). Forecasts have been generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code, I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).

retail <- read.csv("",header=FALSE)
retail <- ts(retail[,-1],f=12,s=1982+3/12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA,nrow=h,ncol=ns)
for(i in 1:ns)
  fcast[,i] <- forecast(retail[,i],h=h)$mean

Note that the transpose of the fcast matrix is used in write() because the file is written row-by-row rather than column-by-column.

This code does not actually do what the questioner asked as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.

If ns is large, this could probably be more efficiently coded using the parallel package.

Related Posts:

  • FD

    Thanks for the post. Rather than for loops I collect my time series into lists and then use plyr::llply, with a custom function that extracts what I need. In the case above, where the series are all of equal length, you could use plyr::alply. I’ve also been playing around with using the ‘Characteristic Based Clustering Approach’ (from your paper with Wang) and then making forecasts based on a model fit on a cluster, rather than each individual series.

  • Han

    Another possibility is to use the doMC or foreach package. First of all, write a function to obtain forecasts for one sample. Then, loop the function for all replications and results are stored in a list normally. In doing so, you utilize multi-core features in standard Mac or PC.

  • Mike P

    Looking at the retail data after reading in the csv file it appears that it is daily, yet when creating the ts object you define the frequency=12. Wouldn’t this impede your ability to correctly use many of ts and forecast methods (i.e. cycle, fourier, etc)? What is the correct ts frequency setting for daily data? And is there a way to keep the actual starting date (i.e. in this case 1/04/1982) instead of using the generic form of s=1982?

    • Rob J Hyndman

      It is monthly data. The date column gives the first day in each month.

  • Joseph J

    Thank you for the example. Very helpful.

  • Herimanitra

    Hi Prof. Hyndman!

    have you ever tried to blend a set of ARIMA models? Is it more efficient than choosing the “best” ARIMA?


    • Rob J Hyndman

      No, I haven’t tried that. In general, combining forecasts from different models works best when the models are different. So combining different ARIMA models may not be so effective.

      • Herimanitra


  • Ramesh

    Dear Prof.

    I am a research scholar from India. Current working on Time series modeling of R.

    I want to forecast multiple data series in R .

    the data sheet having multiple column fromT1 to T48.

    Please advice me how can I select a best forecasting method and also forecast all the columns at a time.

    Your’s Sincerely

  • Mila

    Hi! First of all, THANKS a lot for your hard/good work, it makes our life easier :) Now, I also want to automatically forecast let’s say a thousand time series, but as they might have significant pattern differences, I apply many of the methods in the forecast package (arfima, auto.arima, ets, with and without Box Cox transf., tbats, etc.), then I calculate their accuracies and choose the one with minimum MASE for each time series; of course, there are some cases that I have to analize separately, but for the large amount of data I have I thought this would be a reasonable way of doing ‘automatic’ forecasting… Am I on the right path, or do you have any suggestions…? Thanks in advance for your time/reply. Kind regards, Mila.

    • Rob J Hyndman

      You can’t select a model based on in-sample accuracy. Use out-of-sample accuracy instead. Better still, a rolling forecasting origin as explained at

      • Mila

        Thanks a lot!

  • AY

    I’m not sure if this is the right place to post this; I have a different, but related problem. Given 50k individual time series representing revenue per customer, I would like to forecast the average (or total) revenue for this group, together with a prediction interval. I thought of a few approaches:

    1. Forecast each time series and then aggregate. (user-level)

    2. Aggregate the data first, and then forecast the one aggregated time series. (group-level)

    3. Forecast at the user-level, but cluster the data first, and share parameters across clusters. (hybrid)

    For my first pass at the problem, I tried the user-level approach. To obtain the prediction interval for the mean, would it be reasonable to average the individual prediction intervals (assuming the individual time series are independent) ?

    It seems like the hybrid approach should yield the best predictions. Could you point me to any related papers in this regard?



  • Amateur TS modeler

    Professor, When you say ns is large(to use the parallel package) what is your ball park number??
    I have 100 variables and 700 date points in each. Should the above method work or would I have to use the parallel package?

    • Rob J Hyndman

      Experiment. See what works best.

  • Dan

    Professor, is there a way to get it to output the model selected/parameters?

  • RK

    This is helpful, I am wondering however if the time series data are stacked instead of separated by columns. So, if I have three columns product, date, and volume and I’d like to forecast each product inddividually, can the forecast package separate each time series… similar to by variable in SAS.

    • Rob J Hyndman

      It is easy to reformat such data into the format I’ve used above. Or to modify my code to use the 3-column matrix directly.

  • Joan Arus

    Excuse my newbie question but.. could you please copy the sintax of that last paragraph of R code? Written it like this doesn’t seem to work in my R Studio. Thanks!

    for(i in 1:ns)
    fcast[,i] <- forecast(retail[,i],h=h)$mean


    • Joan Arus

      sorry! already solved.

  • Thisara Watawana

    How to measure the forecast accuracy for each time series?

  • Harshad Madhamshettiwar

    On the grounds as the question is– How do we decompose retail dataset using the decompose() function and be able to save all the output and metrics for all columns. This is real life requirement for me. I am trying to decompose 16 products, all those have same periodicity and can be saved as time series. But somehow dont think it is working or I think decompose() function needs all products to be stored as individual ts object and passed in a loop.

    • Rob J Hyndman

      Use a list

      • Harshad Madhamshettiwar

        Thanks you for the reply. Yes I used the list and it works as desired.