New in forecast 4.0

A few days ago I released version 4.0 of the forecast package for R. There were quite a few changes and new features, so I thought it deserved a new version number. I keep a list of changes in the Changelog for the package, but I doubt that many people look at it. So for the record, here are the most important changes to the forecast package made since v3.0 was released.

Exponential smoothing

The ets() function handles exponential smoothing in the ETS framework, but for teaching purposes it is sometimes useful to discuss a more traditional implementation that does not involve optimizing the initial values. The HoltWinters() function in the stats package attempts to do this, although the initialization used for the Holt and Holt-Winters methods is non-standard, making it unsuitable for teaching.

Consequently, I’ve now modified the ses(), holt() and hw() functions to allow both traditional heuristic initialization and ETS-style optimization of initial states. These functions also now allow damped trend and multiplicative trend.
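As a quick sketch of the new interface (airmiles is just an illustrative base-R series):

```r
library(forecast)

# Traditional heuristic initialization of the states
fit1 <- holt(airmiles, initial = "simple", h = 10)

# ETS-style optimization of the initial states
fit2 <- holt(airmiles, initial = "optimal", h = 10)

# Damped trend is now available too
fit3 <- holt(airmiles, damped = TRUE, h = 10)
```

With initial="optimal", the functions use the ETS machinery to estimate the initial states along with the smoothing parameters.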

Multiple seasonality and BATS and TBATS models

Time series with multiple seasonal patterns (e.g., hourly data that contains a daily pattern, a weekly pattern and an annual pattern) now have their own model class, msts. This is useful when plotting the data, and when using the dshw() double seasonal Holt-Winters function, for example.
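A minimal sketch (taylor is the half-hourly electricity demand series shipped with the package; its two seasonal periods are the daily cycle of 48 and the weekly cycle of 336):

```r
library(forecast)

# Half-hourly data with a daily (48) and a weekly (336) seasonal pattern
x <- msts(taylor, seasonal.periods = c(48, 336))
plot(x)

# dshw() can then pick up both periods from the msts object
fc <- dshw(x)
```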

The BATS and TBATS models (fitted using bats() and tbats()) also handle multiple seasonalities as well as complex seasonalities (such as non-integer seasonality, non-nested seasonality and large period seasonality). The models are in the exponential-smoothing framework and were proposed by De Livera, Hyndman and Snyder (JASA 2011).

BATS and TBATS models allow a form of time series decomposition, so the fitted objects can now be plotted in the same way as other decomposition objects (e.g., stl objects).
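For example (a sketch; tbats() is slow on very long series, so only two weeks of the taylor data are used here):

```r
library(forecast)

# Two weeks of half-hourly demand, with daily and weekly seasonality
y <- msts(taylor[1:672], seasonal.periods = c(48, 336))
fit <- tbats(y)

# Plot the decomposition: observed series, level and seasonal components
plot(fit)
```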

These models are quite computationally intensive, and so the functions will use parallel processing when it is available.


auto.arima()

This is probably the most popular function in the package, and I am always looking for ways to speed it up and improve the results.

  • Default model selection has been switched from AIC to AICc (the bias-corrected form of AIC) which may affect model selection for very short time series.
  • The maximum orders are now restricted to less than 1/3 of the length of the data.
  • Parallel processing is used where possible when stepwise=FALSE.
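For example (a sketch; the parallel option applies only to the non-stepwise search):

```r
library(forecast)

# Default search: stepwise, with models compared by AICc
fit1 <- auto.arima(WWWusage)

# Exhaustive search over all candidate models, in parallel where available
fit2 <- auto.arima(WWWusage, stepwise = FALSE, parallel = TRUE)
```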

Accuracy calculations

The accuracy function now works with both time series forecasts and cross-sectional forecasts.

The MASE for seasonal time series is now defined using seasonal naive forecasts for the in-sample scaling. The MASE for cross-sectional forecasts is defined using the mean forecasts for in-sample scaling. This is consistent with my new book.
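Concretely, for a seasonal series with period $m$, the scaling term is the in-sample mean absolute error of seasonal naive forecasts, so the scaled errors are (a sketch of the definition, in the book's notation)

$$q_t = \frac{e_t}{\frac{1}{T-m}\sum_{j=m+1}^{T} |y_j - y_{j-m}|}, \qquad \text{MASE} = \text{mean}(|q_t|).$$

Setting $m=1$ (ordinary naive forecasts) recovers the non-seasonal definition.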

Neural network AR models

A new experimental function nnetar has been introduced to fit a neural network with lagged values of the time series as inputs. It is a feed-forward network with one hidden layer, specified using the notation NNAR($p,k$) to indicate there are $p$ lagged inputs and $k$ nodes in the hidden layer. For example, an NNAR(9,5) model is a neural network with the last nine observations $(y_{t-1},y_{t-2},\dots,y_{t-9})$ used as inputs to forecast the output $y_t$, and with five neurons in the hidden layer. An NNAR($p,0$) model is equivalent to an ARIMA($p,0,0$) model but without the restrictions on the parameters that ensure stationarity.

With seasonal data, it is useful to also add the last observed values from the same season as inputs. For example, an NNAR(3,1,2)$_{12}$ model has inputs $y_{t-1}$, $y_{t-2}$, $y_{t-3}$ and $y_{t-12}$, and two neurons in the hidden layer. More generally, an NNAR($p,P,k$)$_m$ model has inputs $(y_{t-1},y_{t-2},\dots,y_{t-p},y_{t-m},y_{t-2m},\dots,y_{t-Pm})$ and $k$ neurons in the hidden layer. An NNAR($p,P,0$)$_m$ model is equivalent to an ARIMA($p,0,0$)($P,0,0$)$_m$ model but without the restrictions on the parameters that ensure stationarity.

The nnetar() function fits an NNAR($p,P,k$)$_m$ model. If the values of $p$ and $P$ are not specified, they are automatically selected. For non-seasonal time series, the default is the optimal number of lags (according to the AIC) for a linear AR($p$) model. For seasonal time series, the default values are $P=1$ and $p$ is chosen from the optimal linear model fitted to the seasonally adjusted data. If $k$ is not specified, it is set to $k=(p+P+1)/2$ (rounded to the nearest integer).

By default, 25 networks with random starting values are trained and their predictions averaged.

Like many of the functions in the forecast package, the default arguments usually give reasonably good forecasts without the user needing to know much. For example:

fit <- nnetar(lynx)
plot(forecast(fit, h = 20))

Because random starting values are used, the forecasts will be slightly different each time the function is called.

It is possible to compute prediction intervals from NNAR models via simulation, but I have not yet implemented this. For now, the function gives only point forecasts.

New default colors for forecast plots

The old yellow and orange colors have been replaced by shades of blue-grey which are a little easier on the eye. The darker the color, the higher the density. Here is an example with 80% and 95% prediction intervals.

fit <- Arima(euretail, order=c(0,1,3), seasonal=c(0,1,1))
plot(forecast(fit, h=12))

Future development

Earlier this year I started using the package for an undergraduate forecasting class which gave it a good work-out, and I’m also using it as the basis of my new book with George Athanasopoulos. As a result, a lot of new functionality has been added, and lots of bugs have been discovered and fixed.

The process is continuing: I have lots of ideas I want to implement, and no doubt more bugs will be discovered. Development of the forecast package is now maintained on GitHub, where you can see what is in development and report any new problems. Please let me know if you find anything you think is an error, or if you have a suggestion.

Comments

  • fernando

    It would be nice to have an automatic cross-validation function, given the fitted model and the validation parameters (training-set size, maximum horizon, etc.). Anyway, great job on the new version!

  • Samik

    Hi Rob, thanks for the notification. Indeed, the ‘What’s new’ section rarely gets read. Anyway, I was wondering if you have an updated version of the original paper that detailed the auto.arima() algorithm, one that includes the improvements and changes?

  • Rob, congrats for the development. I like this package a lot. By the way, whenever a new version is available, the first thing I look for is the changelog/what’s new section!

  • Shea Parkes

    Thanks for all your hard work Rob! I appreciate the subtler default colors, but you’ll need to go back and update your online book to stop mentioning yellow/orange now:

  • Jerome

    Thank you Rob for the function dshw(). When I get the point forecast, how can I get the forecast interval (lower 0.80 and upper 0.80)? Thank you

    • I have not implemented prediction intervals for dshw(). Use bats() to get PI for an equivalent model.

  • BMiner

    Hey Rob, You mention prediction intervals with ANN via simulation. Do you have any sources to reference?

    • No, but a neural network is essentially a nonlinear autoregression and simulation is the most common way to get PI for other nonlinear autoregression models.

      • Bminer

        Do you think bootstrapping would be a viable method (using the boot package and tsboot)? I am thinking of bootstrapping the series and then using each of the n models created to predict a new case – and thus getting a prediction distribution.

        • No. The bootstrapped series won’t allow prediction conditional on the last few values of the observed series.

  • Joseph Johnson

    Dr. Hyndman,

    Does the ets function in forecast package v4.0 constrain the parameters to be;
    0<α<1, 0<β<α, 0<γ<1-α ?
    Would you recommend this constraint for all models?
    Does the package go even further to constraint the certain models as mentioned in Chapter 10 of your exponential smoothing book?

    Thank you for the package. It has been very helpful to me!

    • Try reading the help file. It is all explained. The constraints you mention are implemented using bounds="usual". The constraints in ch10 of our book are implemented using bounds="admissible". The default is bounds="both", meaning both sets of constraints are applied.

    • Joseph Johnson

      Thank you.

  • Francesco Gianferrari Pini

    Hi! How can I extract the time series for the seasonal component(s) on a TBATS forecast? I can see them in the plot!

    • Look at plot.tbats() and you will see how they are constructed.

  • Jerome

    Bonjour Rob! I am running a bats function and I get this error:

    Error in checkForRemoteErrors(val) : 6 nodes produced errors; first error: argument is of length zero
    I have no clue to solve it. Thank you in advance for your tips.

    • Try using debug() to track where the error is occurring.

  • SimonMN

    Hi Prof Hyndman,

    Is there any way to add exogenous inputs to the nnetar function in the same way you can add exogenous variables to the Arima or auto.arima functions?

    • No, but it isn’t hard to do it using the avNNet function from the caret package. Basically nnetar() is forecasting using an average of 20 neural nets where the inputs are lagged values of the time series. You would just need to add the xreg variables as inputs as well.
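      A rough sketch of that idea (everything here is illustrative: simulated data, hypothetical variable names, and hand-built lag columns):

      ```r
      library(caret)

      set.seed(1)
      y <- as.numeric(arima.sim(list(ar = 0.7), n = 200))  # simulated series
      x <- rnorm(200)                                      # hypothetical exogenous input

      # Build lagged inputs by hand, aligned with the response
      n <- length(y)
      dat <- data.frame(
        y  = y[3:n],
        l1 = y[2:(n - 1)],   # lag 1
        l2 = y[1:(n - 2)],   # lag 2
        x  = x[3:n]          # exogenous input as an extra column
      )

      # Average of 20 single-hidden-layer networks, linear output for regression
      fit <- avNNet(y ~ l1 + l2 + x, data = dat, size = 3,
                    repeats = 20, linout = TRUE, trace = FALSE)
      ```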

      • llr

        Hi Prof Hyndman,
        Can you point to an example of this technique? In your textbook, you show an example of avNNet() on a cross-sectional dataset, which is helpful. To implement avNNet() on a time series dataset, would you create a dataframe that holds not only dummy variables for holiday/seasonal effects but also the lagged variables? Eg something like avNNet(point_forecast ~ lagged_1 + lagged_2 + lagged_3 + holiday_1) where lagged_1,..,lagged_3 are the lagged variables for that daily forecast? (And thus not use nnetar() at all?)

  • Christian Miller

    I don’t have anything substantive to add, but I want to give a huge thanks to you Rob. This has been a helpful and useful package in learning about forecasting and implementing ideas. And you continue to add functionality faster than I can keep up: it’s a great problem to have.

    Greatly appreciated (and the book too!)

  • Tuan

    Hi Rob,

    Is there any roadmap for computing the prediction intervals of NNETAR models? For some of our data (machine and network activity monitoring), the NNETAR models seem to give the best results.

    Thank you so much for your great work.

    • I wasn’t planning on adding them any time soon. It requires simulation of future sample paths. If someone was to write the code, I would add it to the package!

      • Tuan

        Thank you so much for the prompt response. I have reviewed several approaches to calculating the prediction intervals, and I like the approach based on Gheyas and Smith’s papers the most. ( and ). Basically, they use two NNs: the first forecasts the conditional mean (the point forecast), and the second forecasts the conditional variance, taking the residual series as input. The prediction intervals can then be calculated from the conditional mean and conditional variance. What do you think about Gheyas and Smith’s approach?

        • I’m unfamiliar with those papers, but any approach based only on means and variances will need to assume that the forecast distributions are normally distributed which is unlikely. A simulation approach does not make that assumption.

  • Peter

    There seems to be a bug in the bats() function.


    This command gives a warning:

    optim() did not converge.

    The command works without use.damped.trend=FALSE.


    • Peter

      I am trying to use bats() in place of dshw().

    • What makes you think it is a bug?

      • Peter

        Because it should work without the damped trend. That’s my opinion. I may be wrong.

        • It does work with the damped trend. It’s only a warning, and it only says that the optimization has not yet converged.

      • Peter

        dshw() produces forecasts without warnings for the same data.

  • Leo

    Congratulations on creating a wonderful forecast package. I have electricity demand data containing two seasonalities. When I do tbats(elec), I get parameters for just one seasonal period. If I give both seasonal periods as input, then I get parameters for both. But in the latter case, I am not using the full power of the tbats model. Can you please say something in this regard?

    • I don’t know what you mean by not using the full power of the tbats model.

      • Leo

        I mean let tbats decide using the criteria mentioned in the article.

        • tbats doesn’t decide on the seasonal periods. It chooses the number of Fourier terms to use for each of the seasonal periods.

          • Leo

            Then how can it find complex seasonality that is not visible in the plot? I remember something similar being mentioned in the article.

          • Peter

            You have the option of entering complex seasonal periods. TBATS is not going to find them for you; it only finds the parameters. There are no other functions in R that allow this.

  • bayes40

    Hi Rob:
    Does the forecast package you created handle rational distributed lag models? I am developing hourly transfer functions that attempt to predict demand for every hour of the day, using 1.5 years of daily demand data together with 1.5 years of advertising impression data as input. I am also controlling for seasonal variables, including weekly base-level demand patterns. I am assuming ARMA errors, and I believe that a Koyck model is all I need to capture the influence of past advertising on demand (requests for quotes). I want to use auto.arima(), but I am not sure whether I am supposed to use it with distributed lag specifications.

    • No, although it does handle lagged covariates via the xreg argument in the Arima() and auto.arima() functions. It does not handle a Koyck model, or any other model with a rational lag structure for a covariate. The TSA package will fit the model, but not produce forecasts. I hope to implement these models some time, but haven’t done so yet.

    • bayes40

      Thank you so much for the quick response. Love your text book

      • bayes40

        Last question: if you were building a model of advertising’s effect on demand, could you build a valid model using auto.arima() and the forecast package by adding lagged predictors, or is there some added value in using the TSA package?

  • Tuan

    Hi Rob,

    we are using your ‘forecast’ package to forecast millions of time series on an hourly basis, and we would like to explore the option of incrementally updating the tbats models to speed up processing (i.e., given a tbats model at time point Tn, we would like to update the model when the data point for Tn+1 arrives). I understand that this feature is currently not available, and we will have to look at the source code and modify it. Before we begin, can you please advise whether it is feasible? If yes, please give us some hints about where to modify the logic. If it works for us, we would definitely love to contribute back to the package.

    Thanks a lot for your time

    • If you want to re-estimate the parameters, then it is not really feasible as the parameter optimization is too slow. But if you want to keep the current parameters, but compute the new forecasts on the basis of the new data, then it should be possible. That is on my list of proposed features to add to the package, but it probably won’t happen in the near future. You will need to skip the parameter optimization and model selection part of tbats, but still compute fitted values and residuals.

      • Tuan

        Thank you so much Rob for the response.

  • Padarn

    Hi Rob. I was going to raise this on github, but it seemed too minor for an ‘issue’.

    Is there any reason not to have a way of specifying automatic selection of the Box-Cox transformation parameter (and potentially also a shift)?

    • I’ve not come across an automatic method for the Box-Cox transformation that is reliable enough to make it the default. You can use BoxCox.lambda() which is not bad, but I get better forecasts on average without it.

  • mohammed jarad

    My name is Mohammed Jarad; I am a student in applied statistics.
    My research is about time series and neural networks.
    I have a problem with the nnetar() function in R: I don't know how to use it.
    Can you help me produce forecasts using nnetar()? Thank you.

    • Read the help file.

      • mohammed jarad

        Thank you..

      • mohammed jarad

        Hi Rob, I wrote this code for my data:
        ff <- nnetar(zz, P=1, size=10, repeats=20, lambda=0.05)

        The problem is that I cannot find any AIC or BIC.
        Let me tell you what I am doing: I am comparing the Box-Jenkins methodology with neural networks for prediction. I have finished the Box-Jenkins predictions, but I am having trouble producing predictions with the neural network.

        Can you help me compare them on my data?
        Thank you so much, Rob.

        • Neural nets are not based on likelihood estimation, so there cannot be an AIC or BIC. Use a rolling hold out sample for comparisons instead.

          • mohammed jarad

            Thank you Rob,
            I’m going to do it.

  • Matt Weller

    I’m finding an error when passing an lm model to the accuracy function. In R I can’t seem to access (to step through) the functions forecasterrors(f, x, test) and getResponse(f). Is this code available and where would I be able to find it? If this debugging doesn’t solve it I will prepare a question for stackoverflow. The error I’m getting kicks in at getResponse(f) as follows:

    Error in `[.data.frame`(model.frame(object$model), , responsevar) : undefined columns selected

    I can see that the lm formula and data.table contain only the relevant columns correctly named and I need to dig further into the function which is defined in R as:

    function (object, …)

    Many thanks for the forecast package, it is awesome and I’ve only scratched the surface at this point.

    • Can you please indicate what version of the package you are using, and provide a minimal example so I can replicate the problem.

  • mohammed jarad

    Hi Rob, can you please send me anything that would help me with my research about nnetar()? I can't find anything to help me. I am going to produce forecasts with nnetar() to compare neural networks with Box-Jenkins.

      • mohammed jarad

        Thank you Rob.

      • mohammed jarad

        Hello Prof. Hyndman, I have written my code. I am fitting an ARIMA model to a time series. Suppose I have n observations. The predict.Arima function from the forecast package only makes predictions from observation n+1 onwards.

        I need to make predictions for the n observed values

        (I want to predict the real data).

        • fitted() will give one-step forecasts of the data used to estimate the model.
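        For instance (a sketch with an arbitrary series):

        ```r
        library(forecast)

        fit <- auto.arima(WWWusage)
        f <- fitted(fit)   # one-step in-sample forecasts of the observed data
        ```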

  • Isabella Ghement

    Dear Rob,

    When one uses bootstrapping to derive forecast intervals, the forecast.Arima() function in the forecast package seems to report prediction intervals computed as follows:

    qq <- qnorm(0.5 * (1 + level[i]/100))
    lower[, i] <- pred$pred - qq * pred$se
    upper[, i] <- pred$pred + qq * pred$se

    In the above, pred$pred and pred$se represent the mean and standard deviation of the point forecasts derived via bootstrapping.

    Is there any reason why the normal distribution quantiles are used in the derivation of these forecast intervals instead of the quantiles corresponding to the bootstrap distribution of the point forecasts?

    Also, if the bootstrap distribution of the point forecasts is non-normal (e.g., multi-modal), the standard deviation of this distribution may not be a sensible measure of spread. Similarly, the mean of this distribution may not be a sensible measure of centre.

    Thank you very much for any clarifications you can provide.

    • Yes, the quantiles would be better. I’ll add it to the list of things to do.

      • Isabella Ghement

        Thanks so much, Rob! This sounds reasonable. It would also be helpful to the users to have access to the bootstrap values of the point forecasts, so they can examine their distribution and assess how close to or far from normality this distribution actually is.

        The other thing I was wondering is as follows:

        If the bootstrap distribution of the point forecasts looks really “wild” (e.g., several peaks, some gaps, asymmetry, etc.), does it still make sense to report the default point forecast produced by forecast.Arima() (which is in essence the “mean” value of the bootstrap distribution of the point forecasts)? Or perhaps it would be better to report instead the “median” value of the bootstrap distribution of the point forecasts, along with the quantile-based forecast interval?

        • You can already get the bootstrap values via the simulate.Arima function.

          If the distribution of future values looks wild, then you should report the whole distribution, not just a point estimate or an interval estimate.


  • iris1873

    It seems that the nnetar forecasts differ a lot with different sample sizes of data used to build the model. If the goal is to improve forecast accuracy, is there a way to determine the best sample size for building the model? Thanks!

  • René Traue

    First of all: Thank you for all the work invested into the forecast-package. I am currently using it for my master thesis about sales forecasting. There is one question I unfortunately couldn’t answer by your book, blog, the R-documentation or searching in cross-validated, etc.: Which function is used for modification in the hidden-layer of nnetar? Do you use the nonlinear sigmoid-function you mentioned in the NN-introduction of your book or did you select something different? Thanks a lot in advance!

    First of all: Thank you for all the work invested into the forecast-package. I am currently using it for my master thesis about sales forecasting. There is one question I unfortunately couldn’t answer by your book, blog, the R-documentation or searching in cross-validated, etc.: Which function is used for modification in the hidden-layer of nnetar? Do you use the nonlinear sigmoid-function you mentioned in the NN-introduction of your book or did you select something different? Thanks a lot in advance!