Comparing HoltWinters() and ets()

I received this email today:

I have a question about the ets() function in R, which I am trying to use for Holt-Winters exponential smoothing.
My problem is that I am getting very different estimates of the alpha, beta and gamma parameters using ets() compared to HoltWinters(), and I can’t figure out why.

This is a common question, so I thought the answer might be of sufficient interest to post here.

There are several issues involved.

  1. HoltWinters() and ets() are optimizing different criteria. HoltWinters() uses heuristic values for the initial states and then estimates the smoothing parameters by optimizing the MSE. ets() estimates both the initial states and the smoothing parameters by optimizing the likelihood function (which is equivalent to optimizing the MSE only for the linear additive models).
  2. The two functions use different optimization routines and different starting values. That wouldn’t matter if the surfaces being optimized were smooth, but they are not. Because the MSE and likelihood surfaces are both fairly bumpy, it is easy to find a local optimum. The only way to avoid this problem is to use a much slower computational method such as PSO (particle swarm optimization).
  3. ets() searches over a restricted parameter space to ensure the resulting model is forecastable. HoltWinters() ignores this issue (it was written before the problem was even discovered). See this paper for details (equivalently chapter 10 of my exponential smoothing book).
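Point 3 can be seen directly in ets(), whose bounds argument controls the parameter region that is searched. A minimal sketch, assuming the forecast package is installed and using AirPassengers purely as a convenient built-in seasonal series:

```r
library(forecast)  # provides ets()

# "usual" keeps the smoothing parameters in the conventional 0-1 ranges;
# "admissible" allows the larger region in which the model remains
# forecastable. The default considers both.
fit_usual <- ets(AirPassengers, model = "AAA", bounds = "usual")
fit_adm   <- ets(AirPassengers, model = "AAA", bounds = "admissible")

# The estimated smoothing parameters are stored in the par component:
fit_usual$par["alpha"]
fit_adm$par["alpha"]
```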

I have experimented with many different choices of the starting values for the initial values and smoothing parameters, and what is implemented in ets() seems about as good as is possible without using a much slower optimization routine. Where there is a difference between ets() and HoltWinters(), the results from ets() are usually more reliable.
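The difference is easy to see in practice. A minimal sketch, again assuming the forecast package is installed and using AirPassengers purely as an example, fitting the same additive Holt-Winters model both ways:

```r
library(forecast)  # provides ets()

fit_hw  <- HoltWinters(AirPassengers)                         # heuristic initial states + MSE
fit_ets <- ets(AirPassengers, model = "AAA", damped = FALSE)  # MLE for states and parameters

# The smoothing parameter estimates typically differ, for the reasons
# given in points 1-3 above:
unlist(fit_hw[c("alpha", "beta", "gamma")])
fit_ets$par[c("alpha", "beta", "gamma")]
```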

A related question on the estimation of ARIMA models was discussed in an earlier post.

  • Leo

    Thanks for the comparison, Dr. Hyndman. I have another issue related to Holt-Winters. When using hourly data with weekly seasonality, the frequency is 168. I guess ets() fails to handle this while HoltWinters() works.
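    • ets() restricts the seasonal period (it cannot handle frequencies above 24), so this is expected. A sketch of the behaviour, using a series simulated purely for illustration:

      ```r
      library(forecast)  # provides ets()

      # Two weeks of hypothetical hourly data with weekly seasonality (m = 168):
      x <- ts(sin(2 * pi * (1:336) / 168) + rnorm(336, sd = 0.1),
              frequency = 168)

      fit_hw <- HoltWinters(x)              # fits, though slowly
      res    <- try(ets(x), silent = TRUE)  # refuses or drops the seasonal
                                            # component, depending on the
                                            # forecast version
      ```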

  • Leo

    Dr. Hyndman,
    According to your book, it’s not possible to use ets(AAM) for a Holt-Winters model with additive trend and multiplicative seasonality. How about using ets(MAM) if we are interested in point forecasts only?

    • ETS(M,A,M) is fine for point forecasts and prediction intervals. ETS(A,A,M) is numerically unstable with infinite prediction intervals.

  • James

    What would be the best way to objectively compare the performance of the HoltWinters and ets functions in R? It seems HoltWinters returns a value containing SSE (sum of the squared errors) whereas ets returns a value containing loglik as a measure of accuracy… which makes it difficult to compare the two (apples and oranges).

    I know in your book you recommend using ets over HoltWinters but HoltWinters seems to be generating much more credible forecasts for some sample data that I have (just looking at a plot) and I wanted to verify this using some objective measure of the fitting algorithms.

    • James

      Argh… as usual, I should look with my eyes and not with my mouth. I see the forecast package contains an accuracy() method that does exactly what I wanted…

      Sorry for the bother – and thank you so much for the fantastic work both on the R packages and your text books.

      • It is, of course, possible that HoltWinters will give better forecasts for a specific time series. But on average, ets will be better as it optimizes the initial states, it provides a larger model class, and it allows model selection via AIC.
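        A sketch of that comparison using accuracy() (AirPassengers here is just a stand-in for any seasonal series):

        ```r
        library(forecast)  # provides ets(), forecast() and accuracy()

        fit_hw  <- HoltWinters(AirPassengers)
        fit_ets <- ets(AirPassengers)

        # accuracy() reports the same training-set measures (RMSE, MAE,
        # MAPE, ...) for both fits, giving a common scale for comparison:
        accuracy(forecast(fit_hw))
        accuracy(fit_ets)
        ```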


  • Jason

    Hello Professor – any chance the option to use PSO will be added to the ETS function in the forecast package? Curious how much of a difference it would make and if it is worth the time cost.

    • Maybe. It could be useful as the likelihood is fairly bumpy and the current optimization method sometimes ends up at local optima. The downside is that it is much slower.