A blog by Rob J Hyndman 

Twitter Gplus RSS

Initializing the Holt-​​Winters method

Published on 30 November 2010

The Holt-​​Winters method is a pop­u­lar and effec­tive approach to fore­cast­ing sea­sonal time series. But dif­fer­ent imple­men­ta­tions will give dif­fer­ent fore­casts, depend­ing on how the method is ini­tial­ized and how the smooth­ing para­me­ters are selected. In this post I will dis­cuss var­i­ous ini­tial­iza­tion methods.

Sup­pose the time series is denoted by y_1,\dots,y_n and the sea­sonal period is m (e.g., m=12 for monthly data). Let \hat{y}_{t+h|t} be the h–step fore­cast made using data to time t. Then the addi­tive for­mu­la­tion of Holt-​​Winters’ method is given by the fol­low­ing equations

    \begin{align*} \ell_t & = \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\\ b_t & = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1}\\ s_t &= \gamma(y_t-\ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}\\ \hat{y}_{t+h|t} &= \ell_t + b_th + s_{t+h-m}, \end{align*}

and the mul­ti­plica­tive ver­sion is given by

    \begin{align*} \ell_t & = \alpha(y_t/ s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\\ b_t & = \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1}\\ s_t &= \gamma(y_t/(\ell_{t-1} + b_{t-1})) + (1-\gamma)s_{t-m}\\ \hat{y}_{t+h|t} &= (\ell_t + b_th)s_{t+h-m}. \end{align*}

In many books, the sea­sonal equa­tion (with s_t on the LHS) is slightly dif­fer­ent from these, but I pre­fer the ver­sion above because it makes it eas­ier to write the sys­tem in state space form. In prac­tice, the mod­i­fied form makes very lit­tle dif­fer­ence to the forecasts.

In my 1998 text­book, the fol­low­ing ini­tial­iza­tion was pro­posed. Set

    \begin{align*} \ell_m & = (y_1+\cdots+y_m)/m\\ b_m &= \left[(y_{m+1}+y_{m+2}+\cdots+y_{m+m})-(y_1+y_2+\cdots+y_{m})\right]/m^2. \end{align*}

The level is obvi­ously the aver­age of the first year of data. The slope is set to be the aver­age of the slopes for each period in the first two years:

    \[ (y_{m+1}-y_1)/m,\quad (y_{m+2}-y_2)/m,\quad \dots,\quad (y_{m+m}-y_m)/m. \]

Then, for addi­tive sea­son­al­ity set s_i=y_i-\ell_m and for mul­ti­plica­tive sea­son­al­ity set s_i=y_i/\ell_m, where i=1,\dots,m. This works pretty well, and is easy to imple­ment, but for short and noisy series it can give occa­sional dodgy results. It also has the dis­ad­van­tage of pro­vid­ing state esti­mates for period m, so that the first fore­cast is for period m+1 rather than period 1.

In some books (e.g., Bow­er­man, O’Connell and Koehler, 2005), a regression-​​based pro­ce­dure is used instead. They sug­gest fit­ting a regres­sion with lin­ear trend to the first few years of data (usu­ally 3 or 4 years are used). Then the ini­tial level \ell_0 is set to the inter­cept, and the ini­tial slope b_0 is set to the regres­sion slope. The ini­tial sea­sonal val­ues s_{-m+1},\dots,s_0 are com­puted from the detrended data. This is a very poor method that should not be used as the trend will be biased by the sea­sonal pat­tern. Imag­ine a sea­sonal pat­tern, for exam­ple, where the last period of the year is always the largest value for the year. Then the trend will be biased upwards. Unfor­tu­nately, Bow­er­man, O’Connell and Koehler (2005) are not alone in rec­om­mend­ing bad meth­ods. I’ve seen sim­i­lar, and worse, pro­ce­dures rec­om­mended in other books.

While it would be pos­si­ble to imple­ment a rea­son­ably good regres­sion method, a much bet­ter pro­ce­dure is based on a decom­po­si­tion. This is what was rec­om­mended in my 2008 Springer book and is imple­mented in the HoltWinters and ets func­tions in R.

  1. First fit a 2\times m mov­ing aver­age smoother to the first 2 or 3 years of data (HoltWinters uses 2 years, ets uses 3 years). Here is a quick intro to mov­ing aver­age smooth­ing.
  2. Then sub­tract (for addi­tive HW) or divide (for mul­ti­plica­tive HW) the smooth trend from the orig­i­nal data to get de-​​trended data. The ini­tial sea­sonal val­ues are then obtained from the aver­aged de-​​trended data. For exam­ple, the ini­tial sea­sonal value for Jan­u­ary is the aver­age of the de-​​trended Januaries.
  3. Next sub­tract (for addi­tive HW) or divide (for mul­ti­plica­tive HW) the sea­sonal val­ues from the orig­i­nal data to get sea­son­ally adjusted data.
  4. Fit a lin­ear trend to the sea­son­ally adjusted data to get the ini­tial level \ell_0 (the inter­cept) and the ini­tial slope b_0.

This is gen­er­ally quite good and fast to imple­ment and allows “fore­casts” to be pro­duced from period 1. (Of course, they are not really fore­casts as the data to be fore­cast has been used in con­struct­ing them.) How­ever, it does require 2 or 3 years of data. For very short time series, an alter­na­tive (imple­mented in the ets func­tion in R) is to set the ini­tial sea­sonal val­ues to be the last year of avail­able data with the mean sub­tracted (for addi­tive HW) or divided out (for mul­ti­plica­tive HW). Then pro­ceed with steps 3 and 4 as above.

Whichever method is used, these ini­tial val­ues should be seen as rough esti­mates only. They can be improved by opti­miz­ing them along with the smooth­ing para­me­ters using max­i­mum like­li­hood esti­ma­tion, for exam­ple. The only imple­men­ta­tion of the Holt-​​Winters’ method that does that, to my knowl­edge, is the ets func­tion in R. In that func­tion, the above pro­ce­dure is used to find start­ing val­ues for the estimation.

Some recent work (De Liv­era, Hyn­d­man and Sny­der, 2010) shows that all of the above may soon be redun­dant for the addi­tive case (but not for the mul­ti­plica­tive case). In Sec­tion 3.1, we show that for lin­ear mod­els, the ini­tial state val­ues can be con­cen­trated out of the like­li­hood and esti­mated directly using a regres­sion pro­ce­dure. Although we present the idea in the con­text of com­plex sea­son­al­ity, it is valid for any lin­ear expo­nen­tial smooth­ing model. I am plan­ning on mod­i­fy­ing the ets func­tion to imple­ment this idea, but it will prob­a­bly have to a wait for a cou­ple of months as my “to do” list is rather long.


Related Posts:


 
12 Comments  comments 
  • http://www.duke-energy.com Scott Haag

    I am an engi­neer, not a math­e­mati­cian. But is the equa­tion for mul­ti­plica­tive St cor­rect? Shouldn’t “b” be pre­ceded by a plus sign, not a neg­a­tive sign? We want to advance to the cur­rent period, not go back to a prior period? Oth­er­wise, a good explanation.

    • http://robjhyndman.com Rob J Hyndman

      Thanks. Now fixed.

  • http://www.xkcd.com John

    Great post. I’d been using the method rec­om­mended by Bow­er­man, O’Connel, and Koehler, so it’s nice to see a critique.

    A ques­tion related to HW but of a dif­fer­ent nature. I’ve been opti­miz­ing alpha, delta, and gamma for HW using a non­lin­ear opti­miza­tion solver to min­i­mize the sum of the squared errors between the actual data and the pre­vi­ous period’s fore­cast of the next period. Are there other approaches for opti­miz­ing these para­me­ters quickly if my goal is to cre­ate a HW model using 3 years worth of data to fore­cast a fourth year? I have 6000 dif­fer­ent time series fore­casts that I need para­me­ters for, which is why I use the word “quickly”.

    Thanks.

    • http://robjhyndman.com Rob J Hyndman

      That method works pretty well. There are sev­eral vari­a­tions on it (as dis­cussed in this IJF (2002) paper, but none that will be quicker than what you are doing if you want to opti­mize the para­me­ters for every time series.

      One alter­na­tive to non­lin­ear opti­miza­tion for each time series is to use the same smooth­ing para­me­ters for all time series that you are fore­cast­ing (or divide the time series into homo­ge­neous groups and use the same para­me­ters in each group).

      Another is to use fixed, pre-​​specified, val­ues of the para­me­ters and don’t tune them at all. This is actu­ally a very com­mon approach and works sur­pris­ingly well.

  • Abhishek

    Hello Rob
    Great post. I was build­ing Winter’s model in MS excel. I am using Solver add in to find opti­mal val­ues for alpha, beta and gamma. It gives alpha and gamma as 1 and beta as zero. I am try­ing to min­i­mize the Mean Absolute Error.
    I am skep­ti­cal about this solu­tion given by solver. Can you sug­gest if the solver is the best tool to find the opti­mal val­ues of the para­me­ter that fits the data or is there some bet­ter tool available.

    • http://robjhyndman.com Rob J Hyndman

      Solver is a non-​​linear opti­mizer. It does a rea­son­able job, but when there are local max­ima it will often con­verge to a non-​​optimal solu­tion. There are bet­ter non-​​linear opti­miz­ers that are more robust to local max­ima (e.g., PSO), but they will take a lot longer. You will have to use some­thing other than MS-​​Excel if you want to explore bet­ter tools. I sug­gest you use R.

  • Sebas­t­ian

    This is a great post and has been extremely use­ful to me (I work with bio­sur­veil­lance data) as well as your for­mu­las for pre­dic­tion inter­vals of the mul­ti­plica­tive HW (Hyn­d­man, Koehler, Ord and Sny­der, 2005)

    Regard­ing these for­mu­las, I have a ques­tion (sorry to ask it in this post):
    As far as I under­stand, they are only valid when the resid­u­als have no auto­cor­re­la­tion. How­ever, I have some data for which HW with opti­miza­tion of the MSE gives highly cor­re­lated resid­u­als (r1 \approx 0.5) Is there a way to mod­ify the equa­tions of the pre­dic­tion inter­vals to take this into account? Or maybe they can be used as lower (or even upper) bounds for the pre­dic­tion intervals?

    Many thanks in advance!

    • http://robjhyndman.com Rob J Hyndman

      Yes, the pre­dic­tion inter­vals assume uncor­re­lated resid­u­als. It would be pos­si­ble, but com­pli­cated, to derive inter­vals allow­ing for auto­cor­re­lated errors. How­ever, if auto­cor­re­la­tion occurs, I sug­gest you use an ARIMA model instead.

  • Lak­shmi

    Dear Rob,

    I am work­ing on holt winter’s method for a FMCG product.I need to fore­cast for future 2 years and i have the his­tory data for 3 years. Are there any thumb rules sug­gest­ing that alpha should not be greater than 0.3, beta should not be greater than 0.3 and gamma should not be greater than 0.1?
    If yes, then at what sit­u­a­tions should we use higher alpha, beta, gamma?

    • http://robjhyndman.com Rob J Hyndman

      If you don’t have enough data to opti­mize the para­me­ters, then a rea­son­able rule is to set alpha=0.2, beta=0.1, gamma=0.1.

  • Tithi Sen

    Hi Rob,

    I have data points for 49 months, while for­mu­lat­ing SI for HW in excel, how many months of data shall Intake to cal­cu­late the ini­tial dou­ble mov­ing average

  • Hamada zakaria

    good morn­ing­please, if you can give me the method of search­ing for alpha beta gamma with Excel or other method,