Initializing the Holt-Winters method
The Holt-Winters method is a popular and effective approach to forecasting seasonal time series. But different implementations will give different forecasts, depending on how the method is initialized and how the smoothing parameters are selected. In this post I will discuss various initialization methods.
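Suppose the time series is denoted by $y_1,\dots,y_n$ and the seasonal period is $m$ (e.g., $m=12$ for monthly data). Let $\hat{y}_{t+h|t}$ be the $h$-step forecast made using data to time $t$. Then the additive formulation of Holt-Winters' method is given by the following equations

$$\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + b_t h + s_{t+h-m}\\
\ell_t &= \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1}\\
s_t &= \gamma(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m},
\end{aligned}$$

and the multiplicative version is given by

$$\begin{aligned}
\hat{y}_{t+h|t} &= (\ell_t + b_t h)\,s_{t+h-m}\\
\ell_t &= \alpha(y_t / s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1}\\
s_t &= \gamma\,y_t/(\ell_{t-1} + b_{t-1}) + (1-\gamma)s_{t-m}.
\end{aligned}$$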
In many books, the seasonal equation (with $s_t$ on the LHS) is slightly different from these, but I prefer the version above because it makes it easier to write the system in state space form. In practice, the modified form makes very little difference to the forecasts.
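To make the additive recursions concrete, here is a minimal Python sketch of one pass through the data. It assumes the seasonal update is driven by $y_t-\ell_{t-1}-b_{t-1}$ (the state-space-friendly form preferred above). The function name `holt_winters_additive` and its argument names are hypothetical, chosen only for illustration, and the code is not taken from any package.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, level0, slope0, seasonal0):
    """One-step-ahead forecasts from the additive recursions (illustrative sketch).

    seasonal0 holds the m initial seasonal states s_{-m+1}, ..., s_0.
    Returns (forecasts, level, slope, seasonal) with the final states.
    """
    if len(seasonal0) != m:
        raise ValueError("seasonal0 must contain exactly m values")
    level, slope = level0, slope0
    seasonal = list(seasonal0)  # seasonal[0] is always s_{t-m}
    forecasts = []
    for obs in y:
        s_old = seasonal[0]
        # Forecast of y_t made at time t-1: level + slope + seasonal state.
        forecasts.append(level + slope + s_old)
        new_level = alpha * (obs - s_old) + (1 - alpha) * (level + slope)
        new_slope = beta * (new_level - level) + (1 - beta) * slope
        # Seasonal update uses the previous level and slope.
        new_seasonal = gamma * (obs - level - slope) + (1 - gamma) * s_old
        seasonal = seasonal[1:] + [new_seasonal]
        level, slope = new_level, new_slope
    return forecasts, level, slope, seasonal
```

The forecasts depend on `level0`, `slope0` and `seasonal0` as well as on the smoothing parameters, which is why the choice of initialization matters.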
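In my 1998 textbook, the following initialization was proposed. Set

$$\begin{aligned}
\ell_m &= (y_1+\cdots+y_m)/m\\
b_m &= \left[(y_{m+1}+\cdots+y_{2m}) - (y_1+\cdots+y_m)\right]/m^2.
\end{aligned}$$

The level is obviously the average of the first year of data. The slope is set to be the average of the slopes for each period in the first two years:

$$(y_{m+1}-y_1)/m,\quad (y_{m+2}-y_2)/m,\quad \dots,\quad (y_{2m}-y_m)/m.$$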
Then, for additive seasonality set $s_i=y_i-\ell_m$ and for multiplicative seasonality set $s_i=y_i/\ell_m$, where $i=1,\dots,m$. This works pretty well, and is easy to implement, but for short and noisy series it can give occasional dodgy results. It also has the disadvantage of providing state estimates for period $m$, so that the first forecast is for period $m+1$ rather than period 1.
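As a rough illustration, this simple initialization can be written in a few lines of Python. The function name `simple_initial_states` and the `multiplicative` flag are hypothetical; the sketch only follows the description above (level = first-year average, slope = average of the per-period slopes over the first two years, seasonal values from the first year relative to the level).

```python
def simple_initial_states(y, m, multiplicative=False):
    """Return (level_m, slope_m, [s_1, ..., s_m]) from the first two years of y."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full years of data")
    first, second = y[:m], y[m:2 * m]
    # Level: average of the first year.
    level = sum(first) / m
    # Slope: average of (y_{m+i} - y_i) / m over i = 1, ..., m.
    slope = sum((b - a) / m for a, b in zip(first, second)) / m
    # Seasonal values: first-year observations relative to the level.
    if multiplicative:
        seasonal = [v / level for v in first]
    else:
        seasonal = [v - level for v in first]
    return level, slope, seasonal
```

Because each seasonal value comes from a single observation, one noisy value in the first year feeds straight into the initial states, which is one way the occasional poor result can arise.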
In some books (e.g., Bowerman, O’Connell and Koehler, 2005), a regression-based procedure is used instead. They suggest fitting a regression with linear trend to the first few years of data (usually 3 or 4 years are used). Then the initial level $\ell_0$ is set to the intercept, and the initial slope $b_0$ is set to the regression slope. The initial seasonal values $s_{-m+1},\dots,s_0$ are computed from the detrended data. This is a very poor method that should not be used, as the trend will be biased by the seasonal pattern. Imagine a seasonal pattern, for example, where the last period of the year is always the largest value for the year. Then the trend will be biased upwards. Unfortunately, Bowerman, O’Connell and Koehler (2005) are not alone in recommending bad methods. I’ve seen similar, and worse, procedures recommended in other books.
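The bias is easy to see on a toy example. The Python sketch below fits an ordinary least-squares line to three years of made-up quarterly data that have no trend at all but always peak in the last quarter. The helper `ols_line` and the data are hypothetical and purely illustrative.

```python
def ols_line(y):
    """Least-squares intercept and slope of y against t = 1, ..., n."""
    n = len(y)
    t_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(range(1, n + 1), y))
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Three identical years: no trend, but the last quarter is always the largest.
y = [0, 0, 0, 10] * 3
intercept, slope = ols_line(y)
print(slope)  # positive (about 0.31) even though the data have no trend
```

The fitted slope is positive only because the large values sit at the end of each year, so the end of the sample is weighted towards a peak.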
While it would be possible to implement a reasonably good regression method, a much better procedure is based on a decomposition. This is what was recommended in my 2008 Springer book and is implemented in the `HoltWinters` and `ets` functions in R.

1. First fit a $2\times m$ moving average smoother to the first 2 or 3 years of data (`HoltWinters` uses 2 years, `ets` uses 3 years). Here is a quick intro to moving average smoothing.
2. Then subtract (for additive HW) or divide (for multiplicative HW) the smooth trend from the original data to get detrended data. The initial seasonal values are then obtained from the averaged detrended data. For example, the initial seasonal value for January is the average of the detrended Januaries.
3. Next subtract (for additive HW) or divide (for multiplicative HW) the seasonal values from the original data to get seasonally adjusted data.
4. Fit a linear trend to the seasonally adjusted data to get the initial level $\ell_0$ (the intercept) and the initial slope $b_0$.
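The four steps can be sketched in Python for the additive case. This is an illustration only and not the code used by either R function: the function names are hypothetical, the sketch assumes an even seasonal period $m$ (so that the $2\times m$ moving average is a centered window of $m+1$ points with half weights at the ends), it uses time index $t=1,\dots,n$ for the trend regression, and it does not renormalize the seasonal values.

```python
def centered_ma(y, m):
    """2 x m moving average for even m; None where the window does not fit."""
    if m % 2 != 0:
        raise ValueError("this sketch assumes an even seasonal period")
    half = m // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]  # m + 1 points
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / m
    return trend


def decomposition_initial_states(y, m, years=2):
    """Additive (level0, slope0, seasonal0) from the first `years` years of y."""
    n = years * m
    if years < 2 or len(y) < n:
        raise ValueError("need at least two full years of data")
    head = y[:n]
    # Step 1: smooth trend from the 2 x m moving average.
    trend = centered_ma(head, m)
    # Step 2: detrend, then average the detrended values season by season.
    seasonal = []
    for j in range(m):
        vals = [head[t] - trend[t] for t in range(j, n, m) if trend[t] is not None]
        seasonal.append(sum(vals) / len(vals))
    # Step 3: seasonally adjusted data.
    adjusted = [head[t] - seasonal[t % m] for t in range(n)]
    # Step 4: least-squares line against t = 1, ..., n; the intercept is the level at t = 0.
    t_bar = (n + 1) / 2
    a_bar = sum(adjusted) / n
    sxy = sum((t - t_bar) * (a - a_bar) for t, a in zip(range(1, n + 1), adjusted))
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    slope = sxy / sxx
    level = a_bar - slope * t_bar
    return level, slope, seasonal
```

The returned `seasonal` list is ordered by season (period 1 first), which matches the ordering $s_{-m+1},\dots,s_0$ of the initial seasonal states.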
This is generally quite good and fast to implement and allows “forecasts” to be produced from period 1. (Of course, they are not really forecasts as the data to be forecast has been used in constructing them.) However, it does require 2 or 3 years of data. For very short time series, an alternative (implemented in the `ets` function in R from v4.07) is to use a simple linear model with time trend and first order Fourier approximation to the seasonal component. Use the linear trend in place of the moving average smoother, then proceed with steps 2-4 as above.
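A rough Python sketch of that alternative is given below. It assumes the model $y_t = a + bt + c\sin(2\pi t/m) + d\cos(2\pi t/m) + \varepsilon_t$, fitted by least squares through the normal equations, and returns the linear part $a+bt$ to stand in for the moving average trend. The function names `solve` and `fourier_trend` are hypothetical, and the details of what `ets` actually does may differ.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fourier_trend(y, m):
    """Fit a time trend plus one sine/cosine pair; return the linear trend a + b t."""
    n = len(y)
    if m < 3 or n < 4:
        raise ValueError("this sketch needs m >= 3 and at least 4 observations")
    rows = [[1.0, float(t), math.sin(2 * math.pi * t / m), math.cos(2 * math.pi * t / m)]
            for t in range(1, n + 1)]
    # Normal equations (X'X) coef = X'y for the four regressors.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(4)]
    a, b, _c, _d = solve(xtx, xty)
    return [a + b * t for t in range(1, n + 1)]
```

The returned trend has a value at every time point, so no observations are lost at the ends of the series, which is what makes the approach usable when fewer than two full years are available.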
Whichever method is used, these initial values should be seen as rough estimates only. They can be improved by optimizing them along with the smoothing parameters using maximum likelihood estimation, for example. The only implementation of the Holt-Winters' method that does that, to my knowledge, is the `ets` function in R. In that function, the above procedure is used to find starting values for the estimation.
Some recent work (De Livera, Hyndman and Snyder, 2010) shows that all of the above may soon be redundant for the additive case (but not for the multiplicative case). In Section 3.1, we show that for linear models, the initial state values can be concentrated out of the likelihood and estimated directly using a regression procedure. Although we present the idea in the context of complex seasonality, it is valid for any linear exponential smoothing model. I am planning on modifying the `ets` function to implement this idea, but it will probably have to wait for a couple of months as my “to do” list is rather long.