Seasonal decomposition of short time series

Many users have tried to do a seasonal decomposition with a short time series, and hit the error “Series has less than two periods”.

The problem is that the usual methods of decomposition (e.g., decompose and stl) estimate seasonality using at least as many degrees of freedom as there are seasonal periods. So you need at least two observations per seasonal period to be able to distinguish seasonality from noise.

However, it is possible to use a linear regression model to decompose a time series into trend and seasonal components, and then some smoothness assumptions on the seasonal component allow a decomposition with fewer than two full years of data.

This problem came up on crossvalidated.com recently, with the following data set.

df <- ts(c(2735.869,2857.105,2725.971,2734.809,2761.314,2828.224,2830.284,
  2758.149,2774.943,2782.801,2861.970,2878.688,3049.229,3029.340,3099.041,
  3071.151,3075.576,3146.372,3005.671,3149.381), start=c(2016,8), frequency=12)

We can approximate the seasonal pattern using Fourier terms with a few parameters.

library(forecast)
library(ggplot2)
decompose_df <- tslm(df ~ trend + fourier(df, 2))

Then the usual decomposition plot can be constructed.

trend <- coef(decompose_df)[1] + coef(decompose_df)['trend']*seq_along(df)
components <- cbind(
  data = df,
  trend = trend,
  season = df - trend - residuals(decompose_df),
  remainder = residuals(decompose_df)
)
autoplot(components, facet=TRUE)

Here the model is \[ y_t = \alpha + \beta t + \sum_{k=1}^K \left[ \gamma_k \sin\left(\textstyle\frac{2\pi t k}{m}\right) + \psi_k \cos\left(\textstyle\frac{2\pi t k}{m}\right) \right] + \varepsilon_t, \] where \(1 \le K \le m/2\). The larger the value of \(K\), the more complicated the seasonal pattern that can be estimated. If \(K=m/2\), then we are using \(m-1\) degrees of freedom for the seasonal component. (The last sin term will be dropped as \(\sin(\pi t)=0\) for all \(t\).) For a short time series, we can use a small value for \(K\), which can be selected by minimizing the AICc or leave-one-out CV statistic (both computed using CV()). In the above example, the AICc is minimized with \(K=1\) and CV is minimized with \(K=2\).

The trend could also be made nonlinear, by replacing trend with a polynomial or spline (although both will use up more degrees of freedom, and may not be justified with short time series).

As with other methods of decomposition, it is easy enough to remove the seasonal component to get the seasonally adjusted data.

adjust_df <- df - components[,'season']
autoplot(df, series="Data") +
  autolayer(adjust_df, series="Seasonally adjusted")

comments powered by Disqus