Detecting seasonality


8 February 2014


I occasionally get email asking how to detect whether seasonality is present in a data set. Sometimes the period of the potential seasonality is known, but in other cases it is not.

I’ve discussed before how to estimate an unknown seasonal period, and how to measure the strength of the seasonality. In this post, I want to look at testing if a series is seasonal when the potential period is known (e.g., with quarterly, monthly, daily or hourly data).

One simple approach is to fit a model that allows for seasonality if it is present. For example, you can fit an ETS model using the ets() function in R; if the chosen model has a seasonal component, then the data are seasonal. For higher frequency data, or where the seasonal period is non-integer, a TBATS model fitted via the tbats() function will do much the same thing.
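For instance, here is one way to check whether ets() has selected a seasonal component. This is a sketch assuming the forecast package is installed; the components element of a fitted ets object lists the error, trend and seasonal types, with "N" meaning none:

```r
library(forecast)

fit <- ets(pigs)
# The third element of fit$components is the seasonal type ("N" = no seasonality)
is_seasonal <- fit$components[3] != "N"
is_seasonal
```

For a tbats fit, the analogous check would be whether the fitted model retains any seasonal periods.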

This is not a formal test of seasonality, as the model selection is based on the AIC rather than any hypothesis test. However, there is a related likelihood-ratio test, based on the difference between the log-likelihoods of the selected model and the equivalent model with the seasonal component removed (or added). By Wilks' theorem, twice the difference between the two log-likelihoods is asymptotically chi-squared distributed, with degrees of freedom equal to the difference in the number of parameters estimated in the two models.



 ETS(A,N,A) 

 Call:
  ets(y = pigs) 

  Smoothing parameters:
    alpha = 0.3095 
    gamma = 1e-04 

  Initial states:
    l = 92791.1661 
    s = 6826.521 -181.8789 991.7917 -1546.851 2155.181 5843.908
           1723.405 3923.781 -2662.907 1882.368 -7339.192 -11616.13

  sigma:  9271.526

     AIC     AICc      BIC 
4434.551 4437.342 4483.097 

For example, the pigs data (monthly number of pigs slaughtered in Victoria) do not look strongly seasonal when plotted, but the ets() function selects an ETS(A,N,A) model as shown above. That is, it detects an additive seasonal component. We can formally test the significance of the seasonal component as follows.

fit1 <- ets(pigs)
fit2 <- ets(pigs, model="ANN")

deviance <- 2*c(logLik(fit1) - logLik(fit2))
df <- attributes(logLik(fit1))$df - attributes(logLik(fit2))$df
# P value
1 - pchisq(deviance, df)
[1] 5.255499e-07

The resulting p-value is about 5.3 × 10⁻⁷, so the additional seasonal component is highly significant.

Personally, I never bother with the hypothesis test as I think it answers the wrong question. If the hypothesis test is significant, we can conclude that the data are very unlikely to have been generated from the simpler (non-seasonal) model. But I don’t actually believe the data were generated by any ETS model, so all this is telling me is that I have enough data to be able to see the difference between my data and the model.

A more useful question is to ask whether the seasonal component improves forecast accuracy, and that is precisely what the AIC is telling us. Minimizing the AIC is asymptotically equivalent to minimizing the one-step-ahead out-of-sample MSE. So a smaller AIC means better forecasts, and that's what I usually care about.
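So rather than running the hypothesis test, one can simply compare the AICs of the seasonal and non-seasonal fits directly. A minimal sketch, again assuming the forecast package:

```r
library(forecast)

fit1 <- ets(pigs)               # model selected by AIC (seasonal here)
fit2 <- ets(pigs, model="ANN")  # equivalent non-seasonal model
# Prefer the seasonal model if its AIC is lower
fit1$aic < fit2$aic
```

Note that ets() has already done this comparison implicitly: the selected model is the one with the smallest AICc among the candidates it considered.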