A blog by Rob J Hyndman 

Twitter Gplus RSS

Forecasting with long seasonal periods

Published on 29 September 2010

I am often asked how to fit an ARIMA or ETS model with data hav­ing a long sea­sonal period such as 365 for daily data or 48 for half-​​hourly data. Gen­er­ally, sea­sonal ver­sions of ARIMA and ETS mod­els are designed for shorter peri­ods such as 12 for monthly data or 4 for quar­terly data.

The ets() func­tion in the fore­cast pack­age restricts sea­son­al­ity to be a max­i­mum period of 24 to allow hourly data but not data with a larger sea­sonal fre­quency. The prob­lem is that there are m-1 para­me­ters to be esti­mated for the ini­tial sea­sonal states where m is the sea­sonal period. So for large m, the esti­ma­tion becomes almost impossible.

The arima() func­tion will allow a sea­sonal period up to m=350 but in prac­tice will usu­ally run out of mem­ory when­ever the sea­sonal period is more than about 200. I haven’t dug into the code to find out why this is the case — the­o­ret­i­cally it would be pos­si­ble to have any length of sea­son­al­ity as the num­ber of para­me­ters to be esti­mated does not depend on the sea­sonal order. How­ever, sea­sonal dif­fer­enc­ing of very high order does not make a lot of sense — for daily data it involves com­par­ing what hap­pened today with what hap­pened exactly a year ago and there is no con­straint that the sea­sonal pat­tern is smooth.

For such data I pre­fer a Fourier series approach where the sea­sonal pat­tern is mod­elled using Fourier terms with short-​​term time series dynam­ics allowed in the error. For exam­ple, con­sider the fol­low­ing model:

    \[ y_t = a + \sum_{k=1}^K \left[ \alpha\sin(2\pi kt/m) + \beta\cos(2\pi kt/m)\right] + N_t, \]

where N_t is an ARIMA process. The value of K can be cho­sen by min­i­miz­ing the AIC.

This is eas­ily fit­ted using R. Here is an exam­ple with m=200, K=4 and N_t an ARIMA(2,0,1) process:

n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)
fourier <- function(t,terms,period)
{
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
  {
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  }
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
  return(X)
}
 
library(forecast)
fit <- Arima(y, order=c(2,0,1), xreg=fourier(1:n,4,m))
plot(forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),4,m)))

The advan­tages of this approach are:

  • it allows any length seasonality;
  • for data with more than one sea­sonal period, you can include Fourier terms of dif­fer­ent frequencies;
  • the sea­sonal pat­tern is smooth for small val­ues of K (but more wig­gly sea­son­al­ity can be han­dled by increas­ing K);
  • the short-​​term dynam­ics are eas­ily han­dled with a sim­ple ARMA error.

The only real dis­ad­van­tage (com­pared to a sea­sonal ARIMA model) that I can think of is that the sea­son­al­ity is assumed to be fixed — the pat­tern is not allowed to change over time. But in prac­tice, sea­son­al­ity is usu­ally remark­ably con­stant so this is not a big dis­ad­van­tage except for very long time series.

The order of N_t can also be cho­sen automatically:

fit <- auto.arima(y, D=0, max.P=0, max.Q=0, xreg=fourier(1:n,4,m))

Note that the sea­sonal parts of the ARIMA model should all be set to zero as we do not want N_t to be a sea­sonal ARIMA model.

I’ll add the fourier() func­tion to the fore­cast pack­age on the next release.

It is much harder to do some­thing like this for ETS mod­els. One of our stu­dents has been work­ing on this and our paper on com­plex sea­son­al­ity describes the pro­ce­dure. We plan to add this func­tion­al­ity to the fore­cast pack­age soon, but the code is not quite ready yet.


Related Posts:


 
Tags: ,
3 Comments  comments 
  • http://www.slug.it/naufraghi/ Mat­teo Bertini

    The 350 limit in R is in arima.c: the array rbar can be very big.


    SEXP getQ0(SEXP sPhi, SEXP sTheta)
    /* NB: nrbar could overflow */
    int r = max(p, q + 1);
    size_t np = r * (r + 1) / 2, nrbar = np * (np - 1) / 2;
    /* This is the limit using an int index. We could use
    size_t and get more on a 64-bit system,
    but there seems no practical need.*/
    if(r > 350) error(_("maximum supported lag is 350"));
    rbar = (double *) R_alloc(nrbar, sizeof(double));

    The prob­lem with m=350 is that in some domains, like traf­fic fore­cast­ing, is quite com­mon to have 4 sam­ples per hour and a weekly sea­son­al­ity, result­ing in m=672.
    I think the matrix is related to the Kalman fil­ter, but I’ve found no way to switch to a lighter method.

  • Erhan

    Dear Mr. Hyn­d­man,
    I am work­ing on a the­sis right now, to use ARIMA-​​GARCH mod­el­ing approach to model a data with mul­ti­ple sea­sonal pat­tern ( daily and weekly ). After a search on inter­net, I down­loaded your fore­cast pack­age. How­ever, I could not find out how to model mul­ti­ple sea­son­al­ity using R with your pack­age ? auto.arima( etc. ) does not find mul­ti­ple sea­son­al­ity and I guess I cound not use fourier() func­tion effec­tively. Can you please lead me a way to model mul­ti­ple sea­son­al­ity in R.
    Best Regards,
    Erhan

    • http://robjhyndman.com Rob J Hyndman

      Try the bats() and tbats() functions.