Forecasting with long seasonal periods

I am often asked how to fit an ARIMA or ETS model with data hav­ing a long sea­sonal period such as 365 for daily data or 48 for half-​​hourly data. Gen­er­ally, sea­sonal ver­sions of ARIMA and ETS mod­els are designed for shorter peri­ods such as 12 for monthly data or 4 for quar­terly data.

The ets() func­tion in the fore­cast pack­age restricts sea­son­al­ity to be a max­i­mum period of 24 to allow hourly data but not data with a larger sea­sonal fre­quency. The prob­lem is that there are m-1 para­me­ters to be esti­mated for the ini­tial sea­sonal states where m is the sea­sonal period. So for large m, the esti­ma­tion becomes almost impossible.

The arima() func­tion will allow a sea­sonal period up to m=350 but in prac­tice will usu­ally run out of mem­ory when­ever the sea­sonal period is more than about 200. I haven’t dug into the code to find out why this is the case — the­o­ret­i­cally it would be pos­si­ble to have any length of sea­son­al­ity as the num­ber of para­me­ters to be esti­mated does not depend on the sea­sonal order. How­ever, sea­sonal dif­fer­enc­ing of very high order does not make a lot of sense — for daily data it involves com­par­ing what hap­pened today with what hap­pened exactly a year ago and there is no con­straint that the sea­sonal pat­tern is smooth.

For such data I pre­fer a Fourier series approach where the sea­sonal pat­tern is mod­elled using Fourier terms with short-​​term time series dynam­ics allowed in the error. For exam­ple, con­sider the fol­low­ing model:

    \[ y_t = a + \sum_{k=1}^K \left[ \alpha\sin(2\pi kt/m) + \beta\cos(2\pi kt/m)\right] + N_t, \]

where N_t is an ARIMA process. The value of K can be cho­sen by min­i­miz­ing the AIC.

This is eas­ily fit­ted using R. Here is an exam­ple with m=200, K=4 and N_t an ARIMA(2,0,1) process:

n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)
fourier <- function(t,terms,period)
{
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
  {
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  }
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
  return(X)
}
 
library(forecast)
fit <- Arima(y, order=c(2,0,1), xreg=fourier(1:n,4,m))
plot(forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),4,m)))

The advan­tages of this approach are:

  • it allows any length seasonality;
  • for data with more than one sea­sonal period, you can include Fourier terms of dif­fer­ent frequencies;
  • the sea­sonal pat­tern is smooth for small val­ues of K (but more wig­gly sea­son­al­ity can be han­dled by increas­ing K);
  • the short-​​term dynam­ics are eas­ily han­dled with a sim­ple ARMA error.

The only real dis­ad­van­tage (com­pared to a sea­sonal ARIMA model) that I can think of is that the sea­son­al­ity is assumed to be fixed — the pat­tern is not allowed to change over time. But in prac­tice, sea­son­al­ity is usu­ally remark­ably con­stant so this is not a big dis­ad­van­tage except for very long time series.

The order of N_t can also be cho­sen automatically:

fit <- auto.arima(y, seasonal=FALSE, xreg=fourier(1:n,4,m))

Note that the ARIMA model for N_t should be non-​​seasonal.

I’ll add the fourier() func­tion to the fore­cast pack­age on the next release.

It is much harder to do some­thing like this for ETS mod­els. One of our stu­dents has been work­ing on this and our paper on com­plex sea­son­al­ity describes the pro­ce­dure. We plan to add this func­tion­al­ity to the fore­cast pack­age soon, but the code is not quite ready yet.


Related Posts:


  • Mat­teo Bertini

    The 350 limit in R is in arima.c: the array rbar can be very big.


    SEXP getQ0(SEXP sPhi, SEXP sTheta)
    /* NB: nrbar could overflow */
    int r = max(p, q + 1);
    size_t np = r * (r + 1) / 2, nrbar = np * (np - 1) / 2;
    /* This is the limit using an int index. We could use
    size_t and get more on a 64-bit system,
    but there seems no practical need.*/
    if(r > 350) error(_("maximum supported lag is 350"));
    rbar = (double *) R_alloc(nrbar, sizeof(double));

    The prob­lem with m=350 is that in some domains, like traf­fic fore­cast­ing, is quite com­mon to have 4 sam­ples per hour and a weekly sea­son­al­ity, result­ing in m=672.
    I think the matrix is related to the Kalman fil­ter, but I’ve found no way to switch to a lighter method.

  • Erhan

    Dear Mr. Hyn­d­man,
    I am work­ing on a the­sis right now, to use ARIMA-​​GARCH mod­el­ing approach to model a data with mul­ti­ple sea­sonal pat­tern ( daily and weekly ). After a search on inter­net, I down­loaded your fore­cast pack­age. How­ever, I could not find out how to model mul­ti­ple sea­son­al­ity using R with your pack­age ? auto.arima( etc. ) does not find mul­ti­ple sea­son­al­ity and I guess I cound not use fourier() func­tion effec­tively. Can you please lead me a way to model mul­ti­ple sea­son­al­ity in R.
    Best Regards,
    Erhan

    • http://robjhyndman.com Rob J Hyndman

      Try the bats() and tbats() functions.

      • Guo

        I am still con­fus­ing about the mul­ti­ple sea­sonal period model. How should I do if I want to fit a model ARIMA(2,0,2)*(1,0,0)24*(1,0,1)168? Many thanks!

        • http://robjhyndman.com Rob J Hyndman

          R does not have the facil­i­ties to fit that model.

          • Anthony

            Dr. Hyn­d­man,
            What other soft­ware have that facil­ity.
            Thanks
            Anthony

          • http://robjhyndman.com Rob J Hyndman

            I don’t know

  • Pingback: What kind of model could I use to predict annual server loads?()

  • Ahmet Gurel

    Hello Mr. Hyndman,

    Thank you for all your work and blog posts. I read your paper (on com­plex sea­son­al­ity) but I am con­fused at a sim­ple point. What is the exact dif­fer­ence between an ARIMA(p,d,q) with sea­son S with and Arima(p,d,q,P,D,Q,S). I believe the lat­ter com­po­nents are essen­tially have the same inter­pre­ta­tion with p,d,q with lag S. Is this cor­rect? And also, I was won­der­ing if there is a par­tic­u­lar rea­son why you don’t con­sider these mod­els in the paper as addi­tional para­me­ters of tbats or bats?

    Thank you very much,

    Ahmet

    • http://robjhyndman.com Rob J Hyndman

      1. Read http://​otexts​.com/​f​p​p​/8/9/ for sea­sonal ARIMA models.

      2. I have no soft­ware for fit­ting mul­ti­ple period sea­sonal ARIMA models.

      • Ahmet Gurel

        Thank you very much, that was helpful.

  • Ahmet Gurel

    Hello again Dr.
    Hyndman ,

    I have again a quick ques­tion for you. While read­ing your book, I was stuck at a very sim­ple point.

    Lets say I have data com­posed of 10 obser­va­tions and I want to pre­dict the next 5 obser­va­tions. (arimaForecast(data, h=5)). Then I take the the first pre­dic­tion (at time 10+1) and add it to data. Then I pre­dict the next 4 obser­va­tions ((arimaForecast(append(data,forecast11) , h=4)).

    Shouldn’t these fore­casts be the same since I did replace yhat as the algo­rithms do? Actu­ally this ques­tion is not arima spe­cific, but for all the fore­cast­ing algo­r­tihms in Fore­cast package.

    Thank you very much

    Ahmet

    • http://robjhyndman.com Rob J Hyndman

      For a lin­ear model (such as ARIMA), the point fore­casts should be the same (pro­vided the ARIMA model does not change), although the fore­cast vari­ances will change.

  • Arzu

    Dr. Hyder­man,

    Is it nor­mal that for a dou­ble sea­sonal ts with fre­quency 144 (1000 observations)-if I spec­ify sea­sonal peri­ods 144 and 1008, tbats and bats func­tions do not con­verge in rea­son­able time lim­its (i tried to run them for 3 hrs). Is it because of auto.arima func­tion? Even if I did spec­ify para­me­ters such as max.p etc. I still could not get an answer. Do you have any com­ments regard­ing the cause?

    • http://robjhyndman.com Rob J Hyndman

      The func­tions are slow with high sea­sonal peri­ods and a long time series. I’m try­ing to fig­ure out how to make them faster.

      • Arzu

        Thank you for clarification.

  • http://www.facebook.com/anindya5 Anindya Sankar Dey

    Dr. Hyn­d­man,

    In your above code, what I’m not under­stand­ing is,when you are fore­cast­ing, why in the fourier func­tion your are pass­ing (n+1):(2*m)? Can you kindly help?

    • http://robjhyndman.com Rob J Hyndman

      They are the time peri­ods I want to fore­cast, except there is an error in your ques­tion. The code as given in my post is n+1:(2*m).

      • http://www.facebook.com/anindya5 Anindya Sankar Dey

        Thanks…

  • Gior­gio

    Dear Rob,

    in case of an ARIMAX model, would you advise apply­ing the fourier desea­son­al­iza­tion also to the X part, con­sid­er­ing also that each X covari­ate can gen­er­ally have its own seasonality?

    Thanks for the clar­i­fi­ca­tion, best regards

    • http://robjhyndman.com Rob J Hyndman

      Maybe. If the covari­ates are highly sea­sonal, then sea­son­ally adjust­ing them will pre­vent collinear­ity with the fourier terms. But it changes the inter­pre­ta­tion of the model. If you have highly sea­sonal covari­ates, you may not need to user Fourier terms at all, as the sea­son­al­ity in the response vari­able may be entirely dri­ven by the sea­son­al­ity in the covariates.

  • Pingback: Handling Long Seasonality in a Time Series | Jeffrey Chin Tang Wong()

  • Marco Pow­ers

    Hi Pro­fes­sor Hyndman–

    Ques­tion — why did you select K=4 in your above code? Is there an auto­matic process in R used to select K?

    • http://robjhyndman.com/ Rob J Hyndman

      It was just an exam­ple. In prac­tice, you can select K using the AIC.

      • olivier des

        Hello Rob,

        So K depends on the min­i­mum AIC, but is there any for­mula or pro­ce­dure to cal­cu­late it directly?

        Thank you.

        • http://robjhyndman.com/ Rob J Hyndman

          No. Just com­pute the AIC for a few mod­els and select the model with the small­est AIC value.

  • Her­i­man­i­tra

    Dear Prof Hyndman,

    I don’t know if this is the right post to post this ques­tion. I have already tried some googling approach and saw your Stack­Over­flow but I don’t find solution(s) yet.
    Actu­ally, I’m expe­ri­enc­ing prob­lem to set up a daily time series for the pur­pose of pre­dic­tion with “fore­cast” package:

    traindate=ts(traindate,start=1994,f=365)
    #where train is a dataframe, “date” is the name of the time series period with numeric for­mat, for exam­ple 19940101=1994÷01÷01 , etc.
    It seems to work (because no error raises dur­ing set up) but auto.arima() can’t detect sea­son­nal­ity after…
    using zoo pack­age doesn’t solve the prob­lem as fore­cast() doesn’t work with it.

    Could you please pro­vide the cor­rect set­ting I need to do?

    • Her­i­man­i­tra

      In case where it is impos­si­ble (to set up daily time series and fore­cast with the “fore­cast” pack­age),
      I was think­ing of extract­ing monthly val­ues (as, argued, in your 1st para­graph) from the orig­i­nal train and then use the “fore­cast” pack­age. After obtain­ing pre­dic­tions, I was think­ing of tem­po­ral dis­agre­ga­tion (Chow-​​Lin, DENTON) to recover the pre­dic­tion for each day between pre­dicted months

      • http://robjhyndman.com/ Rob J Hyndman

        Again, actu­ally read­ing the arti­cle you are com­ment­ing on would be a very good place to start. You can fit a model using Fourier series. An alter­na­tive is tbats().

    • http://robjhyndman.com/ Rob J Hyndman

      auto.arima will not fit a sea­sonal model with period greater than 350. Please read the arti­cle above.

      • Her­i­man­i­tra

        OK, thanks, so if really under­stand the arti­cle, if I had 3years with 365days (for exam­ple, the 2 first year) and 366days (for the last year), I can spec­ify 02 Fourier Series like this?
        period1=365
        period2=366
        fourier (…) {

        X[,2*i-1] <- sin(2*pi*i*t/period1)
        X[,2*i] <- cos(2*pi*i*t/period1)
        X[,2*i+1] <- sin(2*pi*i*t/period2)
        X[,2*i+2] <- cos(2*pi*i*t/period2)

        }
        Am I cor­rect in this specification?

        • http://robjhyndman.com/ Rob J Hyndman

          No. The period of the data should not change from year to year. Pre­sum­ably you mean that the third year is a leap year. In that case, set the period to be 365.25 years.

          • Her­i­man­i­tra

            Yes, leap year,
            OK

          • Her­i­man­i­tra

            I’m also curi­ous to know how to restrict out­put of the fore­cast() func­tion to the spec­i­fied horizon’s length in order to speed up com­pu­ta­tion, it’s really slow at a cer­tain level (~2000)
            I’ve tried some­thing like this but it didn’t work:

            forecast(mymodel,h=100,method=“naïve”)$fitted[2001:2100]

          • Her­i­man­i­tra

            I fig­ure out now that one needs to do: forecast(mymodel,h=100,method=“naïve”)$mean
            to obtain mean fore­cast in the desired hori­zons but the main prob­lem I have is that with HoltWin­ters, it’s really slow com­pared to loess decomposition.

  • Her­i­man­i­tra

    I’ve just seen some­one setup ts() with daily fre­quency and use fore­cast() after:

    What do you think about that?

    Here is the link:

    http://www.r-bloggers.com/simple-and-advanced-time-series-with-oracle-r-enterprise/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29&utm_content=Yahoo%21+Mail

    • http://robjhyndman.com/ Rob J Hyndman

      If you do that with auto.arima, it will return a non-​​seasonal model. If you try fit­ting a sea­sonal model with Arima or arima, it will return an error.

  • Son­bol

    Hello Dr. Hyndman,

    I have a time series mea­sur­ing solar radi­ance every minute. So I am assum­ing that I have a sea­son­al­ity with length of 1440 (24×60). I am try­ing to use Fourier as xreg to con­sider the sea­son­al­ity. I’ve con­sid­ered a model like:

    fit <-Arima(training, order=c(3,1,3), xreg=fourier(training,3)).
    My aim is not fore­cast­ing. I am going to eval­u­ate the model on a test­ing set. So I have some­thing like

    fit1 <- Arima(checking, model=fit, xreg=(?))
    My prob­lem is that I do not know how to choose xreg on the test­ing data. Is it right to con­sider xreg=fourier(checking,3)?

  • Shub­ham

    What does ‘n’ mean in the code? Is it the num­ber of data points in the time series?

    • http://robjhyndman.com/ Rob J Hyndman

      yes

  • Azza Ali

    If I inte­grat time series with judg­ment what is right Equation ?

  • Pingback: Arima function doesn't consider seasonal components | Question and Answer()