Forecasting with long seasonal periods

I am often asked how to fit an ARIMA or ETS model with data hav­ing a long sea­sonal period such as 365 for daily data or 48 for half-​​hourly data. Gen­er­ally, sea­sonal ver­sions of ARIMA and ETS mod­els are designed for shorter peri­ods such as 12 for monthly data or 4 for quar­terly data.

The ets() func­tion in the fore­cast pack­age restricts sea­son­al­ity to be a max­i­mum period of 24 to allow hourly data but not data with a larger sea­sonal fre­quency. The prob­lem is that there are m-1 para­me­ters to be esti­mated for the ini­tial sea­sonal states where m is the sea­sonal period. So for large m, the esti­ma­tion becomes almost impossible.

The arima() func­tion will allow a sea­sonal period up to m=350 but in prac­tice will usu­ally run out of mem­ory when­ever the sea­sonal period is more than about 200. I haven’t dug into the code to find out why this is the case — the­o­ret­i­cally it would be pos­si­ble to have any length of sea­son­al­ity as the num­ber of para­me­ters to be esti­mated does not depend on the sea­sonal order. How­ever, sea­sonal dif­fer­enc­ing of very high order does not make a lot of sense — for daily data it involves com­par­ing what hap­pened today with what hap­pened exactly a year ago and there is no con­straint that the sea­sonal pat­tern is smooth.

For such data I pre­fer a Fourier series approach where the sea­sonal pat­tern is mod­elled using Fourier terms with short-​​term time series dynam­ics allowed in the error. For exam­ple, con­sider the fol­low­ing model:

    \[ y_t = a + \sum_{k=1}^K \left[ \alpha\sin(2\pi kt/m) + \beta\cos(2\pi kt/m)\right] + N_t, \]

where N_t is an ARIMA process. The value of K can be cho­sen by min­i­miz­ing the AIC.

This is eas­ily fit­ted using R. Here is an exam­ple with m=200, K=4 and N_t an ARIMA(2,0,1) process:

n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)
fourier <- function(t,terms,period)
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
fit <- Arima(y, order=c(2,0,1), xreg=fourier(1:n,4,m))
plot(forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),4,m)))

The advan­tages of this approach are:

  • it allows any length seasonality;
  • for data with more than one sea­sonal period, you can include Fourier terms of dif­fer­ent frequencies;
  • the sea­sonal pat­tern is smooth for small val­ues of K (but more wig­gly sea­son­al­ity can be han­dled by increas­ing K);
  • the short-​​term dynam­ics are eas­ily han­dled with a sim­ple ARMA error.

The only real dis­ad­van­tage (com­pared to a sea­sonal ARIMA model) that I can think of is that the sea­son­al­ity is assumed to be fixed — the pat­tern is not allowed to change over time. But in prac­tice, sea­son­al­ity is usu­ally remark­ably con­stant so this is not a big dis­ad­van­tage except for very long time series.

The order of N_t can also be cho­sen automatically:

fit <- auto.arima(y, seasonal=FALSE, xreg=fourier(1:n,4,m))

Note that the ARIMA model for N_t should be non-​​seasonal.

I’ll add the fourier() func­tion to the fore­cast pack­age on the next release.

It is much harder to do some­thing like this for ETS mod­els. One of our stu­dents has been work­ing on this and our paper on com­plex sea­son­al­ity describes the pro­ce­dure. We plan to add this func­tion­al­ity to the fore­cast pack­age soon, but the code is not quite ready yet.

Related Posts:

  • Mat­teo Bertini

    The 350 limit in R is in arima.c: the array rbar can be very big.

    SEXP getQ0(SEXP sPhi, SEXP sTheta)
    /* NB: nrbar could overflow */
    int r = max(p, q + 1);
    size_t np = r * (r + 1) / 2, nrbar = np * (np - 1) / 2;
    /* This is the limit using an int index. We could use
    size_t and get more on a 64-bit system,
    but there seems no practical need.*/
    if(r > 350) error(_("maximum supported lag is 350"));
    rbar = (double *) R_alloc(nrbar, sizeof(double));

    The prob­lem with m=350 is that in some domains, like traf­fic fore­cast­ing, is quite com­mon to have 4 sam­ples per hour and a weekly sea­son­al­ity, result­ing in m=672.
    I think the matrix is related to the Kalman fil­ter, but I’ve found no way to switch to a lighter method.

  • Erhan

    Dear Mr. Hyn­d­man,
    I am work­ing on a the­sis right now, to use ARIMA-​​GARCH mod­el­ing approach to model a data with mul­ti­ple sea­sonal pat­tern ( daily and weekly ). After a search on inter­net, I down­loaded your fore­cast pack­age. How­ever, I could not find out how to model mul­ti­ple sea­son­al­ity using R with your pack­age ? auto.arima( etc. ) does not find mul­ti­ple sea­son­al­ity and I guess I cound not use fourier() func­tion effec­tively. Can you please lead me a way to model mul­ti­ple sea­son­al­ity in R.
    Best Regards,

    • Rob J Hyndman

      Try the bats() and tbats() functions.

      • Guo

        I am still con­fus­ing about the mul­ti­ple sea­sonal period model. How should I do if I want to fit a model ARIMA(2,0,2)*(1,0,0)24*(1,0,1)168? Many thanks!

        • Rob J Hyndman

          R does not have the facil­i­ties to fit that model.

          • Anthony

            Dr. Hyn­d­man,
            What other soft­ware have that facil­ity.

          • Rob J Hyndman

            I don’t know

          • Rahul

            Try WEKA 3.7.11

  • Pingback: What kind of model could I use to predict annual server loads?()

  • Ahmet Gurel

    Hello Mr. Hyndman,

    Thank you for all your work and blog posts. I read your paper (on com­plex sea­son­al­ity) but I am con­fused at a sim­ple point. What is the exact dif­fer­ence between an ARIMA(p,d,q) with sea­son S with and Arima(p,d,q,P,D,Q,S). I believe the lat­ter com­po­nents are essen­tially have the same inter­pre­ta­tion with p,d,q with lag S. Is this cor­rect? And also, I was won­der­ing if there is a par­tic­u­lar rea­son why you don’t con­sider these mod­els in the paper as addi­tional para­me­ters of tbats or bats?

    Thank you very much,


    • Rob J Hyndman

      1. Read http://​otexts​.com/​f​p​p​/8/9/ for sea­sonal ARIMA models.

      2. I have no soft­ware for fit­ting mul­ti­ple period sea­sonal ARIMA models.

      • Ahmet Gurel

        Thank you very much, that was helpful.

  • Ahmet Gurel

    Hello again Dr.
    Hyndman ,

    I have again a quick ques­tion for you. While read­ing your book, I was stuck at a very sim­ple point.

    Lets say I have data com­posed of 10 obser­va­tions and I want to pre­dict the next 5 obser­va­tions. (arimaForecast(data, h=5)). Then I take the the first pre­dic­tion (at time 10+1) and add it to data. Then I pre­dict the next 4 obser­va­tions ((arimaForecast(append(data,forecast11) , h=4)).

    Shouldn’t these fore­casts be the same since I did replace yhat as the algo­rithms do? Actu­ally this ques­tion is not arima spe­cific, but for all the fore­cast­ing algo­r­tihms in Fore­cast package.

    Thank you very much


    • Rob J Hyndman

      For a lin­ear model (such as ARIMA), the point fore­casts should be the same (pro­vided the ARIMA model does not change), although the fore­cast vari­ances will change.

  • Arzu

    Dr. Hyder­man,

    Is it nor­mal that for a dou­ble sea­sonal ts with fre­quency 144 (1000 observations)-if I spec­ify sea­sonal peri­ods 144 and 1008, tbats and bats func­tions do not con­verge in rea­son­able time lim­its (i tried to run them for 3 hrs). Is it because of auto.arima func­tion? Even if I did spec­ify para­me­ters such as max.p etc. I still could not get an answer. Do you have any com­ments regard­ing the cause?

    • Rob J Hyndman

      The func­tions are slow with high sea­sonal peri­ods and a long time series. I’m try­ing to fig­ure out how to make them faster.

      • Arzu

        Thank you for clarification.

  • Anindya Sankar Dey

    Dr. Hyn­d­man,

    In your above code, what I’m not under­stand­ing is,when you are fore­cast­ing, why in the fourier func­tion your are pass­ing (n+1):(2*m)? Can you kindly help?

    • Rob J Hyndman

      They are the time peri­ods I want to fore­cast, except there is an error in your ques­tion. The code as given in my post is n+1:(2*m).

      • Anindya Sankar Dey


  • Gior­gio

    Dear Rob,

    in case of an ARIMAX model, would you advise apply­ing the fourier desea­son­al­iza­tion also to the X part, con­sid­er­ing also that each X covari­ate can gen­er­ally have its own seasonality?

    Thanks for the clar­i­fi­ca­tion, best regards

    • Rob J Hyndman

      Maybe. If the covari­ates are highly sea­sonal, then sea­son­ally adjust­ing them will pre­vent collinear­ity with the fourier terms. But it changes the inter­pre­ta­tion of the model. If you have highly sea­sonal covari­ates, you may not need to user Fourier terms at all, as the sea­son­al­ity in the response vari­able may be entirely dri­ven by the sea­son­al­ity in the covariates.

  • Pingback: Handling Long Seasonality in a Time Series | Jeffrey Chin Tang Wong()

  • Marco Pow­ers

    Hi Pro­fes­sor Hyndman–

    Ques­tion — why did you select K=4 in your above code? Is there an auto­matic process in R used to select K?

    • Rob J Hyndman

      It was just an exam­ple. In prac­tice, you can select K using the AIC.

      • olivier des

        Hello Rob,

        So K depends on the min­i­mum AIC, but is there any for­mula or pro­ce­dure to cal­cu­late it directly?

        Thank you.

        • Rob J Hyndman

          No. Just com­pute the AIC for a few mod­els and select the model with the small­est AIC value.

  • Her­i­man­i­tra

    Dear Prof Hyndman,

    I don’t know if this is the right post to post this ques­tion. I have already tried some googling approach and saw your Stack­Over­flow but I don’t find solution(s) yet.
    Actu­ally, I’m expe­ri­enc­ing prob­lem to set up a daily time series for the pur­pose of pre­dic­tion with “fore­cast” package:

    #where train is a dataframe, “date” is the name of the time series period with numeric for­mat, for exam­ple 19940101=1994÷01÷01 , etc.
    It seems to work (because no error raises dur­ing set up) but auto.arima() can’t detect sea­son­nal­ity after…
    using zoo pack­age doesn’t solve the prob­lem as fore­cast() doesn’t work with it.

    Could you please pro­vide the cor­rect set­ting I need to do?

    • Her­i­man­i­tra

      In case where it is impos­si­ble (to set up daily time series and fore­cast with the “fore­cast” pack­age),
      I was think­ing of extract­ing monthly val­ues (as, argued, in your 1st para­graph) from the orig­i­nal train and then use the “fore­cast” pack­age. After obtain­ing pre­dic­tions, I was think­ing of tem­po­ral dis­agre­ga­tion (Chow-​​Lin, DENTON) to recover the pre­dic­tion for each day between pre­dicted months

      • Rob J Hyndman

        Again, actu­ally read­ing the arti­cle you are com­ment­ing on would be a very good place to start. You can fit a model using Fourier series. An alter­na­tive is tbats().

    • Rob J Hyndman

      auto.arima will not fit a sea­sonal model with period greater than 350. Please read the arti­cle above.

      • Her­i­man­i­tra

        OK, thanks, so if really under­stand the arti­cle, if I had 3years with 365days (for exam­ple, the 2 first year) and 366days (for the last year), I can spec­ify 02 Fourier Series like this?
        fourier (…) {

        X[,2*i-1] <- sin(2*pi*i*t/period1)
        X[,2*i] <- cos(2*pi*i*t/period1)
        X[,2*i+1] <- sin(2*pi*i*t/period2)
        X[,2*i+2] <- cos(2*pi*i*t/period2)

        Am I cor­rect in this specification?

        • Rob J Hyndman

          No. The period of the data should not change from year to year. Pre­sum­ably you mean that the third year is a leap year. In that case, set the period to be 365.25 years.

          • Her­i­man­i­tra

            Yes, leap year,

          • Her­i­man­i­tra

            I’m also curi­ous to know how to restrict out­put of the fore­cast() func­tion to the spec­i­fied horizon’s length in order to speed up com­pu­ta­tion, it’s really slow at a cer­tain level (~2000)
            I’ve tried some­thing like this but it didn’t work:


          • Her­i­man­i­tra

            I fig­ure out now that one needs to do: forecast(mymodel,h=100,method=“naïve”)$mean
            to obtain mean fore­cast in the desired hori­zons but the main prob­lem I have is that with HoltWin­ters, it’s really slow com­pared to loess decomposition.

  • Her­i­man­i­tra

    I’ve just seen some­one setup ts() with daily fre­quency and use fore­cast() after:

    What do you think about that?

    Here is the link:

    • Rob J Hyndman

      If you do that with auto.arima, it will return a non-​​seasonal model. If you try fit­ting a sea­sonal model with Arima or arima, it will return an error.

  • Son­bol

    Hello Dr. Hyndman,

    I have a time series mea­sur­ing solar radi­ance every minute. So I am assum­ing that I have a sea­son­al­ity with length of 1440 (24×60). I am try­ing to use Fourier as xreg to con­sider the sea­son­al­ity. I’ve con­sid­ered a model like:

    fit <-Arima(training, order=c(3,1,3), xreg=fourier(training,3)).
    My aim is not fore­cast­ing. I am going to eval­u­ate the model on a test­ing set. So I have some­thing like

    fit1 <- Arima(checking, model=fit, xreg=(?))
    My prob­lem is that I do not know how to choose xreg on the test­ing data. Is it right to con­sider xreg=fourier(checking,3)?

  • Shub­ham

    What does ‘n’ mean in the code? Is it the num­ber of data points in the time series?

    • Rob J Hyndman


  • Azza Ali

    If I inte­grat time series with judg­ment what is right Equation ?

  • Pingback: Arima function doesn't consider seasonal components | Question and Answer()

  • Gam­age

    Dear Prof Hyndman,

    I’m doing my final year under­grad­u­ate research on pre­dict­ing tourist arrivals.
    Monthly data is avail­able from 1973 to 2013. But I sup­posed to use data from 2000 to 2013 in my analy­sis.
    But the civil war of the coun­try, which was last­ing more than 30 years, was over in 2009. Hence after 2008 the behav­ior of tourist arrivals is rather dif­fer­ent. (time series plot of monthly tourist arrivals from 1973-​​2013is attached)

    Hence if I use data from 2009–2013 I have 60 monthly data points.
    I surfed net & stud­ied some arti­cles for the min­i­mum sam­ple size required in time series analy­sis. In those arti­cles they were say­ing that min­i­mum of 50–60 data points is required.

    Sir I would like to know whether it is capa­ble to do time series analy­sis with these 60 monthly observations

    Thank you.

  • A.D

    Dear Prof. Hyndman,

    I have two large time series data. One is sep­a­rated by sec­onds inter­vals and the other by min­utes. The length of each time series is 180 days. I’m using R (3.1.1) for fore­cast­ing the data. I’d like to know the value of the “fre­quency” argu­ment in the ts() func­tion in R, for each data set.

    Since most of the exam­ples and cases I’ve seen so far are for months or days at the most, it is quite con­fus­ing for me when deal­ing with equally sep­a­rated sec­onds or min­utes. Accord­ing to my under­stand­ing, the “fre­quency” argu­ment is the num­ber of obser­va­tions per sea­son. So what is the “sea­son” in the case of seconds/​minutes? My guess is that since there are 86,400 sec­onds and 1440 min­utes a day, these should be the val­ues for the “freq” argu­ment. Is that correct?

    I have posted a more con­structed ver­sion of this ques­tion on cross val­i­da­tion: http://​stats​.stack​ex​change​.com/​q​u​e​s​t​i​o​n​s​/​1​2​0​8​0​6​/​f​r​e​q​u​e​n​c​y​-​v​a​l​u​e​-​f​o​r​-​s​e​c​o​n​d​s​-​m​i​n​u​t​e​s​-​i​n​t​e​r​v​a​l​s​-​d​a​t​a​-in-r

    unfor­tu­nately, I haven’t received any answer to this ques­tion.
    I would very much appre­ci­ate you help!!

  • Akhil Dua

    Hello Sir,

    I am work­ing with daily data on flight book­ing by fare class.

    Data sam­ple :

    A B C D Depar­ture date

    10 20 30 5 15th Nov 2012

    5 50 2 6 16th Nov 2012
    . . . . .
    . . . . .
    5 50 2 6 16th Nov 2014
    Here A, B, C and D are fare classes
    I tried fit­ting arima model to all fare classes using auto.arima model for fore­cast­ing demand for the next day. I know that there is sea­sonal effect and week­end effect but auto.arima doesn’t pick up these effect.
    Next I tried run­ning auto.arima with dum­mies for month and week­end but still the model doesn’t fit well with data and the fit­ted val­ues from the model are very dif­fer­ent from the actual val­ues.
    Can you please sug­gest me which func­tion of the pack­age shall I use for mod­el­ing this kind of seasonality

  • Pingback: Fitting models to short time series | Hyndsight()

  • Pingback: Reddit Button Forecast (Post-Mortem*: 5/25) | Bryan S. Weber()

  • Indrek Sepp

    Dear Prof. Hyndman,

    Is there a dan­ger that using fourier sea­sonal regressor(s) on time-​​series with non-​​seasonal trends and/​or level shifts, that have a greater mag­ni­tude than the sea­sonal com­po­nent of the TS, would result in the fourier series fit­ting these arti­facts, and pro­ject­ing them into the forecast?

  • 黄耀鹏

    Thank you very much for shar­ing! I’ve just met such kind of prob­lem when using the arima model with a sea­son priod of 288.