Forecasting with daily data

I’ve had sev­eral emails recently ask­ing how to fore­cast daily data in R. Unless the time series is very long, the eas­i­est approach is to sim­ply set the fre­quency attribute to 7.

y <- ts(x, frequency=7)

Then any of the usual time series fore­cast­ing meth­ods should pro­duce rea­son­able fore­casts. For example

library(forecast)
fit <- ets(y)
fc <- forecast(fit)
plot(fc)

When the time series is long enough to take in more than a year, then it may be nec­es­sary to allow for annual sea­son­al­ity as well as weekly sea­son­al­ity. In that case, a mul­ti­ple sea­sonal model such as TBATS is required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)

This should cap­ture the weekly pat­tern as well as the longer annual pat­tern. The period 365.25 is the aver­age length of a year allow­ing for leap years. In some coun­tries, alter­na­tive or addi­tional year lengths may be nec­es­sary. For exam­ple, with the Turk­ish elec­tric­ity data analysed in De Liv­era et al (JASA 2011), we used three sea­sonal peri­ods: 7, 354.35 and 365.25. The period 354.37 is the aver­age length of the Islamic calendar.

Cap­tur­ing sea­son­al­ity asso­ci­ated with mov­ing events such as Easter or the Chi­nese New Year is more dif­fi­cult. Even with monthly data, this can be tricky as the fes­ti­vals can fall in either March or April (for Easter) or in Jan­u­ary or Feb­ru­ary (for the Chi­nese New Year). The usual sea­sonal mod­els don’t allow for this, and even the com­plex sea­son­al­ity dis­cussed in my JASA paper assumes that the sea­sonal pat­terns occur at the same time in each year. The best way to deal with mov­ing hol­i­day effects is to use dummy vari­ables. How­ever, nei­ther ETS nor TBATS mod­els allow for covari­ates. A state space model of the same form as TBATS but with mul­ti­ple sources of error and covari­ates could be used, but I don’t have any R code to do that.

Instead, I would use a regres­sion model with ARIMA errors, where the regres­sion terms include any dummy hol­i­day effects as well as the longer annual sea­son­al­ity. Unless there are many decades of data, it is usu­ally rea­son­able to assume that the annual sea­sonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual sea­son­al­ity. Sup­pose we use K=5 Fourier terms to model annual sea­son­al­ity, and that the hol­i­day dummy vari­ables are in the vec­tor holiday with 100 future val­ues in holidayf. Then the fol­low­ing code will fit an appro­pri­ate model.

y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourierf(ts(x, frequency=365.25), K=5, h=100)
fit <- auto.arima(y, xreg=cbind(z,holiday))
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)

The order K can be cho­sen by min­i­miz­ing the AIC of the fit­ted model.


Related Posts:


  • Kris Ewican

    This is very use­ful Rob. Thank you very much!

  • Pingback: Momento R do Dia | De Gustibus Non Est Disputandum()

  • mike

    Thanks for the post. Cou­ple of ques­tions: 1) When using regres­sion with ARMA with Fourier explana­tory vari­ables, do you go b back and remove the indi­vid­ual Fourier series that are are insignif­i­cant? 2) can you use xreg= with TBATS?

    • http://robjhyndman.com/ Rob J Hyndman

      1. No. I use the AIC to deter­mine K, and leave all the terms in. Sig­nif­i­cance is not as impor­tant as being use­ful for pre­dic­tion, and they are not the same thing.

      2. No.

  • Pingback: Somewhere else, part 76 | Freakonometrics()

  • Richard War­nung

    Very nice sum­mary, very use­ful, thanks for all your efforts and best wishes from Vienna.

  • Leo

    Pro­fes­sor Hyn­d­man,
    What will be the sea­son­al­ity of hourly data that’s avail­able for week­days (5 days of week) or daily data that’s avail­able for five days of the week?
    regards
    Leo

    • Leo

      I meant sea­sonal period which I guess should be 5 for week­days data.

  • Pingback: Rtistry & Rbominations :: linklist, sunday 2013-09-29()

  • har­vey chaparro

    how to cre­ate the hol­i­day and hol­i­dayf vectors?

  • Anthony

    Do you think ets or tbats are use­ful for ana­lyz­ing daily stock mar­ket data?

    More­over, I am using tbats but R is run­ning the bats com­mand. At least that’s what it shows when I enter the datat­bats (my tbats object) at the R prompt.

    • http://robjhyndman.com/ Rob J Hyndman

      No. These func­tions are for data that have trends and sea­son­al­ity. Daily stock mar­ket data typ­i­cally have neither.

      When you call tbats() with a non sea­sonal time series, it will return a non-​​seasonal BATS model as that is equiv­a­lent to a non-​​seasonal TBATS model.

      • Anthony

        Stock mar­ket data does have a trend. Per­haps sim­ple Holt’s method can pro­duce some­thing good.

        • http://robjhyndman.com/ Rob J Hyndman

          Wrong. Stock mar­ket data has some appar­ent local trends that are sim­ply the local effects of ran­dom walk like behav­iour. Holt’s method will give use­less fore­casts for daily stock data.

  • A.M.jaber

    Hi
    i would like to ask how I can extract and fore­cast­ing trend using Empir­i­cal Mode Decom­po­si­tion with R code for daily stock mar­ket data

  • Abhishek

    Respected Sir,

    I am a stu­dent of com­puter sci­ence and cur­rently I am work­ing on my project.You are the expert of this field and I have seen on your blog that you help every­one so it is so kind of you.

    I am work­ing on fore­cast­ing of the energy con­sump­tion in R . I do have the data of pre­vi­ous 30 years and I want to fore­cast the data of next 5 years. Can you sug­gest the best way to do it ?

    Thank you.

  • Ville

    Hi, one can deter­mine the para­me­ters alpha, gamma etc or give upper and lower bounds for them when using ETS. Can you fix the val­ues or give bound­aries to the para­me­ters when using BATS/​TBATS? Thanks

    • http://robjhyndman.com/ Rob J Hyndman

      Yes. Read the help file for ets().
      No. BATS/​TBATS mod­els are cur­rently only avail­able using a com­pletely auto­mated pro­ce­dure. We may intro­duce more man­ual model spec­i­fi­ca­tion in a later version.

      • Ville

        Thank you for your reply. I think that more man­ual model spec­i­fi­ca­tions in BATS/​TBATS would be a great improve­ment and I’m look­ing for­ward to see­ing it in the future!

  • KM

    Mr. Hyn­d­man, let me first thank you immensely for teach­ing me so much about fore­cast­ing. I have taken up a few courses and worked at two lead­ing firms in the past but the amount I’ve learnt from your blog posts is far more.

    I am try­ing to learn more fore­cast­ing using R cur­rently to aide in my mas­ters course. I got a time series data (for 5 years) from a third party fmcg data ven­dor which has weekly and monthly sea­son­al­ity which I could see by using the decom­pose() function.

    I am try­ing to fore­cast using the code below:

    mydata1 <-read.csv(“E:/file.csv”,header=TRUE);
    mydata <- msts(mydata1, seasonal.periods=c(7,365.25,354.37,365));
    fit <- tbats(mydata,use.box.cox=NULL, use.trend=NULL, use.damped.trend=NULL,seasonal.periods=c(7,365.25,365,354.37), use.arma.errors=TRUE, use.parallel=TRUE,bc.lower=0, bc.upper=1);
    fcast <- forecast(fit,h=433);
    plot(fcast);
    fcast.df <- data.frame(fcast)
    WriteXLS(“fcast.df”,“E:/fcast.xlsx”);
    where I have used weekly, Gre­go­rian, Hindu and Hijri cal­en­dars to set sea­sonal peri­ods.
    The cor­re­la­tion between the fore­casted data and observed data is ~0.82 which seems low to me. The major rea­son could be that I can see peaks on a few par­tic­u­lar dates like Jan1 and Dec25 year on year which is not fore­casted by tbats.
    Is there a way to include this into the code? What is the sig­nif­i­cance of man­u­ally set­ting the box-​​cox lim­its?
    Regards,
    KM

    • http://robjhyndman.com/ Rob J Hyndman

      The tbats model does not allow for covari­ates, so spe­cific effects such as Christ­mas and New Year can­not be han­dled by dummy vari­ables. How­ever, you could use the Fourier-​​ARIMA approach men­tioned above, and add the covari­ate as I’ve explained. The Box-​​Cox para­me­ter is nor­mally restricted to (0,1). The argu­ments allow other ranges which are occa­sion­ally useful.

      • nin­nawei

        how to use the hol­i­day vec­tor spe­cific? could you take an example?

      • KM

        Thank you Mr. Hyn­d­man for the expla­na­tion. The data has a weekly and monthly sea­son­al­ity and requires me to use TBATS as you said here: http://​www​.​r​-blog​gers​.com/​f​o​r​e​c​a​s​t​i​n​g​-​w​e​e​k​l​y​-​data/
        Is there a way to decom­pose using tbats.components() to get trend, sea­sonal com­po­nents and irreg­u­lar com­po­nents?
        I was not able to grasp the sig­nif­i­cance of “level” and “slope” that comes out of tbats.components().
        Is the decom­pose() or stl() func­tions usable in it’s place? Would these take the same para­me­ters and give proper decom­po­si­tion?
        Regards,
        KM

      • KM

        What I meant to ask was how do I com­pare trend (com­ing out of decom­pose()) and level com­ing out of tbats.components(). I have taken them as the same, Divided the orig­i­nal series with a mul­ti­pli­ca­tion of log(trend*slope*irr) to get seasonality.

        Regards,
        KM

        • http://robjhyndman.com/ Rob J Hyndman

          decom­pose() and tbats() use dif­fer­ent mod­els, so they are not strictly com­pa­ra­ble. But if your tbats model has no Box-​​Cox trans­for­ma­tion, then the trend from decom­pose is roughly equal to the level from tbats. Both func­tions already pro­duce sea­sonal com­po­nents for you.

  • Pingback: TBATS with regressors | Hyndsight()