Forecasting with daily data

I’ve had sev­eral emails recently ask­ing how to fore­cast daily data in R. Unless the time series is very long, the eas­i­est approach is to sim­ply set the fre­quency attribute to 7.

y <- ts(x, frequency=7)

Then any of the usual time series fore­cast­ing meth­ods should pro­duce rea­son­able fore­casts. For example

library(forecast)
fit <- ets(y)
fc <- forecast(fit)
plot(fc)

When the time series is long enough to take in more than a year, then it may be nec­es­sary to allow for annual sea­son­al­ity as well as weekly sea­son­al­ity. In that case, a mul­ti­ple sea­sonal model such as TBATS is required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)

This should cap­ture the weekly pat­tern as well as the longer annual pat­tern. The period 365.25 is the aver­age length of a year allow­ing for leap years. In some coun­tries, alter­na­tive or addi­tional year lengths may be nec­es­sary. For exam­ple, with the Turk­ish elec­tric­ity data analysed in De Liv­era et al (JASA 2011), we used three sea­sonal peri­ods: 7, 354.35 and 365.25. The period 354.37 is the aver­age length of the Islamic calendar.

Cap­tur­ing sea­son­al­ity asso­ci­ated with mov­ing events such as Easter or the Chi­nese New Year is more dif­fi­cult. Even with monthly data, this can be tricky as the fes­ti­vals can fall in either March or April (for Easter) or in Jan­u­ary or Feb­ru­ary (for the Chi­nese New Year). The usual sea­sonal mod­els don’t allow for this, and even the com­plex sea­son­al­ity dis­cussed in my JASA paper assumes that the sea­sonal pat­terns occur at the same time in each year. The best way to deal with mov­ing hol­i­day effects is to use dummy vari­ables. How­ever, nei­ther ETS nor TBATS mod­els allow for covari­ates. A state space model of the same form as TBATS but with mul­ti­ple sources of error and covari­ates could be used, but I don’t have any R code to do that.

Instead, I would use a regres­sion model with ARIMA errors, where the regres­sion terms include any dummy hol­i­day effects as well as the longer annual sea­son­al­ity. Unless there are many decades of data, it is usu­ally rea­son­able to assume that the annual sea­sonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual sea­son­al­ity. Sup­pose we use K=5 Fourier terms to model annual sea­son­al­ity, and that the hol­i­day dummy vari­ables are in the vec­tor holiday with 100 future val­ues in holidayf. Then the fol­low­ing code will fit an appro­pri­ate model.

y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourierf(ts(x, frequency=365.25), K=5, h=100)
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)

The order K can be cho­sen by min­i­miz­ing the AIC of the fit­ted model.


Related Posts:


  • Kris Ewican

    This is very use­ful Rob. Thank you very much!

  • Pingback: Momento R do Dia | De Gustibus Non Est Disputandum()

  • mike

    Thanks for the post. Cou­ple of ques­tions: 1) When using regres­sion with ARMA with Fourier explana­tory vari­ables, do you go b back and remove the indi­vid­ual Fourier series that are are insignif­i­cant? 2) can you use xreg= with TBATS?

    • http://robjhyndman.com/ Rob J Hyndman

      1. No. I use the AIC to deter­mine K, and leave all the terms in. Sig­nif­i­cance is not as impor­tant as being use­ful for pre­dic­tion, and they are not the same thing.

      2. No.

  • Pingback: Somewhere else, part 76 | Freakonometrics()

  • Richard War­nung

    Very nice sum­mary, very use­ful, thanks for all your efforts and best wishes from Vienna.

  • Leo

    Pro­fes­sor Hyn­d­man,
    What will be the sea­son­al­ity of hourly data that’s avail­able for week­days (5 days of week) or daily data that’s avail­able for five days of the week?
    regards
    Leo

    • Leo

      I meant sea­sonal period which I guess should be 5 for week­days data.

  • Pingback: Rtistry & Rbominations :: linklist, sunday 2013-09-29()

  • har­vey chaparro

    how to cre­ate the hol­i­day and hol­i­dayf vectors?

  • Anthony

    Do you think ets or tbats are use­ful for ana­lyz­ing daily stock mar­ket data?

    More­over, I am using tbats but R is run­ning the bats com­mand. At least that’s what it shows when I enter the datat­bats (my tbats object) at the R prompt.

    • http://robjhyndman.com/ Rob J Hyndman

      No. These func­tions are for data that have trends and sea­son­al­ity. Daily stock mar­ket data typ­i­cally have neither.

      When you call tbats() with a non sea­sonal time series, it will return a non-​​seasonal BATS model as that is equiv­a­lent to a non-​​seasonal TBATS model.

      • Anthony

        Stock mar­ket data does have a trend. Per­haps sim­ple Holt’s method can pro­duce some­thing good.

        • http://robjhyndman.com/ Rob J Hyndman

          Wrong. Stock mar­ket data has some appar­ent local trends that are sim­ply the local effects of ran­dom walk like behav­iour. Holt’s method will give use­less fore­casts for daily stock data.

  • A.M.jaber

    Hi
    i would like to ask how I can extract and fore­cast­ing trend using Empir­i­cal Mode Decom­po­si­tion with R code for daily stock mar­ket data

  • Abhishek

    Respected Sir,

    I am a stu­dent of com­puter sci­ence and cur­rently I am work­ing on my project.You are the expert of this field and I have seen on your blog that you help every­one so it is so kind of you.

    I am work­ing on fore­cast­ing of the energy con­sump­tion in R . I do have the data of pre­vi­ous 30 years and I want to fore­cast the data of next 5 years. Can you sug­gest the best way to do it ?

    Thank you.

  • Ville

    Hi, one can deter­mine the para­me­ters alpha, gamma etc or give upper and lower bounds for them when using ETS. Can you fix the val­ues or give bound­aries to the para­me­ters when using BATS/​TBATS? Thanks

    • http://robjhyndman.com/ Rob J Hyndman

      Yes. Read the help file for ets().
      No. BATS/​TBATS mod­els are cur­rently only avail­able using a com­pletely auto­mated pro­ce­dure. We may intro­duce more man­ual model spec­i­fi­ca­tion in a later version.

      • Ville

        Thank you for your reply. I think that more man­ual model spec­i­fi­ca­tions in BATS/​TBATS would be a great improve­ment and I’m look­ing for­ward to see­ing it in the future!

  • KM

    Mr. Hyn­d­man, let me first thank you immensely for teach­ing me so much about fore­cast­ing. I have taken up a few courses and worked at two lead­ing firms in the past but the amount I’ve learnt from your blog posts is far more.

    I am try­ing to learn more fore­cast­ing using R cur­rently to aide in my mas­ters course. I got a time series data (for 5 years) from a third party fmcg data ven­dor which has weekly and monthly sea­son­al­ity which I could see by using the decom­pose() function.

    I am try­ing to fore­cast using the code below:

    mydata1 <-read.csv(“E:/file.csv”,header=TRUE);
    mydata <- msts(mydata1, seasonal.periods=c(7,365.25,354.37,365));
    fit <- tbats(mydata,use.box.cox=NULL, use.trend=NULL, use.damped.trend=NULL,seasonal.periods=c(7,365.25,365,354.37), use.arma.errors=TRUE, use.parallel=TRUE,bc.lower=0, bc.upper=1);
    fcast <- forecast(fit,h=433);
    plot(fcast);
    fcast.df <- data.frame(fcast)
    WriteXLS(“fcast.df”,“E:/fcast.xlsx”);
    where I have used weekly, Gre­go­rian, Hindu and Hijri cal­en­dars to set sea­sonal peri­ods.
    The cor­re­la­tion between the fore­casted data and observed data is ~0.82 which seems low to me. The major rea­son could be that I can see peaks on a few par­tic­u­lar dates like Jan1 and Dec25 year on year which is not fore­casted by tbats.
    Is there a way to include this into the code? What is the sig­nif­i­cance of man­u­ally set­ting the box-​​cox lim­its?
    Regards,
    KM

    • http://robjhyndman.com/ Rob J Hyndman

      The tbats model does not allow for covari­ates, so spe­cific effects such as Christ­mas and New Year can­not be han­dled by dummy vari­ables. How­ever, you could use the Fourier-​​ARIMA approach men­tioned above, and add the covari­ate as I’ve explained. The Box-​​Cox para­me­ter is nor­mally restricted to (0,1). The argu­ments allow other ranges which are occa­sion­ally useful.

      • nin­nawei

        how to use the hol­i­day vec­tor spe­cific? could you take an example?

      • KM

        Thank you Mr. Hyn­d­man for the expla­na­tion. The data has a weekly and monthly sea­son­al­ity and requires me to use TBATS as you said here: http://​www​.​r​-blog​gers​.com/​f​o​r​e​c​a​s​t​i​n​g​-​w​e​e​k​l​y​-​data/
        Is there a way to decom­pose using tbats.components() to get trend, sea­sonal com­po­nents and irreg­u­lar com­po­nents?
        I was not able to grasp the sig­nif­i­cance of “level” and “slope” that comes out of tbats.components().
        Is the decom­pose() or stl() func­tions usable in it’s place? Would these take the same para­me­ters and give proper decom­po­si­tion?
        Regards,
        KM

      • KM

        What I meant to ask was how do I com­pare trend (com­ing out of decom­pose()) and level com­ing out of tbats.components(). I have taken them as the same, Divided the orig­i­nal series with a mul­ti­pli­ca­tion of log(trend*slope*irr) to get seasonality.

        Regards,
        KM

        • http://robjhyndman.com/ Rob J Hyndman

          decom­pose() and tbats() use dif­fer­ent mod­els, so they are not strictly com­pa­ra­ble. But if your tbats model has no Box-​​Cox trans­for­ma­tion, then the trend from decom­pose is roughly equal to the level from tbats. Both func­tions already pro­duce sea­sonal com­po­nents for you.

          • fei Li

            Hi Pro­fes­sor Hyn­d­man, What if there is Box-​​Cox trans­for­ma­tion, how could we get the trend value from tbats model?

            And is the slope the ran­dom error item?

            Thank you for your time!

          • http://robjhyndman.com/ Rob J Hyndman

            You will need to back-​​transform the trend to get it on the orig­i­nal scale. No, the slope is not the ran­dom error term. Please read the doc­u­men­ta­tion to under­stand the model.

  • http://www.marcodena.it/ Marco De Nadai

    @robjhyndman:disqus sorry to dis­turb you, but I need to fore­cast gas con­sump­tion com­posed by a daily, weekly (week
    days-​​weekend), yearly sea­son­al­ity. Does it make sense to apply three
    times the STL decom­po­si­tion by LOESS? (http://​data​science​.stack​ex​change​.com/​q​u​e​s​t​i​o​n​s​/​9​5​7​/​m​u​l​t​i​p​l​e​-​s​e​a​s​o​n​a​l​i​t​y​-​w​i​t​h​-​arima)

    • http://robjhyndman.com/ Rob J Hyndman

      Ask your ques­tions on cross​val​i​dated​.com.

  • Pingback: TBATS with regressors | Hyndsight()

  • Pingback: Multiple seasonality with ARIMA? | CL-UAT()

  • Mary Rose

    Hi Rob,
    I am fore­cast­ing daily data and fit­ting it to a tbats model:

    y <- msts(x, seasonal.periods=c(7,365.25))
    fit <- tbats(y)

    I know that the func­tion Arima() is used when want­ing to update an ARIMA model when­ever new data is avail­able. Is there a func­tion in R that will do the same for a tbats model (i.e. update the tbats model for new incom­ing data)?
    Thanks,
    Mary Rose

    • http://robjhyndman.com/ Rob J Hyndman

      Not yet. It’s on my to do list.

      • Mary Rose

        Sounds good.

        I ended up doing the following:

        y <- ts(x, frequency=7)
        z <- fourier(ts(x, frequency=365.25), K=5)
        fit <- auto.arima(y, xreg=z, seasonal=TRUE)

        # new_​x is a vec­tor of newly observed daily data with length h
        new_​y <- ts(new_x, fre­quency = 7)
        new_​z <- fourierf(ts(new_x, frequency=365.25), K=5, h))

        update <- Arima(x=c(y,new_y), xreg = c(z,new_z), model = fit, seasonal=TRUE)

        Would this be a good way to update a daily model when want­ing to include both week and annual frequencies?

        Thanks,

        Mary Rose

        • http://robjhyndman.com/ Rob J Hyndman

          Yes, that should work very well.

  • Christo­pher

    I have a data set that is daily data that has a strong weekly pat­tern (M-​​F is high traf­fic, week­ends are low traffic).

    When I do:

    us=xts(usDf[,‘daily’],order.by=usDf[,‘date’],freq=7);
    gtbats <- tbats(us); fc2 <- forecast(gtbats, h=28) ; plot(fc2)

    The his­tor­i­cal data in the plot is miss­ing and the y-​​axis is incor­rect. The shape of the fore­cast itself appears fine.

    When I use the msts func­tion, every­thing looks better:

    y <- msts(us, seasonal.periods=c(7))
    fit <- tbats(y); fc <- forecast(fit,h=28); plot(fc)

    Just a note that at first, the plot with­out msts() appears wildly incor­rect, but the fore­cast itself appears sound shape-​​wise.

    How do I inter­pret the x-​​axis? It doesn’t appear to cor­re­spond to days…

    The data spans sev­eral years. There are no visu­ally obvi­ous yearly pat­terns. But when I add seasonal.periods=c(7,29.6,365.25), the pre­dic­tion is more nuanced. I am think­ing I could test for the monthly sea­son­al­ity and yearly sea­son­al­ity by using less “train­ing” data and see how it pre­dicts matched to actual data. Is there an eas­ier man­ner to deter­mine the under­lin­ing sea­son­al­ity in a time-​​series? (I know I have more read­ing to do…)

    PS Thank you so very much for your blogs and your other writ­ing and research.

    • http://robjhyndman.com/ Rob J Hyndman

      Use either ts or msts objects, as explained in the help file. Don’t use xts objects.

      The x-​​axis is in weeks.

      The model will tell you whether there is any annual seasonality.