# Forecasting with daily data

I’ve had several emails recently asking how to forecast daily data in R. Unless the time series is very long, the easiest approach is to simply set the frequency attribute to 7.

    y <- ts(x, frequency=7)

Then any of the usual time series forecasting methods should produce reasonable forecasts. For example

    library(forecast)
    fit <- ets(y)
    fc <- forecast(fit)
    plot(fc)

When the time series is long enough to take in more than a year, then it may be necessary to allow for annual seasonality as well as weekly seasonality. In that case, a multiple seasonal model such as TBATS is required.

    y <- msts(x, seasonal.periods=c(7,365.25))
    fit <- tbats(y)
    fc <- forecast(fit)
    plot(fc)

This should capture the weekly pattern as well as the longer annual pattern. The period 365.25 is the average length of a year allowing for leap years. In some countries, alternative or additional year lengths may be necessary. For example, with the Turkish electricity data analysed in De Livera et al (JASA 2011), we used three seasonal periods: 7, 354.37 and 365.25. The period 354.37 is the average length of a year in the Islamic calendar.
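The two year lengths quoted above can be checked with a little arithmetic. This is only a sketch: the mean lunar (synodic) month of about 29.53059 days is a standard astronomical figure assumed here, not something taken from the paper, and the solar year uses the simple rule of one leap day every four years.

```r
# Average solar year when every fourth year has one extra day
solar_year <- (3 * 365 + 366) / 4

# Average Islamic (lunar) year: 12 lunar months
# ASSUMPTION: mean synodic month of 29.53059 days
lunar_year <- 12 * 29.53059

round(c(solar = solar_year, lunar = lunar_year), 2)
```

Rounded to two decimals, these give the 365.25 and 354.37 used as seasonal periods.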

Capturing seasonality associated with moving events such as Easter or the Chinese New Year is more difficult. Even with monthly data, this can be tricky as the festivals can fall in either March or April (for Easter) or in January or February (for the Chinese New Year). The usual seasonal models don’t allow for this, and even the complex seasonality discussed in my JASA paper assumes that the seasonal patterns occur at the same time in each year. The best way to deal with moving holiday effects is to use dummy variables. However, neither ETS nor TBATS models allow for covariates. A state space model of the same form as TBATS but with multiple sources of error and covariates could be used, but I don’t have any R code to do that.

Instead, I would use a regression model with ARIMA errors, where the regression terms include any dummy holiday effects as well as the longer annual seasonality. Unless there are many decades of data, it is usually reasonable to assume that the annual seasonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual seasonality. Suppose we use $K=5$ Fourier terms to model annual seasonality, and that the holiday dummy variables are in the vector holiday with 100 future values in holidayf. Then the following code will fit an appropriate model.

    y <- ts(x, frequency=7)
    z <- fourier(ts(x, frequency=365.25), K=5)
    zf <- fourierf(ts(x, frequency=365.25), K=5, h=100)
    fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
    fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)
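The vectors `holiday` and `holidayf` are plain 0/1 dummy variables, one value per day. A minimal sketch of how they might be built in base R is below; the date range and the list of holiday dates are hypothetical and would need to be replaced with the calendar that matters for your data.

```r
# Daily dates covered by the historical series (hypothetical range)
dates <- seq(as.Date("2012-01-01"), as.Date("2013-12-31"), by = "day")

# The 100 days to be forecast, starting the day after the data end
future_dates <- seq(max(dates) + 1, by = "day", length.out = 100)

# Hypothetical holiday calendar: New Year, Easter Sunday, Christmas
holiday_dates <- as.Date(c(
  "2012-01-01", "2012-04-08", "2012-12-25",
  "2013-01-01", "2013-03-31", "2013-12-25",
  "2014-01-01"
))

# 1 on holidays, 0 everywhere else
holiday  <- as.numeric(dates %in% holiday_dates)
holidayf <- as.numeric(future_dates %in% holiday_dates)
```

If different holidays have different effects, one dummy column per holiday type could be used instead of a single combined vector.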

The order $K$ can be chosen by minimizing the AIC of the fitted model.
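One way to do this is to fit the model for a range of values of $K$ and keep the one with the smallest AIC. The following is a sketch only: the simulated series, the candidate range 1 to 6, and the variable names are all assumptions made for illustration, and the holiday dummies are left out to keep it short. It uses `fourier()` and `auto.arima()` from the forecast package.

```r
library(forecast)

# Simulated daily series (hypothetical): weekly and annual patterns plus noise
set.seed(1)
n <- 3 * 365
t <- seq_len(n)
x <- 100 + 10 * sin(2 * pi * t / 7) +
  20 * cos(2 * pi * t / 365.25) + rnorm(n, sd = 5)

y <- ts(x, frequency = 7)
x_annual <- ts(x, frequency = 365.25)

# Fit one regression with ARIMA errors per candidate K and record its AIC
candidates <- 1:6
aic <- sapply(candidates, function(K) {
  z <- fourier(x_annual, K = K)
  auto.arima(y, xreg = z, seasonal = FALSE)$aic
})

# The order with the smallest AIC
best_K <- candidates[which.min(aic)]
best_K
```

Each candidate requires a full `auto.arima()` search, so the loop can be slow on long series.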

• Kris Ewican

This is very useful Rob. Thank you very much!

• mike

Thanks for the post. Couple of questions: 1) When using regression with ARMA errors and Fourier explanatory variables, do you go back and remove the individual Fourier terms that are insignificant? 2) Can you use xreg= with TBATS?

• 1. No. I use the AIC to determine K, and leave all the terms in. Significance is not as important as being useful for prediction, and they are not the same thing.

2. No.

• Richard Warnung

Very nice summary, very useful, thanks for all your efforts and best wishes from Vienna.

• Leo

Professor Hyndman,
What will be the seasonality of hourly data that’s available for weekdays (5 days of week) or daily data that’s available for five days of the week?
regards
Leo

• Leo

I meant seasonal period which I guess should be 5 for weekdays data.

• harvey chaparro

how to create the holiday and holidayf vectors?

• Anthony

Do you think ets or tbats are useful for analyzing daily stock market data?

Moreover, I am using tbats but R is running the bats command. At least that’s what it shows when I enter the datatbats (my tbats object) at the R prompt.

• No. These functions are for data that have trends and seasonality. Daily stock market data typically have neither.

When you call tbats() with a non seasonal time series, it will return a non-seasonal BATS model as that is equivalent to a non-seasonal TBATS model.

• Anthony

Stock market data does have a trend. Perhaps simple Holt’s method can produce something good.

• Wrong. Stock market data has some apparent local trends that are simply the local effects of random walk like behaviour. Holt’s method will give useless forecasts for daily stock data.

• A.M.jaber

Hi
I would like to ask how I can extract and forecast the trend using Empirical Mode Decomposition in R for daily stock market data.

• Abhishek

Respected Sir,

I am a student of computer science and am currently working on my project. You are an expert in this field, and I have seen on your blog that you help everyone, which is very kind of you.

I am working on forecasting energy consumption in R. I have data for the previous 30 years and I want to forecast the next 5 years. Can you suggest the best way to do it?

Thank you.

• Ville

Hi, one can determine the parameters alpha, gamma etc or give upper and lower bounds for them when using ETS. Can you fix the values or give boundaries to the parameters when using BATS/TBATS? Thanks

• Yes. Read the help file for ets().
No. BATS/TBATS models are currently only available using a completely automated procedure. We may introduce more manual model specification in a later version.

• Ville

Thank you for your reply. I think that more manual model specifications in BATS/TBATS would be a great improvement and I’m looking forward to seeing it in the future!

• KM

Mr. Hyndman, let me first thank you immensely for teaching me so much about forecasting. I have taken up a few courses and worked at two leading firms in the past but the amount I’ve learnt from your blog posts is far more.

I am currently trying to learn more about forecasting using R to aid in my masters course. I got time series data (for 5 years) from a third party FMCG data vendor which has weekly and monthly seasonality, which I could see by using the decompose() function.

I am trying to forecast using the code below:

mydata <- msts(mydata1, seasonal.periods=c(7,365.25,354.37,365));
fit <- tbats(mydata,use.box.cox=NULL, use.trend=NULL, use.damped.trend=NULL,seasonal.periods=c(7,365.25,365,354.37), use.arma.errors=TRUE, use.parallel=TRUE,bc.lower=0, bc.upper=1);
fcast <- forecast(fit,h=433);
plot(fcast);
fcast.df <- data.frame(fcast)
WriteXLS("fcast.df","E:/fcast.xlsx");
where I have used weekly, Gregorian, Hindu and Hijri calendars to set seasonal periods.
The correlation between the forecasted data and observed data is ~0.82 which seems low to me. The major reason could be that I can see peaks on a few particular dates like Jan1 and Dec25 year on year which is not forecasted by tbats.
Is there a way to include this into the code? What is the significance of manually setting the box-cox limits?
Regards,
KM

• The tbats model does not allow for covariates, so specific effects such as Christmas and New Year cannot be handled by dummy variables. However, you could use the Fourier-ARIMA approach mentioned above, and add the covariate as I’ve explained. The Box-Cox parameter is normally restricted to (0,1). The arguments allow other ranges which are occasionally useful.

• ninnawei

how to use the holiday vector specific? could you take an example?

• KM

Thank you Mr. Hyndman for the explanation. The data has a weekly and monthly seasonality and requires me to use TBATS as you said here: http://www.r-bloggers.com/forecasting-weekly-data/
Is there a way to decompose using tbats.components() to get trend, seasonal components and irregular components?
I was not able to grasp the significance of “level” and “slope” that comes out of tbats.components().
Are the decompose() or stl() functions usable in its place? Would these take the same parameters and give a proper decomposition?
Regards,
KM

• KM

What I meant to ask was how do I compare the trend (coming out of decompose()) and the level coming out of tbats.components(). I have taken them as the same, and divided the original series by a multiplication of log(trend*slope*irr) to get seasonality.

Regards,
KM

• decompose() and tbats() use different models, so they are not strictly comparable. But if your tbats model has no Box-Cox transformation, then the trend from decompose is roughly equal to the level from tbats. Both functions already produce seasonal components for you.

• fei Li

Hi Professor Hyndman, What if there is Box-Cox transformation, how could we get the trend value from tbats model?

And is the slope the random error item?

• You will need to back-transform the trend to get it on the original scale. No, the slope is not the random error term. Please read the documentation to understand the model.
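As a rough illustration of what back-transforming means, here is a base R sketch of the Box-Cox transformation and its inverse. The value of lambda and the level values are hypothetical; in practice lambda would be the Box-Cox parameter reported for the fitted model. The forecast package also provides `BoxCox()` and `InvBoxCox()`, which could be used instead of the hand-written functions below.

```r
# Box-Cox transformation for a given lambda
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, returning values on the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

lambda <- 0.5                        # hypothetical value
level_original <- c(120, 135, 150)   # hypothetical level on the original scale
level_transformed <- box_cox(level_original, lambda)

# Back-transforming recovers the original scale
inv_box_cox(level_transformed, lambda)
```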

• @robjhyndman:disqus sorry to disturb you, but I need to forecast gas consumption composed of daily, weekly (weekdays-weekend) and yearly seasonality. Does it make sense to apply the STL decomposition by LOESS three times? (http://datascience.stackexchange.com/questions/957/multiple-seasonality-with-arima)

• Mary Rose

Hi Rob,
I am forecasting daily data and fitting it to a tbats model:

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)

I know that the function Arima() is used when wanting to update an ARIMA model whenever new data is available. Is there a function in R that will do the same for a tbats model (i.e. update the tbats model for new incoming data)?
Thanks,
Mary Rose

• Not yet. It’s on my to do list.

• Mary Rose

Sounds good.

I ended up doing the following:

y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
fit <- auto.arima(y, xreg=z, seasonal=TRUE)

# new_x is a vector of newly observed daily data with length h
new_y <- ts(new_x, frequency = 7)
new_z <- fourierf(ts(new_x, frequency=365.25), K=5, h)

update <- Arima(x=c(y,new_y), xreg = rbind(z,new_z), model = fit, seasonal=TRUE)

Would this be a good way to update a daily model when wanting to include both week and annual frequencies?

Thanks,

Mary Rose

• Yes, that should work very well.

• Christopher

I have a data set that is daily data that has a strong weekly pattern (M-F is high traffic, weekends are low traffic).

When I do:

us=xts(usDf[,'daily'],order.by=usDf[,'date'],freq=7);
gtbats <- tbats(us); fc2 <- forecast(gtbats, h=28) ; plot(fc2)

The historical data in the plot is missing and the y-axis is incorrect. The shape of the forecast itself appears fine.

When I use the msts function, everything looks better:

y <- msts(us, seasonal.periods=c(7))
fit <- tbats(y); fc <- forecast(fit,h=28); plot(fc)

Just a note that at first, the plot without msts() appears wildly incorrect, but the forecast itself appears sound shape-wise.

How do I interpret the x-axis? It doesn't appear to correspond to days…

The data spans several years. There are no visually obvious yearly patterns. But when I add seasonal.periods=c(7,29.6,365.25), the prediction is more nuanced. I am thinking I could test for the monthly seasonality and yearly seasonality by using less "training" data and seeing how the predictions match the actual data. Is there an easier way to determine the underlying seasonality in a time series? (I know I have more reading to do…)

PS Thank you so very much for your blogs and your other writing and research.

• Use either ts or msts objects, as explained in the help file. Don’t use xts objects.

The x-axis is in weeks.

The model will tell you whether there is any annual seasonality.

• Gregory R. Duchon

On a related note, if you had weekday data only, would you lower the frequency to 5 as opposed to having weekend values of 0, and would the second seasonal period then become something like 365-(52*2) = 261? (I am looking at support requests that only occur on weekdays as a project for my MS Business Analytics program.)

• Yes, I would use frequency=5. Setting weekends to 0 will create problems in finding appropriate models.
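A minimal base R sketch of that setup is below, assuming the data arrive as a daily data frame with a date column. The data frame, its column names and the values are hypothetical.

```r
# Hypothetical daily data frame covering four full weeks from a Monday
df <- data.frame(date = seq(as.Date("2015-01-05"), by = "day", length.out = 28))
df$requests <- seq_len(nrow(df))

# Keep Monday to Friday; format "%u" gives weekday numbers 1 (Mon) to 7 (Sun)
is_weekday <- as.integer(format(df$date, "%u")) <= 5
weekdays_only <- df[is_weekday, ]

# Five observations per weekly cycle
y <- ts(weekdays_only$requests, frequency = 5)
```

Public holidays that fall on weekdays are still in the series after this step, so they may need separate treatment.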

• Gregory R. Duchon

Thank you professor!

• randomdude

Hi Rob,
is there a way to use the tbats method and extract the remainder of the decomposition like you did it in your “turkey electricity demand” analysis? I asked on CrossValidated some time ago but nobody could help me with that so far: http://stats.stackexchange.com/questions/163371/work-with-results-of-tbats-decomposition

• Alassane

Hi Sir,

Thank you for all your interesting explanations. I have been forecasting daily positive time series using tbats. But in some cases I get negative or very large values like 1.25868e+14 when my real values are between 0 and 50000. I tried to include lambda=0 in my models but it doesn’t work any more. I would like to know if there is a way to avoid these problems?

Thank you.

Alassane

• Can you please provide a reproducible example of problems like this. I am always trying to improve the software, and edge cases that cause problems are helpful in identifying areas for improvement. You can submit bug reports at https://github.com/robjhyndman/forecast/issues

• Alassane

I have submitted the report on the site. It’s about forecasting daily turnover in 87 countries using 4 years of historical data (from 2011 to 2015).

Thank you

• max

Hi Sir Rob, and thank you for the post. I have a daily time series of the number of buyers on a web site each day. My goal is to forecast the number of buyers (visitors) for future days.

How can I do this using R, and how can I tell whether my time series is stationary or not? Any suggestions of R code are welcome.

My data contain 2 columns: date and number of buyers.
This is my R code and the output. Is it right or not?

            date byers
    1 01/01/2014  3114
    2 02/01/2014  5954
    3 03/01/2014  5342
    4 04/01/2014  4929
    5 05/01/2014  5633
    6 06/01/2014  5890

    [1] 3114 5954 5342 4929 5633 5890

    > dftime=ts(df,start=c(2014,01),frequency=365)
    > HWmodel=HoltWinters(dftime,beta=FALSE,gamma=FALSE)
    > HWmodel
    Holt-Winters exponential smoothing without trend and without seasonal component.

    Call:
    HoltWinters(x = dftime, beta = FALSE, gamma = FALSE)

    Smoothing parameters:
     alpha: 0.8619079

    > future
    Time Series:
    Start = c(2015, 1)
    End = c(2015, 10)
    Frequency = 365
               fit
     [1,] 6738.195
     [2,] 6738.195
     [3,] 6738.195
     [4,] 6738.195
     [5,] 6738.195
     [6,] 6738.195
     [7,] 6738.195
     [8,] 6738.195
     [9,] 6738.195
    [10,] 6738.195

     beta : FALSE
     gamma: FALSE

    Coefficients:
          [,1]
    a 6738.195

As you can see, I always get the same predicted value for all 10 days. Why? And how can I do this type of forecasting using R?

Thank you very much for your help Sir.

• max

Hi, and thanks Professor for the post. I have daily data and I would like to do some forecasting. What model can I use? There are many models: ets(), arima(), auto.arima(), holtwinters()…

• Learner

I have daily demand data from the past 3 years. I want to forecast the next 365 days. Month of the year and day of the week have an impact on demand. How do I include month in msts, since January has 31 days, February has 28 days and April has 30 days, not to forget leap years? If day of the week also has a trend, would you suggest doing the below for the month, or is there any other way month can be specified?
msts(x, seasonal.periods=c(7,365.25,30))

• Do you really have monthly seasonality? That would be very unusual, but it does sometimes happen. Much more common is both weekly and annual seasonality. Monthly seasonality would arise if you had end-of-month effects due to accounting practices, but I can’t think of anything else that would cause them. I think you would need to consider what is causing the monthly seasonality and try to allow for it explicitly. It is not strictly seasonal as the periodicity is not regular.

• Oli Paul

Real time bidding. Marketing agencies use pacing algorithms that try to average their spend across the month but always end up having to spend more at the end of the month – they need to spend the budget to justify it.

• larry77

Very interesting post, but can you provide a numerical example? I am not sure I understand the nature of holiday vector. Is it a vector of dates? Or it has 0/1 depending on whether a certain day is a holiday or not?

• It is a dummy variable. That means it contains 0s and 1s indicating which days are holidays.

• Arun Gunalan

Hi Professor, so the holidays will be set to “0”, am I correct?

• No. 1s for holidays, 0s everywhere else.

• Raed

Very informative, Rob. Thank you so much.

• yuqin

Hello professor, what if there is a monthly period as well?

• Arun Gunalan

Hi Rob, I have a retail sales time series covering 2 years, where sales are higher on weekends. Will the model you explained above fit it?

• Maybe.

• Dear Rob, regarding the frequency parameter in ts, will it be 7 or 365.25? As by default it is 1 for annual data and 12 for monthly data, logically would it be 365.25?

• Thank you very much Rob.

• Jul

Awesome post, thanks Rob!

I was asked to summarize in a couple of sentences what the tbats formula does.
Would you agree with the following summary?

1. Apply a Box-Cox transformation if Y is not normally distributed*
2. Model the seasonality with Fourier series **
3. Model the remaining auto-correlation, once the effect of seasonality has been removed, with an ARMA process **

* That’s not a guarantee that Y will be normally distributed after the transformation.
** The parameters K for the Fourier series and (p,q) for the ARMA process are chosen by minimizing the AIC of the fitted model.

Thanks!

• Not quite.

1. Y is not required to be normal, and usually won’t be due to the trend and seasonality. Instead, the residuals are assumed normal with constant variance, and the BC transformation will normally allow that to occur.

2. The seasonality is modelled with Fourier-like terms that can change coefficients over time. Also, a local linear time trend is included, as it is in an ETS model.

3. Yes

All terms are selected by minimizing the AIC, not just the ARMA orders.

• Jul

thanks Rob!

• Saneesh George

Thank you very much Professor for this post.

Can you please explain how do we forecast a future value based on the current value. For eg: what would be the total number of bookings on next Saturday if there is already x numbers at this moment.

• Steev

hello Sir. how can I have the p-value of coefficients. It would be interesting to justify that the holidays effects are significant.

• You have the coefficients, and the standard errors. It is not too hard to compute the p-values yourself.
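A rough sketch of that calculation is below, using base R's `arima()` on simulated data so that it runs on its own. The series, the dummy variable and the AR(1) error structure are all hypothetical, and the p-values rely on a normal approximation for the estimates. The same `coef()` and `vcov()` calls should also apply to a model returned by `auto.arima()`.

```r
# Simulated example (hypothetical): AR(1) noise plus a holiday effect of size 5
set.seed(42)
n <- 200
holiday <- rbinom(n, 1, 0.1)   # 0/1 dummy variable
y <- 5 * holiday + arima.sim(list(ar = 0.5), n = n)

fit <- arima(y, order = c(1, 0, 0), xreg = cbind(holiday = holiday))

# z statistics and two-sided p-values from estimates and standard errors
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
z <- est / se
p_value <- 2 * pnorm(-abs(z))

round(cbind(estimate = est, std.error = se, z = z, p.value = p_value), 4)
```

As noted earlier in this thread, a term being significant and a term being useful for prediction are not the same thing.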