TBATS with regressors

I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, tbats() is unlikely to ever include covariates as explained here. It won’t actually complain if you include an xreg argument, but it will ignore it.

When I want to include covariates in a time series model, I tend to use auto.arima() with covariates included via the xreg argument. If the time series has multiple seasonal periods, I use Fourier terms as additional covariates. See my post on forecasting daily data for some discussion of this model. Note that fourier() and fourierf() now handle msts objects, so it is very simple to do this.

For example, if holiday contains some dummy variables associated with public holidays and holidayf contains the corresponding variables for the first 100 forecast periods, then the following code can be used:

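# Daily series with weekly and annual seasonal periods
y <- msts(x, seasonal.periods=c(7,365.25))
# Fourier terms: 2 harmonics for the weekly period, 5 for the annual period
z <- fourier(y, K=c(2,5))
zf <- fourierf(y, K=c(2,5), h=100)
# Regression with ARIMA errors; the Fourier terms handle the seasonality
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
# Forecast using the future values of the regressors
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)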
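
The code above assumes that holiday and holidayf already exist. As a rough sketch of how such dummy variables might be built in base R, assuming a hypothetical date range and holiday calendar (train_dates, future_dates, holiday_dates and make_dummy are illustrative names, not part of the forecast package):

```r
# Hypothetical daily dates covering the training data (one per observation in y).
train_dates <- seq(as.Date("2012-01-01"), as.Date("2013-12-31"), by = "day")
# The 100 days immediately following the training period.
future_dates <- seq(max(train_dates) + 1, by = "day", length.out = 100)

# Hypothetical holiday calendar; replace with the real one for your data.
holiday_dates <- as.Date(c("2012-01-01", "2012-12-25",
                           "2013-01-01", "2013-12-25",
                           "2014-01-01"))

# Build a one-column 0/1 matrix: 1 on a holiday, 0 otherwise.
# Add further columns if different holidays should have different effects.
make_dummy <- function(dates) {
  cbind(holiday = as.integer(dates %in% holiday_dates))
}

holiday  <- make_dummy(train_dates)   # rows line up with the observations in y
holidayf <- make_dummy(future_dates)  # rows line up with the forecast periods
```

The number of rows in holiday must match the length of y, and the number of rows in holidayf must match the forecast horizon.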

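The main disadvantage of the ARIMA approach is that the seasonality is forced to be periodic (the Fourier terms give a seasonal pattern that repeats unchanged from one period to the next), whereas a TBATS model allows for dynamic seasonality that evolves over time.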

  • Stephan Kolassa

    Thanks for clarifying this. Are you considering including this information in the tbats() help page? Right now, digging through it, one could be forgiven for believing one could specify an xreg parameter that would be handed through to auto.arima(): http://stats.stackexchange.com/questions/116756/is-there-any-way-to-include-regressors-in-tbats-function-in-r/

    • The help file has now been updated on GitHub. It will be on CRAN in the next version.

  • F. Nelson

    I noticed that in this example, you set the seasonal parameter of the auto.arima function to FALSE but in your post on forecasting daily data, you didn’t do this and the default is set to TRUE. Is this because you’re using msts to capture the two seasonal periods with a fourier basis?

    • You should always use `seasonal=FALSE` if you use Fourier terms. I’ve corrected the other post.

  • B Bhattacharjee

    Dear Sir,

    I am trying to forecast store sales based on daily data. There are over 1000 stores like that. Some peculiar properties of the data are:
    1. for 90% of the stores the sales are 0 for Sunday, as the stores remain closed.
    2. in case of holidays also, most of the stores are closed and hence the sales figures for those days are zero as well.
    3. There are indicators like promotional offers which have a large effect on sales, so they must be included in the model or it performs quite poorly.
    4. Most of the stores show a bi-weekly seasonality, and there is also a strong yearly seasonality for many stores.

    At this point, it seems I can only use covariates with the ARIMA family of models in R, and not with ets() or the more advanced tbats().
    Now I am trying to validate my arima() model with Fourier terms included to account for the yearly seasonality. For some stores, including the Fourier terms makes the out-of-sample predictions worse, while for other stores the Fourier terms significantly improve the forecasts. The out-of-sample predictions also depend on the K parameters I choose for the Fourier terms, and the optimal choice of K differs from store to store.

    So far it seems there is little chance of automating the forecasts for all of the stores. I could run a loop with a grid search over Fourier/non-Fourier models and, in the Fourier case, over different values of K, but with a large number of stores this loop will take forever, as arima() itself is not very fast on complicated series (and setting approximation = TRUE worsens the predictions). I would be very thankful if you could share any thoughts on dealing with this data efficiently. The intermittent zeros are also a problem for the accuracy of the algorithm. Thanks for your time.

  • JoseStenio

    Just one silly question. What are the post estimation procedures for TBATS? Are they the same as ARIMA or is there any other factor that we must be aware of?

    • Always check the residuals — the ACF is useful, but make sure you include at least as many lags as the longest seasonal period.
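
A minimal base R sketch of such a residual check follows. Here res is simulated white noise standing in for the residuals of a fitted model (for example, the output of residuals(fit)), and longest_period is assumed to be the annual period of a daily series; both are illustrative assumptions.

```r
set.seed(1)
# Stand-in for the model residuals; white noise is used purely for illustration.
res <- rnorm(1000)

# Longest seasonal period of the series (assumed: annual, for daily data).
longest_period <- 365.25
max_lag <- ceiling(longest_period)

# ACF of the residuals out to at least one full cycle of the longest period.
res_acf <- acf(res, lag.max = max_lag, plot = FALSE)

# Ljung-Box test over the same number of lags; a small p-value suggests
# remaining autocorrelation. No degrees-of-freedom adjustment (fitdf) is
# applied in this sketch.
lb <- Box.test(res, lag = max_lag, type = "Ljung-Box")
lb$p.value
```

Calling plot(res_acf) draws the correlogram. If the spikes stay mostly within the bounds and the test does not reject, the residuals look like white noise.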

        • Elias C.

          I’ve seen additional arguments, like those for auto.arima, in the TBATS model. Is there a way to correct the ARMA terms given by TBATS when they don’t match the ACF and PACF?

        • No. tbats() is entirely automatic.

          • Elias

            OK, sir. When the ARMA terms given by TBATS and the ACF don’t match, does that mean the model is not appropriate?

          • The ARMA terms are for the errors, not the data. I suspect you are not looking at the ACF of the errors, as they are not returned by the function. Check the residuals. If they look like white noise, the model is probably fine.

  • Steev

    Sir, I want to remove the holiday effects in daily data before using TBATS. Can I use auto.arima for this? Could you please give me a reference that shows the relevant commands? Thank you.

    • You could try using auto.arima with Fourier terms for the seasonality and dummy variables for the holidays. I’m not sure whether it would work as it depends on the nature of your data. There is no reference other than the general discussion of dynamic regression models at http://otexts.org/fpp/9/1

      • Pratik Sawant

        Hello Sir,

        I am writing to ask for help. I have four years of daily sales data and I am trying to predict daily sales for one year. I used the methodology you mentioned and built a harmonic regression model with ARMA errors. But how do we add the holiday effect as a covariate? I am pretty new to forecasting and modelling, so I don’t know how to create a dummy variable. I tried creating a variable by assigning 1 to dates with holidays and 0 to dates without.

        When I try to run this line: "fit1 <- auto.arima(retailtimeseries_daily_tbats, lambda=0, xreg=cbind(harmonics,holidaysCurrent), seasonal=FALSE)"
        it gives me this error:

        "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) : NA/NaN/Inf in 'y'"

        Can anyone help me with this? Thank you

        • Please ask your questions on crossvalidated.com or stackoverflow.com. This is not a help site. Restrict comments to my blog entries.

  • Tim Ka

    I get an error:
    >fourier(y, K=c(5,5))
    >K must be not be greater than period/2

    I have set the same seasonal periods for the time series.

    • Now fixed. The message should have been enough for you to figure it out in any case.