# Forecasting with long seasonal periods

I am often asked how to fit an ARIMA or ETS model with data having a long seasonal period such as 365 for daily data or 48 for half-hourly data. Generally, seasonal versions of ARIMA and ETS models are designed for shorter periods such as 12 for monthly data or 4 for quarterly data.

The ets() function in the forecast package restricts seasonality to be a maximum period of 24 to allow hourly data but not data with a larger seasonal frequency. The problem is that there are $m-1$ parameters to be estimated for the initial seasonal states where $m$ is the seasonal period. So for large $m$, the estimation becomes almost impossible.

The arima() function will allow a seasonal period up to $m=350$ but in practice will usually run out of memory whenever the seasonal period is more than about 200. I haven’t dug into the code to find out why this is the case — theoretically it would be possible to have any length of seasonality as the number of parameters to be estimated does not depend on the seasonal order. However, seasonal differencing of very high order does not make a lot of sense — for daily data it involves comparing what happened today with what happened exactly a year ago and there is no constraint that the seasonal pattern is smooth.

For such data I prefer a Fourier series approach where the seasonal pattern is modelled using Fourier terms with short-term time series dynamics allowed in the error. For example, consider the following model:

$$y_t = a + \sum_{k=1}^K \left[ \alpha_k\sin(2\pi kt/m) + \beta_k\cos(2\pi kt/m)\right] + N_t,$$

where $N_t$ is an ARIMA process. The value of $K$ can be chosen by minimizing the AIC.

This is easily fitted using R. Here is an example with $m=200$, $K=4$ and $N_t$ an ARIMA(2,0,1) process:

 n <- 2000 m <- 200 y <- ts(rnorm(n) + (1:n)%%100/30, f=m)   library(forecast) fit <- Arima(y, order=c(2,0,1), xreg=fourier(y, K=4)) plot(forecast(fit, h=2*m, xreg=fourierf(y, K=4, h=2*m)))

The advantages of this approach are:

• it allows any length seasonality;
• for data with more than one seasonal period, you can include Fourier terms of different frequencies;
• the seasonal pattern is smooth for small values of $K$ (but more wiggly seasonality can be handled by increasing $K$);
• the short-term dynamics are easily handled with a simple ARMA error.

The only real disadvantage (compared to a seasonal ARIMA model) that I can think of is that the seasonality is assumed to be fixed — the pattern is not allowed to change over time. But in practice, seasonality is usually remarkably constant so this is not a big disadvantage except for very long time series.

The order of $N_t$ can also be chosen automatically:

 fit <- auto.arima(y, seasonal=FALSE, xreg=fourier(y, K=4))

Note that the ARIMA model for $N_t$ should be non-seasonal.

A state space approach to the problem is possible using TBATS models, as described in this paper on complex seasonality. The tbats() function which implements TBATS models will automatically select the Fourier order as well as the other aspects of the model. One advantage of the TBATS model is the seasonality is allowed to change slowly over time.

### Related Posts:

• Matteo Bertini

The 350 limit in R is in arima.c: the array rbar can be very big.

 SEXP getQ0(SEXP sPhi, SEXP sTheta) /* NB: nrbar could overflow */ int r = max(p, q + 1); size_t np = r * (r + 1) / 2, nrbar = np * (np - 1) / 2; /* This is the limit using an int index. We could use size_t and get more on a 64-bit system, but there seems no practical need.*/ if(r > 350) error(_("maximum supported lag is 350")); rbar = (double *) R_alloc(nrbar, sizeof(double)); 
The problem with m=350 is that in some domains, like traffic forecasting, is quite common to have 4 samples per hour and a weekly seasonality, resulting in m=672.
I think the matrix is related to the Kalman filter, but I’ve found no way to switch to a lighter method.

• Erhan

Dear Mr. Hyndman,
I am working on a thesis right now, to use ARIMA-GARCH modeling approach to model a data with multiple seasonal pattern ( daily and weekly ). After a search on internet, I downloaded your forecast package. However, I could not find out how to model multiple seasonality using R with your package ? auto.arima( etc. ) does not find multiple seasonality and I guess I cound not use fourier() function effectively. Can you please lead me a way to model multiple seasonality in R.
Best Regards,
Erhan

• Try the bats() and tbats() functions.

• Guo

I am still confusing about the multiple seasonal period model. How should I do if I want to fit a model ARIMA(2,0,2)*(1,0,0)24*(1,0,1)168? Many thanks!

• R does not have the facilities to fit that model.

• Anthony

Dr. Hyndman,
What other software have that facility.
Thanks
Anthony

• I don’t know

• Rahul

Try WEKA 3.7.11

• Ahmet Gurel

Hello Mr. Hyndman,

Thank you for all your work and blog posts. I read your paper (on complex seasonality) but I am confused at a simple point. What is the exact difference between an ARIMA(p,d,q) with season S with and Arima(p,d,q,P,D,Q,S). I believe the latter components are essentially have the same interpretation with p,d,q with lag S. Is this correct? And also, I was wondering if there is a particular reason why you don’t consider these models in the paper as additional parameters of tbats or bats?

Thank you very much,

Ahmet

• 1. Read http://otexts.com/fpp/8/9/ for seasonal ARIMA models.

2. I have no software for fitting multiple period seasonal ARIMA models.

• Ahmet Gurel

Thank you very much, that was helpful.

• Ahmet Gurel

Hello again Dr.
Hyndman ,

I have again a quick question for you. While reading your book, I was stuck at a very simple point.

Lets say I have data composed of 10 observations and I want to predict the next 5 observations. (arimaForecast(data, h=5)). Then I take the the first prediction (at time 10+1) and add it to data. Then I predict the next 4 observations ((arimaForecast(append(data,forecast11) , h=4)).

Shouldn’t these forecasts be the same since I did replace yhat as the algorithms do? Actually this question is not arima specific, but for all the forecasting algortihms in Forecast package.

Thank you very much

Ahmet

• For a linear model (such as ARIMA), the point forecasts should be the same (provided the ARIMA model does not change), although the forecast variances will change.

• Arzu

Dr. Hyderman,

Is it normal that for a double seasonal ts with frequency 144 (1000 observations)-if I specify seasonal periods 144 and 1008, tbats and bats functions do not converge in reasonable time limits (i tried to run them for 3 hrs). Is it because of auto.arima function? Even if I did specify parameters such as max.p etc. I still could not get an answer. Do you have any comments regarding the cause?

• The functions are slow with high seasonal periods and a long time series. I’m trying to figure out how to make them faster.

• Arzu

Thank you for clarification.

• Dr. Hyndman,

In your above code, what I’m not understanding is,when you are forecasting, why in the fourier function your are passing (n+1):(2*m)? Can you kindly help?

• They are the time periods I want to forecast, except there is an error in your question. The code as given in my post is n+1:(2*m).

• Thanks…

• Giorgio

Dear Rob,

in case of an ARIMAX model, would you advise applying the fourier deseasonalization also to the X part, considering also that each X covariate can generally have its own seasonality?

Thanks for the clarification, best regards

• Maybe. If the covariates are highly seasonal, then seasonally adjusting them will prevent collinearity with the fourier terms. But it changes the interpretation of the model. If you have highly seasonal covariates, you may not need to user Fourier terms at all, as the seasonality in the response variable may be entirely driven by the seasonality in the covariates.

• Tim Beige

Do you mean, if I have got two time series with (same) seasonality, I can estimate a ARIMAX model without differencing the time series, because the seasonality of the response variable is driven by the xreg argument? I thought a time series in an arimax must be stationary. A time series with seasonality does not satisfy this conditon.

• If the regressors are deterministic (such as Fourier terms), then you only require the error series to be stationary, not the response variable.

• Pingback: Handling Long Seasonality in a Time Series | Jeffrey Chin Tang Wong()

• Marco Powers

Hi Professor Hyndman-

Question — why did you select K=4 in your above code? Is there an automatic process in R used to select K?

• It was just an example. In practice, you can select K using the AIC.

• olivier des

Hello Rob,

So K depends on the minimum AIC, but is there any formula or procedure to calculate it directly?

Thank you.

• No. Just compute the AIC for a few models and select the model with the smallest AIC value.

• Nilabhra Banerjee

Dr Hyndman, I would like to draw your attention to this link (http://stats.stackexchange.com/questions/103413/what-is-k-in-fourier-function-of-r ). One of the suggestions in the given link advises to use McSpatial package to get a good value for K. I would request you to comment on this.

• I’ve never used it. I prefer to minimize the AICc to choose K, but cross-validation will give almost the same results with a lot more effort.

• Herimanitra

Dear Prof Hyndman,

I don’t know if this is the right post to post this question. I have already tried some googling approach and saw your StackOverflow but I don’t find solution(s) yet.
Actually, I’m experiencing problem to set up a daily time series for the purpose of prediction with “forecast” package:

train$date=ts(train$date,start=1994,f=365)
#where train is a dataframe, “date” is the name of the time series period with numeric format, for example 19940101=1994/01/01 , etc.
It seems to work (because no error raises during set up) but auto.arima() can’t detect seasonnality after…
using zoo package doesn’t solve the problem as forecast() doesn’t work with it.

Could you please provide the correct setting I need to do?

• Herimanitra

In case where it is impossible (to set up daily time series and forecast with the “forecast” package),
I was thinking of extracting monthly values (as, argued, in your 1st paragraph) from the original train and then use the “forecast” package. After obtaining predictions, I was thinking of temporal disagregation (Chow-Lin, DENTON) to recover the prediction for each day between predicted months

• Again, actually reading the article you are commenting on would be a very good place to start. You can fit a model using Fourier series. An alternative is tbats().

• auto.arima will not fit a seasonal model with period greater than 350. Please read the article above.

• Herimanitra

OK, thanks, so if really understand the article, if I had 3years with 365days (for example, the 2 first year) and 366days (for the last year), I can specify 02 Fourier Series like this?
period1=365
period2=366
fourier (…) {

X[,2*i-1] <- sin(2*pi*i*t/period1)
X[,2*i] <- cos(2*pi*i*t/period1)
X[,2*i+1] <- sin(2*pi*i*t/period2)
X[,2*i+2] <- cos(2*pi*i*t/period2)

}
Am I correct in this specification?

• No. The period of the data should not change from year to year. Presumably you mean that the third year is a leap year. In that case, set the period to be 365.25 years.

• Herimanitra

Yes, leap year,
OK

• Herimanitra

I’m also curious to know how to restrict output of the forecast() function to the specified horizon’s length in order to speed up computation, it’s really slow at a certain level (~2000)
I’ve tried something like this but it didn’t work:

forecast(mymodel,h=100,method=”naive”)$fitted[2001:2100] • Herimanitra I figure out now that one needs to do: forecast(mymodel,h=100,method=”naive”)$mean
to obtain mean forecast in the desired horizons but the main problem I have is that with HoltWinters, it’s really slow compared to loess decomposition.

• Herimanitra

I’ve just seen someone setup ts() with daily frequency and use forecast() after:

What do you think about that?

• If you do that with auto.arima, it will return a non-seasonal model. If you try fitting a seasonal model with Arima or arima, it will return an error.

• Sonbol

Hello Dr. Hyndman,

I have a time series measuring solar radiance every minute. So I am assuming that I have a seasonality with length of 1440 (24*60). I am trying to use Fourier as xreg to consider the seasonality. I’ve considered a model like:

fit <-Arima(training, order=c(3,1,3), xreg=fourier(training,3)).
My aim is not forecasting. I am going to evaluate the model on a testing set. So I have something like

fit1 <- Arima(checking, model=fit, xreg=(?))
My problem is that I do not know how to choose xreg on the testing data. Is it right to consider xreg=fourier(checking,3)?

• Shubham

What does ‘n’ mean in the code? Is it the number of data points in the time series?

• yes

• Azza Ali

If I integrat time series with judgment what is right Equation ?

• Gamage

Dear Prof Hyndman,

I’m doing my final year undergraduate research on predicting tourist arrivals.
Monthly data is available from 1973 to 2013. But I supposed to use data from 2000 to 2013 in my analysis.
But the civil war of the country, which was lasting more than 30 years, was over in 2009. Hence after 2008 the behavior of tourist arrivals is rather different. (time series plot of monthly tourist arrivals from 1973-2013is attached)

Hence if I use data from 2009-2013 I have 60 monthly data points.
I surfed net & studied some articles for the minimum sample size required in time series analysis. In those articles they were saying that minimum of 50-60 data points is required.

Sir I would like to know whether it is capable to do time series analysis with these 60 monthly observations

Thank you.

• Gamage

Thank you very much sir

• Gamage

Sir according to your article “A good model will allow for changing trend and changing seasonal
patterns.”
Hence is it really needed to use data only from 2009-2013?
Isn’t it possible to use past data prior to 2009 sir?

• A.D

Dear Prof. Hyndman,

I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set.

Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?

I have posted a more constructed version of this question on cross validation: http://stats.stackexchange.com/questions/120806/frequency-value-for-seconds-minutes-intervals-data-in-r

I would very much appreciate you help!!

• A.D

• Akhil Dua

Hello Sir,

I am working with daily data on flight booking by fare class.

Data sample :

A B C D Departure date

10 20 30 5 15th Nov 2012

5 50 2 6 16th Nov 2012
. . . . .
. . . . .
5 50 2 6 16th Nov 2014
Here A, B, C and D are fare classes
I tried fitting arima model to all fare classes using auto.arima model for forecasting demand for the next day. I know that there is seasonal effect and weekend effect but auto.arima doesn’t pick up these effect.
Next I tried running auto.arima with dummies for month and weekend but still the model doesn’t fit well with data and the fitted values from the model are very different from the actual values.
Can you please suggest me which function of the package shall I use for modeling this kind of seasonality

• Indrek Sepp

Dear Prof. Hyndman,

Is there a danger that using fourier seasonal regressor(s) on time-series with non-seasonal trends and/or level shifts, that have a greater magnitude than the seasonal component of the TS, would result in the fourier series fitting these artifacts, and projecting them into the forecast?

• 黄耀鹏

Thank you very much for sharing! I’ve just met such kind of problem when using the arima model with a season priod of 288.

• Daumantas

In the equation for $y_t$, shouldn’t $alpha$ and $beta$ have subscripts $k$?

• Yes. Now fixed.

• it seems that there is a text missing between paragraphs:

“For such data I prefer a Fourier series approach where the seasonal
pattern is modelled using Fourier terms with short-term time series
dynamics allowed in the error. For example, consider the following
model:

It is much harder to do something like this for ETS models. One of our students has been working on this and our paper on complex seasonality describes the procedure. We plan to add this functionality to the forecast package soon, but the code is not quite ready yet.”

after words “consider the following model:”

• Now fixed. I edited this post recently and left a stray few sentences in.

• João Victor Zuanazzi Leme

Thanks a lot for this method!

Did you officially publish anything on this?
I was having a hard time trying to fit a proper SARIMA model on autocorrelated dailly observations with annual seasonal period and needed the residuals to fit a distribution (i.e., needed independent observations).
If you don’t mind, I would like to use it in my term paper and would like to know how would you like your reference to be.

Best regards, J.V.

• Not exactly. It’s a special case of dynamic regression models discussed in Section 9.1 of my forecasting textbook (http://www.otexts.org/fpp/9/1/).

• João Victor Z. Leme

I’m analyzing meteorological data and I’m interested in modelling bivariate distributions using Copula models. The problem is that these kind of variables are obviously autocorrelated (for example, daily measures of max. temperature or daily measures of precipitation). It’s not ok to fit a distribution to a variable of this kind because the assumption required for fitting a distribution is that the observations are independent within the sample, right?

A professor suggested working with the residuals from an ARIMA or a SARIMA model, but I don’t seem to understand what to do with the residuals after I fit them a distribution (generally non-Normal). Or a bivariate distribution: the residuals from different variables are clearly correlated, so I can (for ex.) simulate random bivariate observations using the Copula approach.
The correlations between the original variables and the correlation between their models residuals are often close (and their scatterplots look similar).

The question is: would I be able to generate observations from the original variables using only their simulated residuals? (taking in account that I know the order of their respective ARIMA or SARIMA models).

Thanks a lot!

• João Victor Z. Leme

Thinking of now, it surely would be possible to forecast an observation using the ARIMA model plus the generated residual (sorry for the silly question, didn’t think it straight).

But does it make sense to model the dependence between the residuals of different variables when these variables are serially correlated?

I’m just an undergradute student, not really an expert in time series.

• João Victor Z. Leme

Modelling the dependence of the residuals worked really well for simulated data (adding the generated residuals to the means of SARIMA forecasts). It maintained the dependence structure from the original variables (bivariate case).

• Ethan

Prof. Hyndman,
how does the forecast() function estimate the forecast\$mean values in this case? Is it like an extrapolation from the ARIMA model with added seasonality given by the Fourier series?

Regards, Ethan

• Yes.

what is the relationship between stationarity and periodic stationarity?
What is the main differences between seasonal ARIMA and PARMA models?

• Steev C.

hi sir

auto.arima assumes homoskedasticity. After running auto.arima and determining optimal orders of ARIMA and the number of harmonics, I have tried to run the model step by step.

The regression model with Fourier terms and dummies shows heteroskedasticity. Is the model still available? Should I necessarily correct for heteroskedasticity before estimate the ARIMA part? I’ve already use “log” on the dependant variable but there’s still heteroskedasticity

• datascientist4612

Prof. Hyndman,
I have data every 15 minutes for one year, with daily and weekly seasonality: freq is thus too high for auto.arima (there are 672 15min intervals in one week).
In the example given in the post, how was the ARIMA(2,0,1) model chosen? Was it chosen after looking at a lot of (p, d, q) models and choosing the model with the lowest AIC?
Or was d chosen by looking at the differenced time series and p, q via the acf and pacf? With this long seasonality, I find it difficult to interpret the acf and pacf.
Thank you!

• Minimum AICc with some constraints on not having models too close to the stationarity and invertibility boundaries.

• datascientist4612

Thank you very much for your reply. Are the stationarity and invertibility conditions implemented as an option in one of the functions of the forecast package, or do people put them by hand? In your FPP book, section 8.4 about invertibility, you say that “R will take care of these constraints”: so is this done automatically by R? Thanks a lot again!

• Yes. If you use auto.arima to fit your model, it will satisfy the stationarity and invertibility conditions. If you use Arima or arima to fit your model, it will give a warning if there is a problem with stationarity or invertibility.

• Abhishek Dhawan

I am trying to forecast for a data set which has per minute data for 4 days. So total 1440* 4 observations.
Every day data plot look very similar to each other, but when i made forecast using 4 days data with ARIMA , then forecast follows trend only of last some values. Is it because I have too many values for a day?

I want my 5th Dec observation to start from around 1800 as it is set as initial for every day, but forecasts instead follow trend of last observations of 4th Dec.
I used auto.arima()

Please advise. Will i need to use xreg if yes then what will be it

• So don’t use ARIMA. snaive would work better.

• Abhishek Dhawan

Prof. Hyndman,

As you suggested snaive worked really well as we can see in screenshot attached:

Thanks a lot for timely help on my critical project.

Also, following is accuracy for this model:
> accuracy(fit)
ME RMSE MAE MPE MAPE MASE ACF1
Training set -3.220139 18.53934 13.77477 -2.322245 3.840649 1 0.9975956

I have 2 more questions , kindly advise:

1. As seen above in accuracy, do you have any suggestion to make model more accurate using snaive or any other model. Following is data in .csv if it can be of any use:

2. I needed to use ts() object as snaive() didn’t work for my xts object.Time stamps in my data are as follows:
Time, Item Id, Quantity
01-12-2016 00:00, item, 1800
01-12-2016 00:01, item, 1799

i.e. in d-m-Y H:M format. As of now i used data in ts as follows:

myT<-ts(coredata(myTS_T),start=1,frequency=1440) #myTS_T is xts time series.
Due to this i am not able to see correct time stamps in plot and console. Please advise how to fix this if possible as data is per minute.

Thanks again !

Dr. Hyndman,

I have time series data that is quarter hourly, ie m = 96 for 30 days. I am using these 96 x 30 data points to train an ARIMA model and forecast for 1 day, ie. h=96 data points.

Thank you for your explanation in this blog post. Now I know that the flat line I am getting as a result of the forecast is not because something wrong with my code, rather a question of the large seasonal period that ARIMA cannot handle.

This is my code:

 library(xts) library(forecast)

 

train <- as.xts(read.zoo('my_proj/data/31_days_perc99_99.csv', sep = ',', header = TRUE, index = 2, tz='UTC')) fit <- auto.arima(train) fcast <- forecast(fit, h=96)) write.table(fcast, file = "my_proj/data/1_day_forecast.csv", sep = ',', row.names = FALSE) 

I would like to correct this problem and apply the Fourier approach as mentioned here. And for a start, I tried with K=4. Made the following change to the code:

 library(xts) library(forecast)

 

train <- as.xts(read.zoo('my_proj/data/31_days_perc99_99.csv', sep = ',', header = TRUE, index = 2, tz='UTC')) fit <- auto.arima(train, seasonal=FALSE, xreg=fourier(train, K=4)) fcast <- forecast(fit, h=96)) write.table(fcast, file = "my_proj/data/1_day_forecast.csv", sep = ',', row.names = FALSE) 

However I get the following error:

 K must be not be greater than period/2 

How should I solve this? I tried following your answers here at: http://stackoverflow.com/questions/37529876/strange-behavior-of-auto-arima-in-r-package-forecast but could not understand much, given my inexperience with time series data or models.

Also, the period is 96 here, so shouldn’t K=4 have worked in this case?

• Convert to ts objects and make sure your frequency attribute is set correctly.