Seasonal periods

I get questions about this almost every week. Here is an example from a recent comment on this blog:

I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?

Yes, the “frequency” is the number of observations per “cycle” (normally a year, but sometimes a week, a day or an hour). This is the opposite of the definition of frequency in physics, or in Fourier analysis, where “period” is the length of the cycle, and “frequency” is the inverse of period. When using the ts() function in R, the following choices should be used.

Data         frequency
Annual       1
Quarterly    4
Monthly      12
Weekly       52

Actually, there are not 52 weeks in a year, but 365.25/7 = 52.18 on average. But most functions which use ts objects require integer frequency.
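As a minimal sketch of the table above, the same numeric vector can be given different frequency attributes with base R's ts(). The data here are simulated placeholders (an assumption for illustration, not data from this post).

```r
# Simulated placeholder data: 48 observations (assumption for illustration only)
set.seed(123)
x <- rnorm(48)

# The same values interpreted as quarterly or monthly data
quarterly <- ts(x, start = c(2010, 1), frequency = 4)   # 12 years of quarters
monthly   <- ts(x, start = c(2010, 1), frequency = 12)  # 4 years of months

frequency(quarterly)
frequency(monthly)
time(monthly)[13]    # the 13th monthly observation starts the second year

# Average number of weeks in a year, and the integer commonly used instead
weeks_per_year <- 365.25 / 7
round(weeks_per_year, 2)
weekly <- ts(x, frequency = floor(weeks_per_year))
frequency(weekly)
```

The frequency attribute only tells R how many observations make up one cycle; it does not by itself fit or assume any seasonal pattern.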

Once observations are recorded more often than weekly, there is usually more than one way of handling the frequency. For example, hourly data might have a daily seasonality (frequency=24), a weekly seasonality (frequency=24×7=168) and an annual seasonality (frequency=24×365.25=8766). If you want to use a ts object, then you need to decide which of these is the most important.

An alternative is to use a msts object (defined in the forecast package) which handles multiple seasonality time series. Then you can specify all the frequencies that might be relevant. It is also flexible enough to handle non-integer frequencies.

Data frequencies
                 minute      hour       day      week      year
Daily                                               7    365.25
Hourly                                   24       168      8766
Half-hourly                              48       336     17532
Minutes                        60      1440     10080    525960
Seconds              60      3600     86400    604800  31557600
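The entries in this table are just the number of observations in each cycle, so they can be computed from the sampling interval. A small sketch in base R follows; the helper name seasonal_periods() is hypothetical, not a function from any package.

```r
# Length of each candidate cycle in seconds (a year taken as 365.25 days)
seconds_in <- c(minute = 60, hour = 3600, day = 86400,
                week = 7 * 86400, year = 365.25 * 86400)

# Hypothetical helper: observations per cycle for a given sampling interval
seasonal_periods <- function(interval_seconds) {
  p <- seconds_in / interval_seconds
  p[p > 1]   # drop cycles that are not longer than one observation
}

seasonal_periods(86400)  # daily data
seasonal_periods(3600)   # hourly data
seasonal_periods(1800)   # half-hourly data
seasonal_periods(60)     # one-minute data
seasonal_periods(1)      # one-second data
```

This only lists the candidate periods; which of them to keep still depends on what is plausible for the data.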

You won’t necessarily want to include all of these frequencies, just the ones that are likely to be present in the data. For example, a natural phenomenon (e.g., sunshine hours) is unlikely to have a weekly period, and if your data are measured at one-minute intervals over a 3-month period, there is no point including an annual frequency.

For example, the taylor data set from the forecast package contains half-hourly electricity demand data from England and Wales over about 3 months in 2000. It was defined as

 taylor <- msts(x, seasonal.periods=c(48,336))

One convenient model for multiple seasonal time series is a TBATS model:

 taylor.fit <- tbats(taylor)
 plot(forecast(taylor.fit))

(Warning: this takes a few minutes.)

If an msts object is used with a function designed for ts objects, the largest seasonal period is used as the “frequency” attribute.
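A quick way to check this is to build an msts object and ask for its frequency. The sketch below uses simulated half-hourly values (an assumption, not the taylor data) and runs only if the forecast package is installed.

```r
# Runs only when the forecast package is available
if (requireNamespace("forecast", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(48 * 7 * 4)   # four weeks of simulated half-hourly values

  y <- forecast::msts(x, seasonal.periods = c(48, 336))

  print(attr(y, "msts"))   # the seasonal periods stored on the object
  print(frequency(y))      # what functions written for ts objects will see
}
```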


• Deshani

Sir, I have daily electricity demand for five years and I created a msts object having seasonal.period=c(7,365.25). As the normal ‘decompose’ function has been designed for ts objects what can I do to decompose this multiple seasonality time series? Thank you in advance

• Use tbats()

• saurabh

I am not able to understand frequency.
I have the last two years of daily stock price data. I have to use it to make a time series using ts() and forecast the next 30 days of the stock price. What will be the value of the frequency argument in ts() in this case?
where is season and what is season?
Rgrds
Saurabh

• 7.

• germ

Hi Prof Rob,

you have chosen freq = 7 because it is weekly seasonality. How did you come to this conclusion? Is it because stock prices have weekly seasonality?

• There are seven days in a week. The frequency attribute does not assume there is any seasonality present.

• germ

Thank you prof Rob.

You mentioned that there is more than one way, and that we have to decide which is the most important. As such, we could also have freq = 365.25, since there are 365.25 days in a year. Apart from msts, why freq = 7 but not freq = 365.25 in ts()?

• Deshani

Sir, I have daily electricity demand for five years (2008/01/31 – 2012/12/31) and I created a multi seasonal time series object
using seasonal period c(7,365.25), but the created series have 2 dates in 2013. Will it be a problem for my further analysis? Don’t we have any other option to handle daily data which contain leap years?

• Leap years make it tricky. You can either omit the leap days and use seasonal period c(7,365), or leave them in and use c(7, 365.25). In the latter case, your dates won’t line up exactly, but it should work ok provided the annual seasonal pattern is smooth.
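The first option (omitting leap days) can be sketched in base R as follows. The date range is the one mentioned in the question; the demand values are simulated placeholders, since the real data are not shown here.

```r
# Daily dates covering the period mentioned in the question
dates <- seq(as.Date("2008-01-31"), as.Date("2012-12-31"), by = "day")

# Simulated placeholder for the demand series (assumption for illustration)
set.seed(42)
demand <- rnorm(length(dates), mean = 100, sd = 10)

# Flag 29 February and drop those observations
is_leap_day <- format(dates, "%m-%d") == "02-29"
sum(is_leap_day)

demand_no_leap <- demand[!is_leap_day]

# Every year now has exactly 365 observations
y <- ts(demand_no_leap, frequency = 365)
length(y)

# With the forecast package, the multiple-seasonal version would be
# forecast::msts(demand_no_leap, seasonal.periods = c(7, 365))
```

Note that dropping a day shifts the day-of-week alignment by one after each removed date, so this trades a small distortion in the weekly pattern for an exact annual period.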

• Deshani

Thank you very much sir

• Thao Ta

sir, I am also working on electricity data, but my data are hourly from 1/1/2012 to 31/12/2014. I tried tbats() but R kept running for 11 hours without giving a result. I read some previous comments about multiple seasonality. Do we put c(24,168), for example, into the frequency of the ts command when we define the time series?

• You need c(24, 7*24, 365*24). For a very long series, tbats() can be very slow. You might try alternative approaches including using Fourier terms for the seasonality with ARIMA errors.
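As a rough illustration of the Fourier-terms idea, here is a sketch in base R only. Everything in it is an assumption for illustration: the hourly data are simulated, only the daily and weekly periods are used, the AR(1) error order is fixed by hand, and fourier_terms() is a hypothetical helper. In practice the forecast package's fourier() and auto.arima() do this more conveniently.

```r
set.seed(1)
n <- 24 * 7 * 8                       # eight weeks of simulated hourly data
time_index <- seq_len(n)
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- 10 + 3 * sin(2 * pi * time_index / 24) +
  2 * cos(2 * pi * time_index / 168) + noise

# Hypothetical helper: K sine/cosine pairs for one seasonal period
fourier_terms <- function(time_index, period, K) {
  X <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * time_index / period),
          cos(2 * pi * k * time_index / period))
  }))
  colnames(X) <- paste0(rep(c("S", "C"), times = K),
                        rep(seq_len(K), each = 2), "_", period)
  X
}

# Regressors for the daily (24) and weekly (168) patterns
X <- cbind(fourier_terms(time_index, 24, 2), fourier_terms(time_index, 168, 2))

# Regression on the Fourier terms with AR(1) errors
fit <- arima(y, order = c(1, 0, 0), xreg = X)

# Forecast the next 48 hours using the future values of the same regressors
h <- 48
future_index <- n + seq_len(h)
X_new <- cbind(fourier_terms(future_index, 24, 2),
               fourier_terms(future_index, 168, 2))
fc <- predict(fit, n.ahead = h, newxreg = X_new)
head(fc$pred)
```

Because the seasonality is carried by a handful of regressors rather than by seasonal states, long seasonal periods add little to the cost of fitting.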

• Thao Ta

Thank you very much sir. I will try it. Do you have any blog post about Fourier terms for seasonality with ARIMA models? I just took a very basic time series class, so the above concept is quite new to me.

• Use the search box.

• Matvey Bossis

Tried the above recipe on a few thousands of series of daily sales.

99.9% of the times it works fine 🙂

There are a few cases, in which the forecast “explodes”. I am attaching one of them. Tell me if you’d like more.

 library(forecast)
 x <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 6, 0, 6, 1, 12, 8, 2, 2, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 1, 10, 17, 17, 2, 0, 17, 17, 0, 3, 13, 11, 8, 3, 8, 14, 6, 3, 9, 3, 1, 2, 2, 10, 0, 0, 0, 12, 16, 16, 18, 23, 10, 12, 6, 0, 2, 4, 6, 0, 10, 1, 0, 7, 12, 11, 2, 14, 4, 8, 15, 5, 13, 9, 7, 14, 10, 22, 12, 9, 4, 13, 12, 13, 0, 3, 6, 9, 0, 0, 1, 11, 0, 6, 2, 6, 4, 12, 0, 9, 10, 8, 0, 1, 5, 13, 10, 2, 0, 0, 1, 3, 8, 18, 6, 5, 2, 4, 8, 4, 4, 3, 4, 9, 1, 3, 6, 2, 1, 0, 0, 10, 5, 3, 1, 5, 4, 1, 8, 3, 2, 1, 0, 4, 0, 0, 0, 0, 0, 0, 2, 0, 4, 3, 12, 12, 0, 5, 1, 9, 2, 9, 0, 0, 3, 0, 4, 3, 1, 1, 1, 1, 0, 1, 2, 3, 2, 7, 5, 3, 0, 0, 1, 3, 1, 0, 0, 0, 2, 1, 1, 0, 2, 1, 2, 2)
 fit <- tbats(msts(x, seasonal.periods=c(7,365.25)))
 f <- forecast(fit)
 plot(f)

The plot is attached. Also attaching another plot, where I got quite extreme oscillations by increasing the number of leading zeros in the example above.

• I cannot replicate this problem with v6 of the package. I suspect it was fixed in the update.

• Matvey Bossis

It was v5.9. Sorry about that.
But upgrading didn’t fix all cases. Here is a case that reproduces on v6.1:

 library(forecast)
 x <- c(11, 7, 85, 66, 50, 40, 39, 49, 55, 9, 3, 5, 6, 3, 6, 4, 3, 7, 2, 6, 4, 5, 4, 6, 4, 4, 3, 3, 4, 8, 8, 4, 3, 8, 5, 7, 3, 12, 9, 11, 10, 1, 9, 8, 13, 7, 6, 9, 10, 9, 11, 3, 8, 6, 6, 6, 16, 6, 2, 6, 7, 4, 6, 7, 3, 7, 10, 6, 0, 3, 7, 6, 4, 6, 2, 2, 11, 2, 10, 12, 6, 2, 6, 8, 7, 5, 7, 8, 2, 2, 6, 7, 6, 13, 9, 4, 7, 9, 6, 10, 6, 0, 5, 6, 5, 7, 9, 11, 5, 8, 0, 2, 8, 13, 12, 5, 9, 1, 3, 7, 4, 39, 36, 13, 19, 29, 21, 26, 3, 4, 4, 4, 0, 8, 7, 7, 4, 4, 3, 4, 7, 3, 6, 8, 2, 1, 4, 4, 7, 4, 0, 3, 5, 3, 4, 5, 44, 35, 29, 20, 35, 14, 35, 3, 5, 4, 5, 6, 4, 6, 4, 5, 3, 5, 3, 5, 6, 9, 1, 1, 3, 2, 7, 3, 6, 17, 27, 27, 20, 10, 23, 14, 5, 2, 5, 7, 7, 4, 7, 6, 1, 7, 5, 5, 5, 17, 23, 28, 22, 37, 46, 45, 5, 5, 1, 1, 2, 3, 3, 3, 1, 3, 2, 5, 1, 6, 13, 9, 12, 3, 8, 7, 4, 9, 7, 2, 7, 0, 6, 8, 34, 32, 25, 23, 37, 37, 19, 9, 3, 2, 3, 3, 1, 0, 2, 9, 6, 10, 3, 4, 9, 4, 5, 3, 4, 4, 2, 6, 6, 4, 4, 2, 7, 4, 5, 8, 1, 3, 4, 6, 3, 6, 4, 6, 9, 5, 12, 10, 8, 11, 8, 1, 3, 8, 4, 3, 7, 5, 4, 4, 7, 2, 5, 16, 30, 23, 16, 13, 7, 13, 6, 3, 6, 6, 7, 6, 7, 5, 9, 6, 8, 1, 3, 8, 3, 1, 4, 5, 4, 4, 4, 6, 6, 5, 6, 2, 6, 2, 4, 7, 0, 3, 1, 5, 4, 45, 35, 37, 21, 22, 14, 20, 4, 10, 4, 2, 2, 3, 5, 3, 2, 3, 6, 3, 1, 3, 6, 1, 5, 2, 1, 1, 3, 4, 3, 4, 2, 2, 3, 10, 73, 55, 25, 23, 32, 33, 41, 2, 3, 1, 7, 1, 2, 5, 5, 0, 0, 0, 4, 3, 5, 5, 3, 4, 2, 0, 3, 4, 1, 0, 3, 3, 8, 2, 5, 11, 5, 2, 2, 7, 2, 5, 6, 4, 3, 4, 1, 2, 1, 26, 25, 25, 22, 0, 12, 16, 11, 2, 3, 2, 1, 2, 1, 4, 3, 1, 6, 6, 1, 5, 4, 1, 1, 6, 3, 1, 3, 3, 1, 1, 1, 0, 3, 1, 21, 12, 16, 15)
 fit <- tbats(msts(x, seasonal.periods=c(7,365.25)))
 f <- forecast(fit)
 plot(f)

• Thanks. I have added it as a bug to be fixed at https://github.com/robjhyndman/forecast/issues/119

• Matvey Bossis

You are welcome 🙂
Also, an unrelated issue: I noticed, that feeding a cumsum(ts) into a forecast model, instead of the actual “ts”, many times improves the forecast accuracy. Were you aware of this? (the output can be brought back to scale using diff())
My motivation for trying this was the idea that I want a forecast for the SUM of sales, over the next 3 days, next 7 days, etc., and less interested in daily accuracy. But it seems, all accuracy, including daily, benefits from this trick.

• How are you measuring accuracy? The MSE values should be the same, I think.

• Matvey Bossis

Here is a demonstration of what I mean. I tried this trick with a few forecasting models, not just “holt”, on many time series, and most of the times it improved the accuracy. This is just one example, and maybe not the best one.

 training <- ts(c( 11, 7, 85, 66, 50, 40, 39, 49, 55, 9, 3, 5, 6, 3, 6, 4, 3, 7, 2, 6, 4, 5, 4, 6, 4, 4, 3, 3, 4, 8, 8, 4, 3, 8, 5, 7, 3, 12, 9, 11, 10, 1, 9, 8, 13, 7, 6, 9, 10, 9, 11, 3, 8, 6, 6, 6, 16, 6, 2, 6, 7, 4, 6, 7, 3, 7, 10, 6, 0, 3, 7, 6, 4, 6, 2, 2, 11, 2, 10, 12, 6, 2, 6, 8, 7, 5, 7, 8, 2, 2, 6, 7, 6, 13, 9, 4, 7, 9, 6, 10, 6, 0, 5, 6, 5, 7, 9, 11, 5, 8, 0, 2, 8, 13, 12, 5, 9, 1, 3, 7, 4, 39, 36, 13, 19, 29, 21, 26, 3, 4, 4, 4, 0, 8, 7, 7, 4, 4, 3, 4, 7, 3, 6, 8, 2, 1, 4, 4, 7, 4, 0, 3, 5, 3, 4, 5, 44, 35, 29, 20, 35, 14, 35, 3, 5, 4, 5, 6, 4, 6, 4, 5, 3, 5, 3, 5, 6, 9, 1, 1, 3, 2, 7, 3, 6, 17, 27, 27, 20, 10, 23, 14, 5, 2, 5, 7, 7, 4, 7, 6, 1, 7, 5, 5, 5, 17, 23, 28, 22, 37, 46, 45, 5, 5, 1, 1, 2, 3, 3, 3, 1, 3, 2, 5, 1, 6, 13, 9, 12, 3, 8, 7, 4, 9, 7, 2, 7, 0, 6, 8, 34, 32, 25, 23, 37, 37, 19, 9, 3, 2, 3, 3, 1, 0, 2, 9, 6, 10, 3, 4, 9, 4, 5, 3, 4, 4, 2, 6, 6, 4, 4, 2, 7, 4, 5, 8, 1, 3, 4, 6, 3, 6, 4, 6, 9, 5, 12, 10, 8, 11, 8, 1, 3, 8, 4, 3, 7, 5, 4, 4, 7, 2, 5, 16, 30, 23, 16, 13, 7, 13, 6, 3, 6, 6, 7, 6, 7, 5, 9, 6, 8, 1, 3, 8, 3, 1, 4, 5, 4, 4, 4, 6, 6, 5, 6, 2, 6, 2, 4, 7, 0, 3, 1, 5, 4, 45, 35, 37, 21, 22, 14, 20, 4, 10, 4, 2, 2, 3, 5, 3, 2, 3, 6, 3, 1, 3, 6, 1, 5, 2, 1, 1, 3, 4, 3, 4, 2, 2, 3, 10, 73, 55, 25, 23, 32, 33, 41, 2, 3, 1, 7, 1, 2, 5, 5, 0, 0, 0, 4, 3, 5, 5, 3, 4, 2, 0, 3, 4, 1, 0, 3, 3, 8, 2, 5, 11, 5, 2, 2, 7, 2, 5, 6, 4, 3, 4, 1, 2, 1, 26, 25, 25, 22, 0, 12, 16, 11, 2, 3, 2, 1, 2, 1, 4, 3, 1, 6, 6, 1, 5, 4, 1, 1, 6, 3, 1, 3, 3))

 cumsumTraingin <- cumsum(training)
 fit1 <- holt( (training) )
 fit2 <- holt( (cumsumTraingin-tail(cumsumTraingin,1)) )
 f1 <- forecast(fit1)
 f2 <- forecast(fit2)
 forecast1 <- as.numeric(f1$mean)
 forecast2 <- c(f2$mean[1], diff(f2$mean))
 testing
 > accuracy(forecast1, testing)
                ME     RMSE      MAE  MPE MAPE      ACF1 Theil's U
 Test set  4.36095 8.792137 6.352331 -Inf  Inf 0.4722808       NaN
 > accuracy(forecast2, testing)
                ME     RMSE      MAE  MPE MAPE      ACF1 Theil's U
 Test set 4.244656 8.689479 6.299996 -Inf  Inf  0.468064       NaN

• Your training data has a step change, and Holt on the cumulative data has a positive trend. So not too surprising.

• Zack

Sir, is the “season” or “unit of time” fixed at “Year”? Say ts(x, freq=7), does it mean 7 observation per year or 7 observations per week? How does R know the metric of “season” when there is only one argument “freq”?

• No, it does not have to be a year. freq just specifies how many observations per “season” whatever that is.

• Raziel

Sir, I am quite new in R software. My problem is, I want to forecast electricity prices in a certain interval but I don’t know how to add dummy variables (due to seasonality) for forecasting in R codes. Could you help me? Thank you in advance.

• Geoff Pofahl

Hello Dr. Hyndman – apologies if my question has already been asked and answered…I did look and couldn’t find anything. How do you handle situations where you don’t necessarily know the right definition of ‘season’ from which to set your data frequency? I’m working with daily data for hundreds of variables. In some cases it appears that frequency = 7 is ‘correct’ but in other cases I seem to get better results with frequency = 21 or 28. Since I don’t have the luxury of looking at the data and manually deciding on a case by case basis I’m wondering if would be appropriate to run through numerous possibilities and then use resulting AIC values from each subsequent model to determine the best frequency setting? What do you think?
Really appreciate any advice you can offer on this one. cheers, Geoff

• It’s probably better to use all relevant seasonal periods and let tbats() decide how to use them.

• Geoff Pofahl

Thanks for the reply…I’ll give that a try.

• Matthew

Thanks for the post and the package hts. I am new to time series, so I have two questions:

1) If I am fitting daily data and there are peaks on Monday (week seasonality) , beginning of the month (month seasonality), and possibly certain months in the year, how should I specify that in msts() as there are different days in each month and averaging it would eliminate the month seasonality?

2) I am trying to understand the difference between hierarchical and grouped time series. e.g. If I have two income groups ( age > 50 and age < 50), and within first group I have income for age 1 to 50 and in second group I have income for age 51 +. This would be a hierarchical time series because level 1 (group) child nodes are link together. Is that right ?

• 1. Monthly seasonality with daily data is tricky due to variable month lengths. You can’t specify that using seasonal periods. You could possibly handle it within a model using dummy variables.

2. Yes. See http://robjhyndman.com/papers/foresight-hts/ for a relatively accessible introduction to the topic.

• Matthew

To clarify, so I will have 50 time series for group 1 and 50 (assuming last time series is for 100+ age) for group 2.

• Matthew

Thank you Dr. Hyndman for your fast reply. I am wondering if each time series under a node has to be of same length? I suppose I can combine each series with 0s to account for the different length but I am not sure if it would impact the result.

• Yes. All time series must be the same size. Only add 0s if it makes sense to treat the value at that time as a zero (e.g., sales for products that didn’t exist).

• Matthew

I have read many of your blogs and textbooks.
Another question I have is: if I can fit a armax model with xreg containing dummy variables for seasonality (monthly, holiday …etc), does it mean I can use ts(data,frequency = 1) and specify everything in xreg?

• It is better to leave the ts attributes as they are and specify seasonal=FALSE to prevent the ARIMA model trying to handle the seasonality.

• Matthew

Please make my recent question visible when you see this message as I can’t see it back again … thanks !

• Matthew

I guess I will ask again, is there a way to use the forecast.gts function with a ranges for different lambdas ? I tried using it with a vector of lamdas but it produce an error. As I want to extract the residuals and fitted value for the forecast as well, is there a easier way to do it other than a for-loop feeding everything manually ?

• If you use forecast.gts(), then lambda must be constant and the same for all series. Otherwise, you can generate the forecasts yourself and use combinef() to reconcile them. Residuals and fitted values can be obtained by setting keep.fitted=TRUE and keep.resid=TRUE in forecast.gts(). Read the help file.

• Matthew

Thanks ! I am aware that those arguments and the combinef() function. However, as I am fitting the model, my xreg matrix is outputting an error with Arima() for some of my columns. Please see if you can help : http://stats.stackexchange.com/questions/186164/armax-model-and-validation/187309#187309

• Deshani

Sir, Can I decompose a multiple seasonal time series using tbats and get the random component of it? (Just as getting the random component using the decompose function for ts objects)? Please suggest a method to extract the random component of a msts.

• Use residuals(fit)

• Deshani

Thank you very much sir for your fast reply

• Adam Karolewski

Sir, what about workdays in month or in a year. They used to differ like from 19 up to 22 workdays in a month.
• Peter Lorenz

Hi Rob, I have tried to get my head around to understand what the frequencies mean in a lag. My question is when I plot a graph (example a ACF) with frequency 24 (daily season) what does the lag 1 stand for in the x -axis. Does lag 1 describe 1 hour or one day? I hope this question is not too fuzzy but would help me a lot. Thank you for your support and the great explanations!

• If you use acf(), 1 means 1 day. If you use Acf(), 1 means 1 hour.

• Peter Lorenz

Thank you for the reply! It helped me out a lot 🙂

• Chhavi Garg

Sir, I am implementing TBATS on monthly data of past 4 years. I think there is monthly as well as semi-annual seasonality in my data. How should I give seasonal.periods argument to TBATS function? Should I give c(12,18) or c(1,6)?

• If by “semi-annual” you mean that the period is 6 months, then you want c(6,12).

• Chhavi Garg

Many Thanks! This is helpful

• Arun Akkinapalli

Hello Sir. I have a time series dataset of 1 year with seasonal patterns that change every day in a week and every month of the year. So i am using tbats for multiple seasonal periods.

 bats_val <- bats(dailydata_smpl.ts, seasonal.periods = c(7,30))

Just want to check if the above way of declaring the seasonal periods will help me include both of them. Thanks in advance.

• Not all months have 30 days, so that will not work. I would use c(7,365) for daily data except in the incredibly rare cases where there actually is a monthly pattern, in which case tbats will not handle it.

• Arun Akkinapalli

Thanks for the reply Sir. I will try it out.

• Nabi Shaikh

Arun i am also dealing with this frequency issue , my data starts from nov-2015 to sep-2016 with 10 minute data.and i am trying holt winter with this data Here is the code i am texting along plz do guide me on this

 dft <- ts(df$Temperature,start=c(2015,11),frequency = 365*24*60/10)
 ins <- window(dft,start = c(2015,11),end=c(2016,4),frequency= 365*24*60/10)
 ino <- window(dft,start = c(2016,5),end=c(2017,6),frequency= 365*24*60/10)

• Aram Tsatryan

How can I forecast the taylor data without using the Holt-Winters function? I want a more detailed forecast.

• Leo Zhang

Hi prof,
Can I ask a question regarding seasonal effect in time series?
I know there are procedures to conduct a seasonal decomposition of a time series into trend, seasonal and remainder components. Here the seasonal components are perfectly periodic patterns (or at least that’s what stl and decompose do in R).
On the other hand, the seasonal ARIMA model arima(p,d,q)(P,D,Q)[S] is another way to look at seasonality within a time series. Since seasonal ARIMA is driven by an error process, the seasonal component is a stochastic process driven by the random process through the AR(P)[S] and MA(Q)[S] processes (after differencing if needed). Therefore the two are not identical, am I right?
So, given a time series with an obvious seasonal pattern, shall I use stl to remove the trend and seasonal components and then model the remainder, or shall I just model the original series with seasonal ARIMA models?
Thanks!

• STL can allow for changing seasonal patterns that are not perfectly periodic.

As to which approach to use, that depends on the time series in question. Try them both and use whatever works best.

• steev Codjia

sir, I have daily observations of withdrawals, and the bank opens from Monday to Saturday. So, with the msts object, I’m thinking about seasonal.period=c(6,24, 288). I don’t really understand frequency in msts. Thank you in advance

• With daily observations you will have a seasonal period of 6. Nothing else.

• Steev Codjia

Thank you Sir. Why shouldn’t I take monthly or yearly seasonality into account? I have data for five years.

• Sure. Yearly seasonality will have period approximately 6*52=312. I doubt that you have monthly seasonality as it is extremely rare, and does not have a fixed period in any case due to variable month lengths.

• Steev Codjia

Thank you sir for your answers. they really help me in my work. Please correct me if I’m wrong. to sum up, I should use seasonal.period=c(6) and bats to model my data.

• If you think you have annual seasonality, use seasonal.periods=c(6,312). And use tbats, not bats.
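To see where the figure of roughly 312 comes from, one can count the opening days per year for a Monday-to-Saturday calendar. This is a sketch in base R; the five-year date range is hypothetical, and public holidays are ignored.

```r
# Hypothetical five-year span of calendar dates
dates <- seq(as.Date("2012-01-01"), as.Date("2016-12-31"), by = "day")

# wday is 0 for Sunday; keep Monday to Saturday only
is_sunday <- as.POSIXlt(dates)$wday == 0
open_days <- dates[!is_sunday]

# Number of observations in each calendar year
obs_per_year <- table(format(open_days, "%Y"))
obs_per_year
mean(obs_per_year)   # compare with the 6 * 52 = 312 approximation
```

The yearly count varies slightly from year to year, which is why the annual period is only approximate.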

• Steev Codjia

Thank you prof. I have another question, please. Does TBATS handle non-working days such as Easter? Those days have no value, and they are like gaps in the database.

Thank you for your post. I had a question regarding acf() while frequency is set to 365.25/7.
what does the lag 0.1, 0.2, … stand for in the x -axis?
Thanks.
Arsa

• It is measured in weeks. But you are better off using Acf() or ggAcf() instead.
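For reference, base R's acf() reports lags in units of one seasonal cycle, that is, as the number of observations divided by the frequency. A small sketch with simulated hourly data (an assumption for illustration) shows the scaling:

```r
set.seed(7)
y <- ts(rnorm(24 * 14), frequency = 24)   # two weeks of simulated hourly data

a <- acf(y, lag.max = 48, plot = FALSE)
lags <- as.numeric(a$lag)

lags[2]                # one observation (one hour) is 1/24 of a cycle
lags[25]               # 24 observations make one full cycle (one day)
lags * frequency(y)    # the same lags counted in observations
```

Multiplying the lag axis by the frequency converts it back to a count of observations, which is the scale the answer above recommends using via Acf().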

• Nabi Shaikh

Sir @robjhyndman, I am also dealing with this frequency issue. My data run from Nov 2015 to Sep 2016 at 10-minute intervals, and I am trying Holt-Winters with them. Here is my code; please guide me on this:

 dft <- ts(df$Temperature,start=c(2015,11),frequency = 365*24*60/10)
 ins <- window(dft,start = c(2015,11),end=c(2016,4),frequency= 365*24*60/10)
 ino <- window(dft,start = c(2016,5),end=c(2017,6),frequency= 365*24*60/10)

In start=c(2015,11), can we replace the 11 with the actual starting time of the data?

• This is not a help site. Ask on stackoverflow.com

• Nkuli

Good Day Sir. Thank you for this website and your wonderful package forecast.