A blog by Rob J Hyndman 


Testing for trend in ARIMA models

Published on 13 March 2014

Today’s email brought this one:

I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article.

Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this.

To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as ‘random walks with drift’-type models, so I cited them as examples of this procedure in action, but they remained unconvinced.

Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable since no trends had been removed. At the moment I am hesitant to do this, as I was taught that even pure random walks could give you significant trends. Moreover, given that the raw time series is not stationary, I was worried that the ARIMA(0,0,1) model it would then become might not actually be appropriate.

There’s nothing like running into ignorant reviewers who want you to do things that make no sense.
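The drift test described in the email can be sketched in R with the forecast package. The simulated series `x` and the z-test on the drift coefficient below are illustrative assumptions, not the correspondent's actual analysis.

```r
library(forecast)

# Simulated random walk with upward drift, standing in for the real data
set.seed(1)
x <- ts(cumsum(rnorm(100, mean = 0.5)))

# ARIMA(0,1,1) with a drift term, as described in the email
fit <- Arima(x, order = c(0, 1, 1), include.drift = TRUE)

# Approximate z-test for the drift coefficient
drift <- coef(fit)["drift"]
se    <- sqrt(diag(vcov(fit))["drift"])
2 * pnorm(-abs(drift / se))  # two-sided p-value
```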


Unit root tests and ARIMA models

Published on 12 March 2014

An email I received today:

I have a small problem. I have a time series called x:

- If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0).

- However, I tried the functions ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test.
I then tried auto.arima(x, test="adf") and now I have another model, ARIMA(1,2,1). I am unsure which order of integration I should use, as the tests give fairly different results.

Is there a test that prevails?
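For what it's worth, such disagreements are easy to reproduce, because the ADF and KPSS tests have opposite null hypotheses. A minimal sketch, using a simulated stationary AR(1) series in place of the correspondent's data:

```r
library(forecast)

# Stationary AR(1) series standing in for the correspondent's x
set.seed(42)
x <- arima.sim(model = list(ar = 0.5), n = 200)

ndiffs(x, test = "kpss")  # null hypothesis: stationarity
ndiffs(x, test = "adf")   # null hypothesis: a unit root
auto.arima(x)             # uses the KPSS test by default
```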



Using old versions of R packages

Published on 10 March 2014

I received this email yesterday:

I have been using your ‘forecast’ package for more than a year now. I was on R version 2.15 until last week, but I am having issues with the lubridate package, hence decided to update to R 3.0.1. In our organization even getting an open source application requires us to go through a whole lot of approval processes. I asked for R 3.0.1; before I got approval for 3.0.1, a new version of R (3.0.2) came out. Unfortunately for me the forecast package was built in R 3.0.2. Is there any version of the forecast package that works in an older version of R (3.0.1)? I just don’t want to go through this entire approval war again within the organization.
Please help if you have any workaround for this.

This is unfortunately very common. Many corporate IT environments lock down computers to such an extent that it cripples the use of modern software like R, which is continuously updated. It also affects universities (which should know better), and I am constantly trying to invent work-arounds to the constraints that Monash IT services place on staff and student computers.

Here are a few thoughts that might help.
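One possible workaround, sketched below, is to install an archived source version of a package from the CRAN archive, chosen to match the older R installation. The version number in the URL is a placeholder, not a recommendation; the appropriate version would need to be looked up in the archive listing.

```r
# Install an archived source version of the forecast package from CRAN.
# The version number below is a placeholder for whichever archived
# release matches the installed R version.
url <- "https://cran.r-project.org/src/contrib/Archive/forecast/forecast_4.8.tar.gz"
install.packages(url, repos = NULL, type = "source")
```

Installing from source requires a working toolchain (e.g. Rtools on Windows).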


IJF news

Published on 7 March 2014

This is a short piece I wrote for the next issue of the Oracle newsletter produced by the International Institute of Forecasters.


Highlighting the web

Published on 6 March 2014

Users of my new online forecasting book have asked about having a facility for personal highlighting of selected sections, as students often do with print books. We have plans to make this a built-in part of the platform, but for now it is possible to do it using a simple browser extension. This approach allows any website to be highlighted, so it is even more useful than if we only had the facility on OTexts.org.

There are several possible tools available. One of the simplest tools that allows both highlighting and annotations is Diigo.


Forecasting weekly data

Published on 5 March 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
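A sketch of the Fourier-term approach with the forecast package; the series `x` and the choice K = 5 below are placeholders (K would normally be chosen by minimising the AICc).

```r
library(forecast)

# Placeholder weekly series with non-integer frequency 365.25/7
set.seed(1)
x <- ts(rnorm(300, mean = 10), frequency = 365.25 / 7)

# Regression with Fourier terms for seasonality and ARMA errors
K <- 5  # number of Fourier pairs; a placeholder choice
fit <- auto.arima(x, xreg = fourier(x, K = K), seasonal = FALSE)
fc  <- forecast(fit, xreg = fourier(x, K = K, h = 104))  # two years ahead
```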


Fitting models to short time series

Published on 4 March 2014

Following my post on fitting models to long time series, I thought I’d tackle the opposite problem, which is more common in business environments.

I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer: the required sample size increases with the number of model parameters to be estimated and with the amount of noise in the data.


Fitting models to long time series

Published on 1 March 2014

I received this email today:

I recall you made this very insightful remark somewhere: that fitting a standard ARIMA model with too much data, i.e. a very long time series, is a bad idea.

Can you elaborate why?

I can see the issue with noise, which compounds the ML estimation as the series gets too long. But is there anything else?

I’m not sure where I made a comment about this, but it is true that ARIMA models don’t work well for very long time series. The same can be said about almost any other model too. The problem is that real data do not come from the models we use. When the number of observations is not large (say up to about 200), the models often work well as an approximation to whatever process generated the data. But eventually you will have enough data that the difference between the true process and the model starts to become more obvious. An additional problem is that the optimization of the parameters becomes more time consuming because of the number of observations involved.

What to do about these issues depends on the purpose of the model. A more flexible nonparametric model could be used, but this still assumes that the model structure will work over the whole period of the data. A better approach is usually to allow the model itself to change over time, for example by using time-varying parameters in a parametric model, or by using a time-based kernel in a nonparametric model. If you are only interested in forecasting the next few observations, it is equivalent and simpler to throw away the earliest observations and fit a model only to the most recent observations.

How many observations to retain, or how fast to allow the time-varying parameters to vary, can be tricky decisions.
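The last suggestion is the simplest to implement. A minimal sketch, where the cut-off of 200 observations is an arbitrary choice for illustration:

```r
library(forecast)

# A long series standing in for real data
set.seed(1)
x <- ts(cumsum(rnorm(2000)))

# Keep only the most recent 200 observations before fitting
recent <- window(x, start = length(x) - 199)
fit <- auto.arima(recent)
```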


More time series data online

Published on 27 February 2014

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not many other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look.

Unfortunately there is no R code, and no R interface for downloading the data.


The forecast mean after back-transformation

Published on 25 February 2014

Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data, and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast densities (assuming the forecast densities on the transformed scale are symmetric). For many purposes this is acceptable, but occasionally the mean forecast is required. For example, with hierarchical forecasting the forecasts need to be aggregated, and medians do not aggregate but means do.

It is easy enough to derive the mean forecast using a Taylor series expansion. Suppose $f(x)$ represents the back-transformation function, $\mu$ is the mean on the transformed scale and $\sigma^2$ is the variance on the transformed scale. Then using the first three terms of a Taylor expansion around $\mu$, the mean on the original scale is given by

    \[f(\mu) + \frac{1}{2}\sigma^2f''(\mu).\]
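For the Box-Cox case, where $f(x) = (\lambda x + 1)^{1/\lambda}$ (and $f(x) = e^x$ when $\lambda = 0$), the approximation can be coded directly. The function below is my own illustration; the name and arguments are not from the forecast package.

```r
# Bias-adjusted mean after back-transforming a Box-Cox forecast.
# mu and sigma2 are the mean and variance on the transformed scale.
boxcox_mean <- function(mu, sigma2, lambda) {
  if (lambda == 0) {
    # f(x) = exp(x), so f''(mu) = exp(mu)
    exp(mu) * (1 + sigma2 / 2)
  } else {
    fmu <- (lambda * mu + 1)^(1 / lambda)
    fpp <- (1 - lambda) * (lambda * mu + 1)^(1 / lambda - 2)  # f''(mu)
    fmu + sigma2 * fpp / 2
  }
}

# For lambda = 0 the exact mean is lognormal, exp(mu + sigma2/2),
# and the approximation is close when sigma2 is small:
boxcox_mean(1, 0.01, 0)  # approximation
exp(1 + 0.01 / 2)        # exact
```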

