A blog by Rob J Hyndman 


Probabilistic forecasting by Gneiting and Katzfuss (2014)

Published on 14 March 2014

The IJF is introducing occasional review papers on areas of forecasting. We did a whole issue in 2006 reviewing 25 years of research since the International Institute of Forecasters was established. Since then, there has been a lot of new work in application areas such as call center forecasting and electricity price forecasting. In addition, there are areas we did not cover in 2006, including new product forecasting and forecasting in finance. There have also been methodological and theoretical developments over the last eight years. Consequently, I've started inviting eminent researchers to write survey papers for the journal.

One obvious choice was Tilmann Gneiting, who has produced a large body of excellent work on probabilistic forecasting in the last few years. The theory of forecasting was badly in need of development, and Tilmann and his coauthors have made several great contributions in this area. However, when I asked him to write a review he explained that another journal had got in before me, and that the review was already written. It appeared in the very first volume of the new journal Annual Review of Statistics and Its Application: Gneiting and Katzfuss (2014) Probabilistic Forecasting, pp.125–151.

Having now read it, I'm both grateful for this more accessible introduction to the area, and disappointed that it didn't end up in the International Journal of Forecasting. I forecast that it will be highly cited (although I won't calculate a forecast distribution or compute a scoring function for that).

Also, good luck to the new journal; it looks like it will be very useful, and is sure to have a high impact factor given it publishes review articles.


Testing for trend in ARIMA models

Published on 13 March 2014

Today’s email brought this one:

I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article.

Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this.

To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did. However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as 'random walk with drift'-type models, so I cited them as examples of this procedure in action, but they remained unconvinced.

Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable since no trends had been removed. At the moment I am hesitant to do this, as I was taught that even pure random walks can give you significant trends. Moreover, given that the raw time series is not stationary, I was worried that the resulting ARIMA(0,0,1) model might not actually be appropriate.

There's nothing like running into ignorant reviewers who want you to do things that make no sense.
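For what it's worth, checking a drift term in an ARIMA(0,1,1) model is easy with the forecast package. A minimal sketch, using simulated data in place of the correspondent's series:

```r
library(forecast)

# Hypothetical series with an upward drift, standing in for real data
set.seed(123)
x <- ts(cumsum(rnorm(100, mean = 0.2, sd = 1)))

# ARIMA(0,1,1) with drift: the drift coefficient estimates the
# average per-period increase in the series
fit <- Arima(x, order = c(0, 1, 1), include.drift = TRUE)

# A rough z-statistic for the drift term from its standard error;
# |z| > 2 suggests the upward trend is statistically significant
drift <- coef(fit)["drift"]
se <- sqrt(diag(vcov(fit)))["drift"]
drift / se
```

The drift term here plays exactly the role of the "random walk with drift" trend the email describes, so its standard error gives the statistical evidence the reviewer is asking for.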


Unit root tests and ARIMA models

Published on 12 March 2014

An email I received today:

I have a small problem. I have a time series called x:

- If I use the default values of auto.arima(x), the best model is an ARIMA(1,0,0).

- However, I tried ndiffs(x, test="adf") and ndiffs(x, test="kpss"), as the KPSS test seems to be the default. The number of differences is 0 for the KPSS test (consistent with the results of auto.arima()) but 2 for the ADF test.
I then tried auto.arima(x, test="adf") and now I have another model, ARIMA(1,2,1). I am unsure which order of integration I should use, as the tests give fairly different results.

Is there a test that prevails?
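The situation the email describes is easy to reproduce. A sketch with the forecast package, using a simulated AR(1) series as a stand-in for the correspondent's x:

```r
library(forecast)

# Placeholder series; substitute your own data here
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 200)

# Number of differences suggested by each unit root test
ndiffs(x, test = "kpss")  # KPSS: the default used by auto.arima()
ndiffs(x, test = "adf")   # ADF may disagree
ndiffs(x, test = "pp")    # Phillips-Perron, a third option

# Tell auto.arima() which test to use when choosing d
fit_kpss <- auto.arima(x)                # KPSS (default)
fit_adf  <- auto.arima(x, test = "adf")  # ADF instead
```

Note that the tests have different null hypotheses (KPSS assumes stationarity under the null; ADF assumes a unit root), so occasional disagreement is expected rather than a sign that something is broken.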



Using old versions of R packages

Published on 10 March 2014

I received this email yesterday:

I have been using your 'forecast' package for more than a year now. I was on R version 2.15 until last week, but I am having issues with the lubridate package, hence decided to update to R 3.0.1. In our organization, even getting an open source application requires us to go through a whole lot of approval processes. I asked for R 3.0.1, but before I got approval for 3.0.1, a new version of R (3.0.2) came out. Unfortunately for me, the forecast package was built under R 3.0.2. Is there any version of the forecast package that works in the older version of R (3.0.1)? I just don't want to go through this entire approval war again within the organization.
Please help if you have any workaround for this.

This is unfortunately very common. Many corporate IT environments lock down computers to such an extent that it cripples the use of modern software like R, which is continuously updated. It also affects universities (which should know better), and I am constantly trying to invent workarounds to the constraints that Monash IT services place on staff and student computers.

Here are a few thoughts that might help.
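One option, where policy allows installing packages from source, is to pull an older release of forecast from the CRAN archive. A sketch — the version number below is purely illustrative; check the archive listing for a release built under your version of R:

```r
# Install an older release of the forecast package directly from the
# CRAN source archive (version number is illustrative; browse
# cran.r-project.org/src/contrib/Archive/forecast/ for the full list)
install.packages(
  "https://cran.r-project.org/src/contrib/Archive/forecast/forecast_4.8.tar.gz",
  repos = NULL, type = "source"
)

# Alternatively, devtools can locate the archived version for you
# install.packages("devtools")
devtools::install_version("forecast", version = "4.8")
```

Building from source requires the usual compilation toolchain (Rtools on Windows), which may itself be a hurdle in a locked-down environment.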


IJF news

Published on 7 March 2014

This is a short piece I wrote for the next issue of the Oracle newsletter produced by the International Institute of Forecasters.


Highlighting the web

Published on 6 March 2014

Users of my new online forecasting book have asked about having a facility for personal highlighting of selected sections, as students often do with print books. We have plans to make this a built-in part of the platform, but for now it is possible to do it using a simple browser extension. This approach allows any website to be highlighted, so it is even more useful than if we only had the facility on OTexts.org.

There are several possible tools available. One of the simplest that allows both highlighting and annotations is Diigo.


Forecasting weekly data

Published on 5 March 2014

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
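One way to apply the idea with the forecast package is to regress on Fourier terms for the annual period and let a non-seasonal ARIMA model handle the remaining short-term dynamics. A sketch using simulated weekly data in place of a real series (the fourier() call with an h argument assumes a recent version of the package):

```r
library(forecast)

# Simulated weekly series with annual seasonality (stand-in for real data)
set.seed(1)
n <- 5 * 52
t <- seq_len(n)
y <- ts(10 + 2 * sin(2 * pi * t / 52.18) + rnorm(n, sd = 0.5),
        frequency = 365.25 / 7)

# Choose the number of Fourier harmonic pairs K by minimising the AICc
bestfit <- list(aicc = Inf)
bestK <- 1
for (K in 1:6) {
  fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE)
  if (fit$aicc < bestfit$aicc) {
    bestfit <- fit
    bestK <- K
  }
}

# Forecast two years ahead using the matching future Fourier terms
fc <- forecast(bestfit, xreg = fourier(y, K = bestK, h = 104))
plot(fc)
```

Because the Fourier terms are deterministic functions of time, the non-integer period 52.18 poses no difficulty, whereas seasonal ARIMA and ETS both require an integer period.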


Fitting models to short time series

Published on 4 March 2014

Following my post on fitting models to long time series, I thought I'd tackle the opposite problem, which is more common in business environments.

I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data: the required sample size increases with both.


Fitting models to long time series

Published on 1 March 2014

I received this email today:

I recall you made this very insightful remark somewhere that fitting a standard ARIMA model with too much data, i.e. a very long time series, is a bad idea.

Can you elaborate why?

I can see the issue with noise, which compounds the ML estimation as the series gets too long. But is there anything else?

I'm not sure where I made a comment about this, but it is true that ARIMA models don't work well for very long time series. The same can be said about almost any other model too. The problem is that real data do not come from the models we use. When the number of observations is not large (say, up to about 200) the models often work well as an approximation to whatever process generated the data. But eventually you will have enough data that the difference between the true process and the model starts to become more obvious. An additional problem is that the optimization of the parameters becomes more time consuming because of the number of observations involved.

What to do about these issues depends on the purpose of the model. A more flexible nonparametric model could be used, but this still assumes that the model structure will work over the whole period of the data. A better approach is usually to allow the model itself to change over time: for example, by using time-varying parameters in a parametric model, or by using a time-based kernel in a nonparametric model. If you are only interested in forecasting the next few observations, it is equivalent and simpler to throw away the earliest observations and fit a model only to the most recent ones.
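A minimal sketch of that last suggestion, fitting only to a recent window of a long series (the window length of 300 is an arbitrary illustration; the right choice depends on the data):

```r
library(forecast)

# Long simulated series, standing in for real data
set.seed(7)
x <- ts(arima.sim(model = list(ar = 0.8), n = 5000))

# Keep only the most recent 300 observations before fitting,
# since earlier data may follow a different regime
recent <- window(x, start = length(x) - 299)
fit <- auto.arima(recent)
forecast(fit, h = 10)
```

This also sidesteps the computational cost of maximum likelihood estimation on thousands of observations.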

How many observations to retain, or how fast to allow the time-varying parameters to vary, can be tricky decisions.


More time series data online

Published on 27 February 2014

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series: mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not many other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look.

Unfortunately there is no R code, and no R interface for downloading the data.
