A blog by Rob J Hyndman 

Data science market places

Published on 26 May 2014

Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx. (more…)

Structural breaks

Published on 23 May 2014

I’m tired of reading about tests for structural breaks and here’s why.

A structural break occurs when we see a sudden change in a time series or a relationship between two time series. Econometricians love papers on structural breaks, and apparently believe in them. Personally, I tend to take a different view of the world. I think a more realistic view is that most things change slowly over time, and only occasionally with sudden discontinuous change. (more…)

To explain or predict?

Published on 19 May 2014

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered amongst other things.

  1. The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-term prediction.
  5. How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
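
To make the first point a little more concrete, here is a minimal Python sketch. It is not from the paper or from our discussion. It compares an intercept-only model with a simple linear regression using AIC, BIC and brute-force leave-one-out cross-validation. The data are made up, the function names are my own, and the AIC and BIC are the usual Gaussian-regression forms up to an additive constant, with k counting only the regression coefficients.

```python
import math


def fit_mean(xs, ys):
    """Intercept-only model: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def rss(predict, xs, ys):
    """Residual sum of squares of a fitted model on (xs, ys)."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


def aic(rss_value, n, k):
    # Gaussian regression AIC, dropping constants shared by all models.
    return n * math.log(rss_value / n) + 2 * k


def bic(rss_value, n, k):
    # Same likelihood term as AIC, but the penalty grows with log(n).
    return n * math.log(rss_value / n) + k * math.log(n)


def loocv_mse(fitter, xs, ys):
    """Leave-one-out CV: refit without point i, then predict point i."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fitter(train_x, train_y)
        total += (ys[i] - predict(xs[i])) ** 2
    return total / len(xs)


# Made-up data: a linear trend plus a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

# Candidate models with their number of regression coefficients k.
candidates = {"mean": (fit_mean, 1), "linear": (fit_line, 2)}

scores = {}
for name, (fitter, k) in candidates.items():
    r = rss(fitter(xs, ys), xs, ys)
    scores[name] = {
        "aic": aic(r, len(xs), k),
        "bic": bic(r, len(xs), k),
        "loocv": loocv_mse(fitter, xs, ys),
    }

for name, s in scores.items():
    print(name, s)
```

On data this clear-cut all three criteria should prefer the same model. The differences between them only matter when candidate models are close, which is where the heavier BIC penalty (k log n rather than 2k) starts to favour the smaller model.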



Questions on the business analytics jobs

Published on 13 May 2014

I’ve received a few questions on the business analytics jobs advertised last week. I think it is best if I answer them here so other potential candidates can have the same information. I will add to this post if I receive more questions. (more…)

ARIMA models with long lags

Published on 8 May 2014

Today’s email question:

I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. Auto.arima() works great and I often get something along the lines of: ARIMA(0,0,1)(1,1,0)[12] with drift as the lowest AICc.

However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models include AR(10) MA(12)s etc. rather frequently. I argue that’s overfitting. I don’t see a great deal of discussion in textbooks about this, and I’ve never seen such higher-order models in a textbook setting. But are they fairly common in practice? What concerns could I raise with him about higher-order models? Any advice you could give would be appreciated.
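
One way to put rough numbers on the overfitting worry in this email is to look at how the AICc penalty grows with the number of estimated parameters. The Python sketch below is only an illustration and is not taken from the email or from any software output. The sample size and the parameter counts are hypothetical, and exactly which parameters get counted in k (for example, whether the error variance is included) is an assumption here.

```python
def aicc_penalty(k, n):
    """Penalty part of AICc: 2k + 2k(k+1)/(n-k-1).

    k is the number of estimated parameters and n the number of
    observations used in fitting. AICc is -2*log-likelihood plus this.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return 2 * k + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical effective sample size after differencing.
n_obs = 80

# Hypothetical counts: MA(1) + seasonal AR(1) + drift + error variance.
k_low_order = 4

# Hypothetical counts: 10 AR terms + 12 MA terms + error variance.
k_high_order = 23

penalty_low = aicc_penalty(k_low_order, n_obs)
penalty_high = aicc_penalty(k_high_order, n_obs)

# The high-order model has to reduce -2*log-likelihood by more than this
# amount before it would be preferred on AICc.
required_gain = penalty_high - penalty_low

print(f"low-order penalty:  {penalty_low:.2f}")
print(f"high-order penalty: {penalty_high:.2f}")
print(f"required gain in -2 log-likelihood: {required_gain:.2f}")
```

The point of the sketch is only that, with a short series, the extra terms have to buy a very large improvement in fit before AICc would select them. It says nothing about which model actually forecasts better on any particular series.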


New jobs in business analytics at Monash

Published on 4 May 2014

We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding. (more…)

Great papers to read

Published on 2 May 2014

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper, something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.

In the last couple of months we have also read the following papers:

Past, present, and future of statistical science

Published on 28 April 2014

This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies.

It can be freely downloaded from the COPSS website or a hard copy can be purchased on Amazon (for only a little over 10c per page which is not bad compared to other statistics books).

The book consists of 52 chapters spanning 622 pages. The full table of contents below shows its scope and the list of authors (a veritable who’s who in statistics). (more…)

Publishing an R package in the Journal of Statistical Software

Published on 24 April 2014

I’ve been an editor of JSS for the last few years, and as a result I tend to get email from people asking me about publishing papers describing R packages in JSS. So for all those wondering, here are some general comments. (more…)

Seven forecasting blogs

Published on 22 April 2014

There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:

  1. No Hesitations by Francis Diebold (Professor of Economics, University of Pennsylvania). Diebold needs no introduction to forecasters. He primarily covers forecasting in economics and finance, but also xkcd cartoons, graphics, research issues, etc.
  2. Econometrics Beat by Dave Giles. Dave is a professor of economics at the University of Victoria (Canada), formerly from my own department at Monash University (Australia), and a native New Zealander. Not a lot on forecasting, but plenty of interesting posts about econometrics and statistics more generally.
  3. Business forecasting by Clive Jones (a professional forecaster based in Colorado, USA). Originally about sales and new product forecasting, but he now covers a lot of other forecasting topics and has an interesting practitioner perspective.
  4. Freakonometrics by Arthur Charpentier (an actuary and professor of mathematics at the University of Quebec at Montreal, Canada). This is the most prolific blog on this list. Wide ranging and taking in statistics, forecasting, econometrics, actuarial science, R, and anything else that takes his fancy. Sometimes in French.
  5. No free hunch: the kaggle blog. Some of the most interesting posts are from kaggle competition winners explaining their methods.
  6. Energy forecasting by Tao Hong (formerly an energy forecaster for SAS, now a professor at UNC). He covers mostly energy forecasting issues and job postings.
  7. The official IIF blog. Conferences, jobs, member profiles, etc.