A blog by Rob J Hyndman 


Varian on big data

Published on 16 June 2014

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to slab and spike regression. Not enough to be able to use them, but ok if you’ve no idea what they are.

 

Specifying complicated groups of time series in hts

Published on 15 June 2014

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily.

All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies.

This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies.

Any number of separate hierarchies can be combined in this way. Non-hierarchical factors such as sex can be treated as single-level hierarchies.
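To make the cross-product idea concrete, here is a minimal Python sketch of the deaths-by-sex-and-state example. It is not the hts API (hts is an R package); the data, the state labels and the function name `aggregate_groups` are all hypothetical, chosen only to show which aggregate series arise when two single-level hierarchies are crossed.

```python
from itertools import product

# Hypothetical bottom-level data: deaths by (sex, state) over three periods.
bottom = {
    ("F", "NSW"): [10, 12, 11],
    ("F", "VIC"): [8, 9, 7],
    ("M", "NSW"): [14, 13, 15],
    ("M", "VIC"): [9, 11, 10],
}


def aggregate_groups(bottom):
    """Sum bottom-level series over every combination of factor levels.

    A level of None means "all levels of that factor", so (None, None) is
    the grand total and ("F", None) is all female deaths.
    """
    n_factors = len(next(iter(bottom)))
    n_periods = len(next(iter(bottom.values())))
    # For each factor: the observed levels plus None (the aggregate).
    choices = [
        sorted({key[i] for key in bottom}) + [None] for i in range(n_factors)
    ]
    out = {}
    for combo in product(*choices):
        total = [0] * n_periods
        for key, values in bottom.items():
            # Include this bottom series if it matches every fixed level.
            if all(c is None or c == k for c, k in zip(combo, key)):
                total = [t + v for t, v in zip(total, values)]
        out[combo] = total
    return out


series = aggregate_groups(bottom)
for name, values in series.items():
    print(name, values)
```

With two factors of two levels each, this produces (2 + 1) × (2 + 1) = 9 series: the four bottom-level series, two totals by sex, two totals by state, and the grand total. Adding a further factor simply adds another dimension to the product.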

 

European talks. June-July 2014

Published on 14 June 2014

For the next month I am travelling in Europe and will be giving the following talks.

17 June. Challenges in forecasting peak electricity demand. Energy Forum, Sierre, Valais/Wallis, Switzerland.

20 June. Common functional principal component models for mortality forecasting. International Workshop on Functional and Operatorial Statistics. Stresa, Italy.

24–25 June. Functional time series with applications in demography. Humboldt University, Berlin.

1 July. Fast computation of reconciled forecasts in hierarchical and grouped time series. International Symposium on Forecasting, Rotterdam, Netherlands.

 

Creating a handout from beamer slides

Published on 11 June 2014

I’m about to head off on a speaking tour to Europe (more on that in another post) and one of my hosts has asked for my powerpoint slides so they can print them. They have made two false assumptions: (1) that I use powerpoint; (2) that my slides are static so they can be printed.

Instead, I produced a cut-down version of my beamer slides, leaving out some of the animations and other features that will not print easily. Then I produced a pdf file with several slides per page.

 

Data science market places

Published on 26 May 2014

Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx.

 

Structural breaks

Published on 23 May 2014

I’m tired of reading about tests for structural breaks and here’s why.

A structural break occurs when we see a sudden change in a time series or in a relationship between two time series. Econometricians love papers on structural breaks, and apparently believe in them. Personally, I tend to take a different view of the world. I think a more realistic view is that most things change slowly over time, and only occasionally with sudden discontinuous change.

 

To explain or predict?

Published on 19 May 2014

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered amongst other things.

  1. The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact when your goal is prediction than when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-term prediction.
  5. How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
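The contrast in point 1 can be seen from the standard definitions, AIC = 2k − 2 log L and BIC = k log(n) − 2 log L, where k is the number of estimated parameters, n the sample size and L the maximised likelihood. The following Python sketch applies them to two candidate models; the log-likelihoods, parameter counts and sample size are invented purely for illustration and do not come from the paper or the discussion.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fitted models on the same n = 100 observations.
n = 100
models = {
    "small": {"log_lik": -150.0, "k": 3},
    "large": {"log_lik": -147.0, "k": 5},
}

# Smaller values are better for both criteria.
best_aic = min(models, key=lambda m: aic(models[m]["log_lik"], models[m]["k"]))
best_bic = min(models, key=lambda m: bic(models[m]["log_lik"], models[m]["k"], n))

print("AIC prefers:", best_aic)
print("BIC prefers:", best_bic)
```

In this made-up case the two criteria disagree: the extra two parameters buy enough likelihood to satisfy the AIC penalty of 2 per parameter, but not the BIC penalty of log(100) ≈ 4.6 per parameter. The BIC penalty exceeds the AIC penalty whenever n is at least 8, so the BIC tends to choose smaller models.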

 

 

 

Questions on the business analytics jobs

Published on 13 May 2014

I’ve received a few questions on the business analytics jobs advertised last week. I think it is best if I answer them here so other potential candidates can have the same information. I will add to this post if I receive more questions.

 

ARIMA models with long lags

Published on 8 May 2014

Today’s email question:

I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. auto.arima() works great and I often get something along the lines of: ARIMA(0,0,1)(1,1,0)[12] with drift as the lowest AICc.

However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models include AR(10) MA(12)s etc. rather frequently. I argue that’s overfitting. I don’t see a great deal of discussion in textbooks about this, and I’ve never seen such higher-order models in a textbook setting. But are they fairly common in practice? What concerns could I raise with him about higher-order models? Any advice you could give would be appreciated.
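The remark about “using forecasted data” refers to the way multi-step forecasts from an autoregressive model are iterated: each forecast is fed back in as a lagged input for the next step. The Python sketch below shows that recursion for a pure AR(p) model and counts how many lagged inputs at each horizon are themselves forecasts. The function `ar_forecast`, the coefficients and the data are hypothetical; this is not what auto.arima() does internally, and it ignores MA and seasonal terms.

```python
def ar_forecast(history, phi, const=0.0, horizon=1):
    """Iterated forecasts from an AR(p) model.

    phi[0] multiplies the most recent value, phi[1] the one before, etc.
    Returns (forecasts, n_forecast_inputs), where n_forecast_inputs[h-1]
    is the number of lagged inputs at step h that are forecasts rather
    than observations.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p observations")
    values = list(history)
    forecasts = []
    n_forecast_inputs = []
    for h in range(1, horizon + 1):
        lags = values[-p:][::-1]  # most recent value first
        yhat = const + sum(c * y for c, y in zip(phi, lags))
        # Steps T+1, ..., T+h-1 are forecasts; at most p of them are used.
        n_forecast_inputs.append(min(h - 1, p))
        values.append(yhat)  # feed the forecast back in for the next step
        forecasts.append(yhat)
    return forecasts, n_forecast_inputs


# Hypothetical example: AR(1) and AR(2) fitted to the same short history.
history = [1.0, 2.0]
f1, used1 = ar_forecast(history, phi=[0.5], horizon=3)
f2, used2 = ar_forecast(history, phi=[0.5, 0.25], horizon=3)
print(f1, used1)
print(f2, used2)
```

The counts show that any AR model relies on earlier forecasts once the horizon exceeds one step; a higher order only delays the point at which all of the lagged inputs are forecasts, at the cost of more parameters to estimate.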


 

New jobs in business analytics at Monash

Published on 4 May 2014

We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.

 