Visit of Di Cook

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.

For those wanting to hear her speak, read on.

Varian on big data

Last week my research group dis­cussed Hal Varian’s inter­est­ing new paper on “Big data: new tricks for econo­met­rics”, Jour­nal of Eco­nomic Per­spec­tives, 28(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.

To explain or predict?

Last week, my research group dis­cussed Galit Shmueli’s paper “To explain or to pre­dict?”, Sta­tis­ti­cal Sci­ence, 25(3), 289–310. (See her web­site for fur­ther mate­ri­als.) This is a paper every­one doing sta­tis­tics and econo­met­rics should read as it helps to clar­ify a dis­tinc­tion that is often blurred. In the dis­cus­sion, the fol­low­ing issues were cov­ered amongst other things.

  1. The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent: given enough data, it selects the true model when that model is among the candidates.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity affects prediction very differently from estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-term prediction.
  5. Missing values in regression need to be handled differently in a predictive context than in an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many sta­tis­tics and econo­met­rics text­books fail to observe these dis­tinc­tions. In fact, a lot of sta­tis­ti­cians and econo­me­tri­cians are trained only in the expla­na­tion par­a­digm, with pre­dic­tion an after­thought. That is unfor­tu­nate as most applied work these days requires pre­dic­tive mod­el­ling, rather than explana­tory modelling.
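
As a rough illustration of point 1, the following Python sketch compares the AIC with leave-one-out cross-validation on simulated regression data. Everything in it is an assumption made for illustration (the simulated data, the three candidate models, and the helper names `solve`, `ols`, `aic`, `loo_cv_mse` and `compare_models`); none of it comes from the paper or from our discussion. It uses only the standard library so that the arithmetic is visible.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(row, beta):
    return sum(xi * bi for xi, bi in zip(row, beta))


def aic(X, y):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    beta = ols(X, y)
    rss = sum((yi - predict(row, beta)) ** 2 for row, yi in zip(X, y))
    k = len(beta) + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k


def loo_cv_mse(X, y):
    """Leave-one-out CV: refit without observation i, then predict it."""
    n = len(y)
    total = 0.0
    for i in range(n):
        beta = ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(X[i], beta)) ** 2
    return total / n


def compare_models(seed=1, n=100):
    """Simulate y = 1 + 2*x1 + noise (x2, x3 irrelevant) and score 3 models."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1.0 + 2.0 * xi[0] + rng.gauss(0, 1) for xi in x]
    designs = {
        "intercept only": [[1.0] for _ in x],
        "x1": [[1.0, xi[0]] for xi in x],
        "x1 + x2 + x3": [[1.0] + xi for xi in x],
    }
    return {name: (aic(X, y), loo_cv_mse(X, y)) for name, X in designs.items()}


scores = compare_models()
for name, (a, cv) in scores.items():
    print(f"{name:15s} AIC = {a:8.2f}   LOO-CV MSE = {cv:6.3f}")
```

Lower is better for both criteria. The equivalence is asymptotic, so in a finite sample like this one the two rankings will often agree but need not do so exactly.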

Great papers to read

My research group meets every two weeks. It is always fun to talk about gen­eral research issues and new tools and tips we have dis­cov­ered. We also use some of the time to dis­cuss a paper that I choose for them. Today we dis­cussed Breiman’s clas­sic (2001) two cul­tures paper — some­thing every sta­tis­ti­cian should read, includ­ing the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.

In the last cou­ple of months we have also read the fol­low­ing papers:

Looking for a new post-doc

We are looking for a new post-doctoral research fellow to work on the project “Macroeconomic Forecasting in a Big Data World”. Details are given at the link below.

jobs.monash.edu.au/jobDetails.asp?sJobIDs=519824

This is a two-year position, funded by the Australian Research Council, and working with me, George Athanasopoulos, Farshid Vahid and Anastasios Panagiotelis. We are looking for someone with a PhD in econometrics, statistics or machine learning, who is well-trained in computationally intensive methods, and who has a background in at least one of time series analysis, macroeconomic modelling, or Bayesian econometrics.

Blogs about research

If you find this blog help­ful (or even if you don’t but you’re inter­ested in blogs on research issues and tools), there are a few other blogs about doing research that you might find use­ful. Here are a few that I read.

I’ve cre­ated a bun­dle so you can sub­scribe to all of these in one go.

Of course, there are lots of sta­tis­tics blogs as well, and blogs about other research dis­ci­plines. The ones above are those that con­cen­trate on generic research issues.

CrossValidated Journal Club

Jour­nal Clubs are a great way to learn new research ideas and to keep up with the lit­er­a­ture. The idea is that a group of peo­ple get together every week or so to dis­cuss a paper of joint inter­est. This can hap­pen within your own research group or depart­ment, or vir­tu­ally online.

There is now a virtual journal club operating in conjunction with CrossValidated.com. The first paper discussed was on text data mining. It appears that the next paper may be on collaborative filtering.

The empha­sis is on Open Access papers, prefer­ably with asso­ci­ated soft­ware that is freely avail­able. Some of the dis­cus­sion tends to cen­tre on how to imple­ment the ideas in R.

For those of us in Aus­tralia, the tim­ing is tricky. The first dis­cus­sion took place at 3am local time!

If you can’t make the Cross­Val­i­dated Jour­nal Club chats, why not start your own local club?

Research supervision workshop

Today I gave a work­shop for super­vi­sors of post­grad­u­ate stu­dents. Mostly I talked about cre­at­ing a team envi­ron­ment for post­grad­u­ate stu­dents rather than the tra­di­tional model (at least in sta­tis­tics and econo­met­rics) of each stu­dent work­ing in isolation.

The slides are avail­able here in pre­sen­ta­tion form or in hand­out form. Actu­ally, these are an edited ver­sion of the slides as I acci­den­tally left out a cou­ple of the pho­tographs in the work­shop, and I’ve omit­ted slides that I didn’t end up cov­er­ing in the workshop.

An impor­tant part of my research group is this blog. So if you haven’t been here before, please take a look around.

For those peo­ple who attended, feel free to add com­ments below if you would like to pro­vide feed­back on the workshop.