Paperpile makes me more productive

One of the first things I tell my new research students is to use a reference management system to help them keep track of the papers they read, and to assist in creating bib files for their bibliographies. Most of them use Mendeley; one or two use Zotero. Both do a good job and both are free.

I use neither. I did use Mendeley for several years (and blogged about it a few years ago), but it became slower and slower to sync as my reference collection grew. Eventually it simply couldn't handle the load. I have over 11,000 papers in my collection, and I was spending several minutes every day waiting for Mendeley just to update the database.

Then I came across Paperpile, which is not as well known as some of its competitors, but it is truly awesome. I've now been using it for over a year, and I have grown to depend on it every day to keep track of all the papers I read, and to create my bib files.

Di Cook is moving to Monash

I'm delighted that Professor Dianne Cook will be joining Monash University in July 2015 as a Professor of Business Analytics. Di is an Australian who has worked in the US for the past 25 years, mostly at Iowa State University. She is moving back to Australia and joining the Department of Econometrics and Business Statistics in the Monash Business School, as part of our initiative in Business Analytics.

Di is a world leader in data visualization, and is well-known for her work on interactive graphics. She is also the academic supervisor of several leading data scientists including Hadley Wickham and Yihui Xie, both of whom work for RStudio.

Di has a great deal of energy and enthusiasm for computational statistics and data visualization, and will play a key role in developing and teaching our new subjects in business analytics.

The Monash Business School is already exceptionally strong in econometrics (ranked 7th in the world on RePEc) and forecasting (ranked 11th on RePEc), and we have recently expanded into actuarial science. With Di joining the department, we will be extending our expertise in data visualization as well.



Congratulations to Dr Souhaib Ben Taieb

Souhaib Ben Taieb has been awarded his doctorate from the Université libre de Bruxelles, so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.

His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. Prior research in this area has largely centred on two strategies (recursive and direct), and on which one works better in which circumstances. Recursive forecasting is the standard approach, where a model is designed to predict one step ahead and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting uses a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and developed new strategies that are either hybrids of these two strategies, or completely different from both. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
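The two traditional strategies are easy to sketch. Here is a minimal illustration in Python (numpy only), using a plain linear autoregression as the base learner; the function names and setup are my own for illustration, not taken from the thesis:

```python
import numpy as np

def lagged_matrix(y, p, h):
    """Build rows [y_{t-p+1}, ..., y_t] with target y_{t+h}."""
    X, Y = [], []
    for t in range(p - 1, len(y) - h):
        X.append(y[t - p + 1 : t + 1])
        Y.append(y[t + h])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Least-squares fit with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return coef

def predict(coef, x):
    return coef[0] + x @ coef[1:]

def recursive_forecast(y, p, H):
    """Fit a one-step model once, then iterate it H steps ahead,
    feeding each forecast back in as if it were an observation."""
    coef = fit_linear(*lagged_matrix(y, p, 1))
    window, out = list(y[-p:]), []
    for _ in range(H):
        f = predict(coef, np.array(window))
        out.append(f)
        window = window[1:] + [f]
    return np.array(out)

def direct_forecast(y, p, H):
    """Fit a separate h-step model for each horizon h = 1, ..., H,
    always forecasting from the last observed window."""
    x_last = y[-p:]
    return np.array([predict(fit_linear(*lagged_matrix(y, p, h)), x_last)
                     for h in range(1, H + 1)])
```

At horizon 1 the two strategies coincide; they diverge at longer horizons, where recursive forecasts compound one-step errors and direct forecasts trade that for a harder estimation problem at each horizon.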

Some of the papers to come out of Souhaib's thesis are already available on his Google Scholar page.

Well done Souhaib, and best wishes for the future.




Visit of Di Cook

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.

For those wanting to hear her speak, read on.

Varian on big data

Last week my research group discussed Hal Varian's interesting new paper, “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2), 3–28.

It's a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but fine if you've no idea what they are.

To explain or predict?

Last week, my research group discussed Galit Shmueli's paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.

  1. The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact if your goal is prediction than if your goal is estimation. When predicting, multicollinearity is not really a problem, provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great for short-term prediction.
  5. How to handle missing values in regression differs between predictive and explanatory contexts. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have, so you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate, as most applied work these days requires predictive modelling rather than explanatory modelling.
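On point 1, the connection between the AIC and leave-one-out cross-validation is easy to explore empirically for linear regression, where the leave-one-out errors have a closed form via the hat matrix, so no refitting is needed. A rough sketch in Python (numpy only; the function names are mine):

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept; returns coefficients, residuals and design matrix."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta, y - Xb @ beta, Xb

def aic(X, y):
    """Gaussian AIC for a linear regression (up to an additive constant)."""
    _, resid, Xb = fit_ols(X, y)
    n, k = Xb.shape
    return n * np.log(resid @ resid / n) + 2 * (k + 1)  # +1 for the error variance

def loo_mse(X, y):
    """Exact leave-one-out MSE via the hat-matrix shortcut e_i / (1 - h_ii)."""
    _, resid, Xb = fit_ols(X, y)
    h = np.diag(Xb @ np.linalg.solve(Xb.T @ Xb, Xb.T))  # leverage values
    return np.mean((resid / (1 - h)) ** 2)
```

To select a model for prediction, you would compute both criteria over a set of candidate predictor subsets; with enough data their rankings tend to agree, which is the sense in which the AIC targets predictive accuracy.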
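Point 3 can also be demonstrated in a few lines: with two nearly collinear predictors, refitting on bootstrap resamples gives wildly unstable individual coefficients, yet the fitted values barely move. A small simulation sketch (the numbers and setup are illustrative, my own rather than from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # x2 is almost a copy of x1
y = 2 * x1 + 3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

def refit(idx):
    """OLS coefficients on a bootstrap resample of the rows."""
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta

betas = np.array([refit(rng.integers(0, n, n)) for _ in range(200)])

# Individual coefficients are hopeless for explanation: their
# bootstrap standard deviations are huge...
coef_sd = betas.std(axis=0)

# ...but predictions at points inside the observed predictor region
# are stable, because only the (well-identified) sum matters there.
pred_sd = (betas @ X[:10].T).std(axis=0)
```

Here `coef_sd` for `x1` and `x2` is orders of magnitude larger than `pred_sd`, which is exactly the explanation/prediction asymmetry: the model is useless for attributing effects to `x1` versus `x2`, but perfectly serviceable for forecasting within the data region.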



Great papers to read

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman's classic (2001) “two cultures” paper, something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or recent survey papers.

In the last couple of months we have also read the following papers:

Looking for a new post-​​doc

We are looking for a new post-doctoral research fellow to work on the project “Macroeconomic Forecasting in a Big Data World”. Details are given at the link below.


This is a two-year position, funded by the Australian Research Council, working with me, George Athanasopoulos, Farshid Vahid and Anastasios Panagiotelis. We are looking for someone with a PhD in econometrics, statistics or machine learning, who is well-trained in computationally intensive methods, and who has a background in at least one of time series analysis, macroeconomic modelling, or Bayesian econometrics.

Blogs about research

If you find this blog helpful (or even if you don't, but you're interested in blogs on research issues and tools), there are a few other blogs about doing research that you might find useful. Here are a few that I read.

I've created a bundle so you can subscribe to all of these in one go.

Of course, there are lots of statistics blogs as well, and blogs about other research disciplines. The ones above are those that concentrate on generic research issues.

CrossValidated Journal Club

Journal clubs are a great way to learn new research ideas and to keep up with the literature. The idea is that a group of people get together every week or so to discuss a paper of joint interest. This can happen within your own research group or department, or virtually online.

There is now a virtual journal club operating in conjunction with CrossValidated.com. The first paper discussed was on text data mining. It appears that the next paper may be on collaborative filtering.

The emphasis is on open-access papers, preferably with associated software that is freely available. Some of the discussion tends to centre on how to implement the ideas in R.

For those of us in Australia, the timing is tricky. The first discussion took place at 3am local time!

If you can't make the CrossValidated Journal Club chats, why not start your own local club?