Paperpile makes me more productive

One of the first things I tell my new research students is to use a reference management system to help them keep track of the papers they read, and to assist in creating bib files for their bibliography. Most of them use Mendeley; one or two use Zotero. Both do a good job and both are free.

I use neither. I did use Mendeley for several years (and blogged about it a few years ago), but it became slower and slower to sync as my reference collection grew. Eventually it simply couldn’t handle the load. I have over 11,000 papers in my collection, and I was spending several minutes every day waiting for Mendeley just to update the database.

Then I came across Paperpile, which is not as well known as some of its competitors, but it is truly awesome. I’ve now been using it for over a year, and I have grown to depend on it every day to keep track of all the papers I read, and to create my bib files.

What to cite?

This question comes from a comment on another post:

I’ve seen authors citing as many references as possible to try to please potential referees. Many of those references are low quality papers though. Any general guidance about a typical length for the reference section?

It depends on the subject and style of the paper. I’ve written a paper with over 900 citations, but that was a review of time series forecasting over a 25-year period, and so it had to include a lot of references.

I’ve also written a paper with just four citations. As it was a commentary, it did not need much contextual information.

Rather than provide guidance on the length of the reference section, I think it is better to follow some general principles of citation in research.

Nominations for best International Journal of Forecasting paper, 2012–2013

Every two years, the International Journal of Forecasting awards a prize for the best paper published in a two-year period. It is now time to identify the best paper published in the IJF during 2012 and 2013. There is always a delay of about 18 months after the publication period to allow time for reflection, citations, etc. The prize is US$1000 plus an engraved plaque.

IJF review papers

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The International Journal of Forecasting produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

  1. Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
  2. The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.

Both tackle very important topics in forecasting. Weron’s paper contains a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research. At 50 pages, I think it is the longest paper I have ever approved. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem every company that produces goods or services has faced: when there are no historical data available, how do you forecast the sales of your product?

We have a few other review papers in progress, so keep an eye out for them in future issues.


biblatex for statisticians

I am now using biblatex for all my bibliographic work, as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).

Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. It is not enough to let you use these methods, but it is fine if you have no idea what they are.
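To give a feel for bagging, here is a minimal standard-library Python sketch for a single predictor. It is an illustration under simple assumptions, not code from Varian’s paper: it fits a depth-one regression tree (a “stump”) to each bootstrap resample of the data and averages the stumps’ predictions. The function names `fit_stump`, `predict_stump` and `bagged_predict` are hypothetical. A random forest would additionally choose each split from a random subset of the predictors, which makes no difference here with only one predictor.

```python
import math
import random
from statistics import fmean


def fit_stump(x, y):
    """Depth-one regression tree: the single split on x with the smallest squared error.

    Returns (threshold, mean_left, mean_right).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x equal: no split possible, predict the overall mean
        m = fmean(y)
        return (math.inf, m, m)
    return best[1:]


def predict_stump(stump, x0):
    t, mean_left, mean_right = stump
    return mean_left if x0 <= t else mean_right


def bagged_predict(x, y, x_new, n_trees=100, seed=0):
    """Bagging: fit one stump per bootstrap resample, then average their predictions."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return [fmean(predict_stump(s, x0) for s in stumps) for x0 in x_new]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(0, 10) for _ in range(200)]
    y = [math.sin(v) + rng.gauss(0, 0.3) for v in x]
    print([round(p, 2) for p in bagged_predict(x, y, [0, 2.5, 5, 7.5, 10])])
```

A single stump is a one-step function; averaging stumps whose split points differ across bootstrap samples gives a smoother fit with lower variance, which is the main point of bagging.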

To explain or predict?

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.

  1. The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent: if the true model is among the candidates, the BIC selects it with probability tending to one as the sample size grows.
  2. P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact when your goal is prediction than when it is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-term prediction.
  5. How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate, as most applied work these days requires predictive modelling rather than explanatory modelling.
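Point 1 can be checked numerically. The following is a minimal standard-library Python sketch, not code from Shmueli’s paper or from our discussion. It fits nested linear regressions by ordinary least squares and, for each, computes the leave-one-out cross-validation error using the leverage shortcut e_i/(1 − h_ii), which gives the exact leave-one-out residuals for OLS without refitting, alongside Gaussian AIC and BIC with additive constants dropped (so only differences between models matter). The helper names `inverse`, `ols` and `criteria`, and the simulated data, are hypothetical.

```python
import math
import random


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # augment A with the identity matrix
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, y):
    """OLS via the normal equations; returns (beta, (X'X)^{-1})."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    G = inverse(XtX)
    beta = [sum(G[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    return beta, G


def criteria(X, y):
    """Return (LOO-CV MSE, AIC, BIC) for an OLS fit; each row of X starts with 1."""
    n, k = len(X), len(X[0])
    beta, G = ols(X, y)
    sse = cv_sum = 0.0
    for r, yi in zip(X, y):
        e = yi - sum(b * v for b, v in zip(beta, r))
        # leverage h_ii = x_i' (X'X)^{-1} x_i, a diagonal element of the hat matrix
        h = sum(r[i] * G[i][j] * r[j] for i in range(k) for j in range(k))
        sse += e * e
        cv_sum += (e / (1 - h)) ** 2  # exact leave-one-out residual for OLS
    # Gaussian log-likelihood with constants dropped; k + 1 counts the error variance
    aic = n * math.log(sse / n) + 2 * (k + 1)
    bic = n * math.log(sse / n) + math.log(n) * (k + 1)
    return cv_sum / n, aic, bic


if __name__ == "__main__":
    random.seed(42)
    n = 200
    Z = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    # simulated data: only the first two of the four predictors affect y
    y = [1 + 2 * z[0] - z[1] + random.gauss(0, 1) for z in Z]
    for p in range(5):  # nested models using the first p predictors
        X = [[1.0] + z[:p] for z in Z]
        cv, aic, bic = criteria(X, y)
        print(f"p={p}  LOO-CV={cv:.3f}  AIC={aic:.1f}  BIC={bic:.1f}")
```

Because log(n) exceeds 2 once n ≥ 8, the BIC penalises each extra parameter more heavily than the AIC, which is why it tends to favour smaller models. With a reasonably large sample, the rankings by AIC and by LOO-CV error should mostly agree, in line with their asymptotic equivalence.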



Great papers to read

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper, something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.

In the last couple of months we have also read the following papers:

Past, present, and future of statistical science

This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies.

It can be freely downloaded from the COPSS website, or a hard copy can be purchased on Amazon (for only a little over 10c per page, which is not bad compared to other statistics books).

The book consists of 52 chapters spanning 622 pages. Its full table of contents shows its scope and the list of authors (a veritable who’s who in statistics).