This week I’m in Cyprus attending the COMPSTAT2012 conference. There’s been the usual interesting collection of talks, and interactions with other researchers. But I was struck by two side comments in talks this morning that I’d like to mention.

Stephen Pollock: Don't imagine your model is the truth

Actually, Stephen said something like “economists (or was it econometricians?) have a bad habit of imagining their models are true”. He gave the example of people asking whether GDP “has a unit root”. GDP is an economic measurement. It no more has a unit root than I do. But the models used to approximate the dynamics of GDP may have a unit root. This is an example of confusing your data with your model or, to put it the other way around, imagining that the model is true rather than an approximation. A related habit that tends to annoy me is referring to the model as the “data generating process”. No model is a data generating process, unless the data were obtained by simulation from the model. Models are only ever approximations, and imagining that they are data generating processes only leads to over-confidence and bad science.
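To make the distinction concrete, here is a minimal Python sketch (my illustration, not something from the talk; the function names `simulate_ar1` and `fit_ar1`, the seed and the sample size are all hypothetical choices). It simulates from an AR(1) model, y_t = φ y_{t-1} + e_t, with φ = 1, i.e. a random walk. Only in this simulated case is the model genuinely the data generating process. The least-squares estimate of φ that follows is a property of the fitted model, not of the numbers themselves.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=2012):
    """Simulate y_t = phi * y_{t-1} + e_t with y_0 = 0 and Gaussian noise e_t.

    "Has a unit root" is a statement about this model (phi == 1), and here the
    model really is the data generating process because we simulate from it.
    """
    rng = random.Random(seed)  # seeded so the series is reproducible
    y = [0.0]
    for _ in range(n):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + e_t (no intercept).

    The result describes the fitted approximating model, not the series itself.
    """
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


if __name__ == "__main__":
    walk = simulate_ar1(phi=1.0, n=500)  # model with a unit root, by construction
    phi_hat = fit_ar1(walk)              # estimate from the fitted model
    print(f"phi used for simulation: 1.0, estimated phi: {phi_hat:.3f}")
```

With real GDP there is no simulation step, so the sensible question is whether a model with a unit root is a useful approximation to the series, not whether the series “has” one.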

Matías Salibián-Barrera: Make all your code public

After giving an interesting survey of the robustbase and rrcov packages for R, Matías spent the last ten minutes of his talk presenting the case for reproducible research and arguing for making R code public wherever possible. The benefits of making our code public are obvious:

  • The research can be reproduced and checked by others. This is simply good science.
  • Your work will be cited more frequently. Other researchers are much less likely to refer to your work if they have to implement your methods themselves. But if you make it easy, then people will use your methods and consequently cite your papers.

He also said something like this: “Don’t wait until journals require you to submit code and data; start now by putting your code and data on a website.” I agree. Every methodological paper should have an R package as a complement. If that’s too much work, at least put some code on a website so that other people can implement your method. What’s the point of hiding your code? In some ways, the code is more important than the accompanying paper, as it represents a precise description of the method, whereas the written paper may not include all the necessary details.
