A blog by Rob J Hyndman 


Posts Tagged ‘reproducible research’:

Errors on percentage errors

Published on 16 April 2014

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as $$\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t|/|y_t|)$$ where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
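As a rough illustration of that asymmetry, here is a minimal Python sketch (not code from the post). It assumes the usual definition of the MAPE, 100 times the mean of |y_t - ŷ_t| / |y_t|, and it assumes forecasts cannot be negative. The function name `mape` and the numbers are made up for the example.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: 100 * mean(|y_t - yhat_t| / |y_t|)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an observation is zero")
    total = sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast))
    return 100 * total / len(actual)


# Hypothetical data: every observation equals 100.
actual = [100.0, 100.0, 100.0]

# The lowest possible non-negative forecast is 0, so under-forecasting
# can never cost more than 100%.
under = mape(actual, [0.0, 0.0, 0.0])

# Over-forecasting has no such ceiling: a forecast of 300 already costs 200%.
over = mape(actual, [300.0, 300.0, 300.0])

print(f"MAPE when forecasting 0:   {under:.1f}%")
print(f"MAPE when forecasting 300: {over:.1f}%")
```

Under these assumptions the penalty for forecasting too low is capped while the penalty for forecasting too high is not, which is one way to read the claim that the measure favours low estimates.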

9 Comments

Job at Center for Open Science

Published on 8 April 2014

This looks like an interesting job.

Dear Dr. Hyndman,

I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science.

We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House’s Office of Science and Technology Policy on matters relating to reproducibility.

If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats

The position is full-time and located at our office in beautiful Charlottesville, VA.

Thanks in advance for your time


No Comments

More time series data online

Published on 27 February 2014

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not many other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look. Unfortunately there is no R code, and no R interface for downloading the data.

No Comments

Computational Actuarial Science with R

Published on 3 February 2014

I recently co-authored a chapter on “Prospective Life Tables” for this book, edited by Arthur Charpentier. R code to reproduce the figures and to complete the exercises for our chapter is now available on GitHub. Code for the other chapters should also be available soon. The book can be pre-ordered on Amazon.

2 Comments

Reflections on UseR! 2013

Published on 13 July 2013

This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used. Here are some thoughts and highlights of the conference, in no particular order.

1 Comment

SimpleR tips, tricks and tools

Published on 21 November 2012

I gave this talk last night to the Melbourne Users of R Network.

10 Comments

Makefiles for R/​LaTeX projects

Published on 31 October 2012

Updated: 21 November 2012

Make is a marvellous tool used by programmers to build software, but it can be used for much more than that. I use make whenever I have a large project involving R files and LaTeX files, which means I use it for almost all of the papers I write, and almost all of the consulting reports I produce.

17 Comments


Published on 28 August 2012

This week I’m in Cyprus attending the COMPSTAT2012 conference. There’s been the usual interesting collection of talks, and interactions with other researchers. But I was struck by two side comments in talks this morning that I’d like to mention.

Stephen Pollock: Don’t imagine your model is the truth

Actually, Stephen said something like “economists (or was it econometricians?) have a bad habit of imagining their models are true”. He gave the example of people asking whether GDP “has a unit root”. GDP is an economic measurement. It no more has a unit root than I do. But the models used to approximate the dynamics of GDP may have a unit root. This is an example of confusing your data with your model. Or to put it the other way around, imagining that the model is true rather than an approximation.

A related thing that tends to annoy me is to refer to the model as the “data generating process”. No model is a data generating process, unless the data were obtained by simulation from the model. Models are only ever approximations, and imagining that they are data generating processes only leads to over-confidence and bad science.

Matías Salibián-Barrera: Make all your code public

After giving an interesting survey of


1 Comment

How to avoid annoying a referee

Published on 22 October 2010

It’s not a good idea to annoy the referees of your paper. They make recommendations to the editor about your work and it is best to keep them happy. There is an interesting discussion on stats.stackexchange.com on this subject. This inspired my own list below.

1. Explain what you’ve done clearly, avoiding unnecessary jargon.
2. Don’t claim your paper contributes more than it actually does. (I refereed a paper this week where the author claimed to have invented principal component analysis!)
3. Ensure all figures have clear captions and labels.
4. Include citations to the referee’s own work. Obviously you don’t know who is going to referee your paper, but you should aim to cite the main work in the area. It places your work in context, and keeps the referees happy if they are the authors.
5. Make sure the cited papers say what you think they say. Sight what you cite!
6. Include proper citations for all software packages. If you are unsure how to cite an R package, try the command citation("packagename").
7. Never plagiarise from other papers, not even sentence fragments. Use your own words. I’ve refereed a thesis which had slabs taken from my own lecture notes including the typos.
8. Don’t plagiarise from your own papers. Either reference


2 Comments

Replications and reproducible research

Published on 2 December 2009

Reproducible research

One of the best ways to get started with research in a new area is to try to replicate some existing research. In doing so, you will usually gain a much better understanding of the topic, and you will often discover some problems with the research, or develop ideas that will lead to a new research paper. Unfortunately, a lot of papers are not reproducible because the data are not made available, or the description of the methods is not detailed enough. The good news is that there is a growing move amongst funding agencies and journals to make more research reproducible. Peng, Dominici and Zeger (2006) and Koenker and Zeileis (2009) provide helpful discussions of new tools (especially Sweave) for making research easier to reproduce.

The International Journal of Forecasting is also encouraging researchers to make their data and computer code available in order to allow others to replicate the research. I have just written an editorial on this topic which will appear in the first issue of 2010. Here is an excerpt from the article:

As the leading journal in forecasting, the IJF has a responsibility to set research standards. So, a couple of years ago, we started asking authors to make their data


3 Comments