Earlier this week I had coffee with Ben Fulcher, who told me about his online collection of about 30,000 time series, mostly medical series such as ECG measurements, meteorological series, birdsong, etc. There are some finance series, but not much other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look. Unfortunately there is no R code, and no R interface for downloading the data.
Posts Tagged ‘reproducible research’:
I recently co-authored a chapter on “Prospective Life Tables” for this book, edited by Arthur Charpentier. R code to reproduce the figures and to complete the exercises for our chapter is now available on github. Code for the other chapters should also be available soon. The book can be pre-ordered on Amazon.
This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used. Here are some thoughts and highlights of the conference, in no particular order.
I gave this talk last night to the Melbourne Users of R Network.
Updated: 21 November 2012
Make is a marvellous tool used by programmers to build software, but it can be used for much more than that. I use make whenever I have a large project involving R files and LaTeX files, which means I use it for almost all of the papers I write, and almost all of the consulting reports I produce.
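For readers who have not used make, its central rule is simple: a target is rebuilt only if it is missing or older than one of the files it depends on. The following is a minimal Python sketch of that rule, not make itself and not the setup described in the post. The function name `is_stale` and the file names `analysis.R` and `figure.pdf` are hypothetical, chosen only to mirror a project where an R script produces a figure.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def is_stale(target: Path, prerequisites: Iterable[Path]) -> bool:
    """Return True if target is missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)


def demo() -> list:
    """Walk through the three cases using hypothetical file names."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "analysis.R"   # hypothetical prerequisite
        figure = Path(tmp) / "figure.pdf"   # hypothetical target
        script.write_text("# R code that would draw the figure\n")

        results = []
        # Case 1: the target does not exist yet, so it must be built.
        results.append(is_stale(figure, [script]))

        # Case 2: the target is newer than the script, so nothing to do.
        figure.write_text("placeholder")
        os.utime(script, (1000, 1000))  # set explicit timestamps
        os.utime(figure, (2000, 2000))
        results.append(is_stale(figure, [script]))

        # Case 3: the script was edited after the figure was made.
        os.utime(script, (3000, 3000))
        results.append(is_stale(figure, [script]))
        return results


if __name__ == "__main__":
    print(demo())
```

The timestamps are set explicitly with `os.utime` so the comparison does not depend on how quickly the files are written. In a real project the same check is applied along the whole chain, so editing one R file triggers only the figures and documents that depend on it.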
This week I’m in Cyprus attending the COMPSTAT2012 conference. There’s been the usual interesting collection of talks, and interactions with other researchers. But I was struck by two side comments in talks this morning that I’d like to mention.
Stephen Pollock: Don’t imagine your model is the truth
Actually, Stephen said something like “economists (or was it econometricians?) have a bad habit of imagining their models are true”. He gave the example of people asking whether GDP “has a unit root”. GDP is an economic measurement. It no more has a unit root than I do. But the models used to approximate the dynamics of GDP may have a unit root. This is an example of confusing your data with your model. Or to put it the other way around, imagining that the model is true rather than an approximation. A related thing that tends to annoy me is referring to the model as the “data generating process”. No model is a data generating process, unless the data were obtained by simulation from the model. Models are only ever approximations, and imagining that they are data generating processes only leads to over-confidence and bad science.
Matías Salibián-Barrera: Make all your code public
After giving an interesting survey of
It’s not a good idea to annoy the referees of your paper. They make recommendations to the editor about your work and it is best to keep them happy. There is an interesting discussion on stats.stackexchange.com on this subject. This inspired my own list below.
1. Explain what you’ve done clearly, avoiding unnecessary jargon.
2. Don’t claim your paper contributes more than it actually does. (I refereed a paper this week where the author claimed to have invented principal component analysis!)
3. Ensure all figures have clear captions and labels.
4. Include citations to the referee’s own work. Obviously you don’t know who is going to referee your paper, but you should aim to cite the main work in the area. It places your work in context, and keeps the referees happy if they are the authors.
5. Make sure the cited papers say what you think they say. Sight what you cite!
6. Include proper citations for all software packages. If you are unsure how to cite an R package, try the command citation("packagename").
7. Never plagiarise from other papers, not even sentence fragments. Use your own words. I’ve refereed a thesis which had slabs taken from my own lecture notes, including the typos.
8. Don’t plagiarise from your own papers. Either reference
Reproducible research
One of the best ways to get started with research in a new area is to try to replicate some existing research. In doing so, you will usually gain a much better understanding of the topic, and you will often discover some problems with the research, or develop ideas that will lead to a new research paper. Unfortunately, a lot of papers are not reproducible because the data are not made available, or the descriptions of the methods are not detailed enough. The good news is that there is a growing move amongst funding agencies and journals to make more research reproducible. Peng, Dominici and Zeger (2006) and Koenker and Zeileis (2009) provide helpful discussions of new tools (especially Sweave) for making research easier to reproduce. The International Journal of Forecasting is also encouraging researchers to make their data and computer code available in order to allow others to replicate the research. I have just written an editorial on this topic which will appear in the first issue of 2010. Here is an excerpt from the article: As the leading journal in forecasting, the IJF has a responsibility to set research standards. So, a couple of years ago, we started asking authors to make their data