The following papers have been nominated for the best paper published in the *International Journal of Forecasting* in 2012–2013. I have included an excerpt from the nomination in each case. The papers in bold have been short-listed for the award, and the editorial board are currently voting on them.

# Tag / references

# Paperpile makes me more productive

One of the first things I tell my new research students is to use a reference management system to help them keep track of the papers they read, and to assist in creating bib files for their bibliography. Most of them use Mendeley; one or two use Zotero. Both do a good job and both are free.

I use neither. I did use Mendeley for several years (and blogged about it a few years ago), but it became slower and slower to sync as my reference collection grew. Eventually it simply couldn’t handle the load. I have over 11,000 papers in my collection, and I was spending several minutes every day waiting for Mendeley just to update the database.

Then I came across **Paperpile**, which is not as well known as some of its competitors but is truly awesome. I’ve now been using it for over a year, and I have grown to depend on it every day to keep track of all the papers I read, and to create my bib files.

# What to cite?

This question comes from a comment on another post:

I’ve seen authors citing as many references as possible to try to please potential referees. Many of those references are low quality papers though. Any general guidance about a typical length for the reference section?

It depends on the subject and style of the paper. I’ve written a paper with over 900 citations, but that was a review of time series forecasting over a 25-year period, and so it had to include a lot of references.

I’ve also written a paper with just four citations. As it was a commentary, it did not need a lot of contextual information.

Rather than provide guidance on the length of the reference section, I think it is better to follow some general principles of citation in research.

# Nominations for best International Journal of Forecasting paper, 2012-2013

Every two years, the *International Journal of Forecasting* awards a prize for the best paper published in a two-year period. It is now time to identify the best paper published in the IJF during 2012 and 2013. There is always a delay of about 18 months after the publication period to allow time for reflection, citations, etc. The prize is US$1000 plus an engraved plaque.

# RSS feeds for statistics and related journals

I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

# IJF review papers

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The *International Journal of Forecasting* produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

- Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
- The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.

Both tackle very important topics in forecasting. Weron’s paper contains a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research. I think it is the longest paper I have ever approved, at 50 pages. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem every company that produces goods or services has faced: when there are no historical data available, how do you forecast the sales of your product?

We have a few other review papers in progress, so keep an eye out for them in future issues.

# biblatex for statisticians

I am now using biblatex for all my bibliographic work as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).

# Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, *Journal of Economic Perspectives*, **28**(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike and slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.

# To explain or predict?

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, *Statistical Science*, **25**(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered amongst other things.

- The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
- Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
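
The first point in the list above, on the AIC and leave-one-out cross-validation, is easy to check numerically. Here is a minimal sketch, assuming simulated data and polynomial regressions of increasing degree; the data-generating process, the sample size and the function names are my own choices for illustration. The AIC is computed up to an additive constant as n log(RSS/n) + 2k, and the leave-one-out errors use the standard leverage shortcut e_i / (1 - h_ii), so no refitting is needed.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS via a thin QR decomposition; return residuals and leverages."""
    Q, R = np.linalg.qr(X)                  # X = QR, Q has orthonormal columns
    beta = np.linalg.solve(R, Q.T @ y)      # least-squares coefficients
    resid = y - X @ beta
    leverage = np.sum(Q**2, axis=1)         # diagonal of the hat matrix
    return resid, leverage


def aic(resid, n_coef):
    """Gaussian AIC up to a constant that is the same for every model."""
    n = resid.size
    return n * np.log(np.sum(resid**2) / n) + 2 * n_coef


def loo_mse(resid, leverage):
    """Leave-one-out CV mean squared error using the leverage shortcut."""
    return np.mean((resid / (1 - leverage)) ** 2)


def loo_mse_brute(X, y):
    """The same quantity computed the slow way, refitting n times."""
    n = y.size
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = y[i] - X[i] @ beta
    return np.mean(errors**2)


# Simulated data (an assumption for illustration): a quadratic signal plus noise.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, n)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 1, n)

results = {}
for degree in range(1, 6):
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    resid, lev = fit_ols(X, y)
    # n * log(CV) is on the same scale as the AIC, so the two can be compared.
    results[degree] = (aic(resid, degree + 1), n * np.log(loo_mse(resid, lev)))
    print(f"degree {degree}: AIC = {results[degree][0]:8.2f}, "
          f"n*log(LOO-CV) = {results[degree][1]:8.2f}")

best_aic = min(results, key=lambda d: results[d][0])
best_cv = min(results, key=lambda d: results[d][1])
print("degree chosen by AIC:", best_aic, "| by LOO-CV:", best_cv)
```

The two columns should track each other closely, and both criteria should reject the underfitted linear model. The equivalence is only asymptotic, so in small samples the two criteria can occasionally pick different models.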

# Great papers to read

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper — something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.

In the last couple of months we have also read the following papers:

- Timmermann (2008) Elusive return predictability
- Diebold (2013) Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests
- Gneiting and Katzfuss (2014) Probabilistic forecasting
- Makridakis and Hibon (1979) Accuracy of forecasting: an empirical investigation