I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

# IJF review papers

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The *International Journal of Forecasting* produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

- Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
- The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.

Both tackle very important topics in forecasting. Weron’s paper is a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research; at 50 pages, I think it is the longest paper I have ever approved. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem faced by every company that produces goods or services: when there are no historical data available, how do you forecast the sales of your product?

We have a few other review papers in progress, so keep an eye out for them in future issues.

# biblatex for statisticians

I am now using biblatex for all my bibliographic work, as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).

# Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper, “Big data: new tricks for econometrics”, *Journal of Economic Perspectives*, **28**(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you have no idea what they are.

# To explain or predict?

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, *Statistical Science*, **25**(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. Among other things, the discussion covered the following issues.

- The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact when your goal is prediction than when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
- Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
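As a minimal sketch of the first point, assuming ordinary least-squares regression with Gaussian errors, the Python code below compares polynomial models of increasing degree by AIC and by leave-one-out cross-validation (LOO-CV). It relies on the standard identity that the leave-one-out residual of a linear regression equals $e_i/(1-h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix, so LOO-CV needs only a single fit. The simulated data, the polynomial candidates and all function names are illustrative assumptions, not taken from Shmueli’s paper; with a sample this small the two criteria need not pick the same model, since the equivalence is only asymptotic.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ols(X, y):
    """Return OLS coefficients and (X'X)^{-1} for a design matrix X given as a list of rows."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    return beta, inv


def predict(beta, row):
    """Fitted value for one row of the design matrix."""
    return sum(b * x for b, x in zip(beta, row))


def aic(X, y):
    """Gaussian AIC up to an additive constant: n*log(SSE/n) + 2k.

    k counts the regression coefficients; also counting the error variance
    would add the same constant 2 to every model, so rankings are unchanged.
    """
    beta, _ = fit_ols(X, y)
    sse = sum((yi - predict(beta, row)) ** 2 for row, yi in zip(X, y))
    n, k = len(y), len(X[0])
    return n * math.log(sse / n) + 2 * k


def loocv_shortcut(X, y):
    """LOO-CV mean squared error from one fit, using e_i / (1 - h_ii)."""
    beta, inv = fit_ols(X, y)
    k = len(X[0])
    total = 0.0
    for row, yi in zip(X, y):
        h = sum(row[i] * inv[i][j] * row[j] for i in range(k) for j in range(k))
        total += ((yi - predict(beta, row)) / (1 - h)) ** 2
    return total / len(y)


def loocv_brute(X, y):
    """LOO-CV mean squared error by refitting the model n times (for checking only)."""
    total = 0.0
    for i in range(len(y)):
        beta, _ = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(beta, X[i])) ** 2
    return total / len(y)


def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return [[xi ** p for p in range(degree + 1)] for xi in x]


# Hypothetical simulated data: a quadratic trend plus Gaussian noise.
random.seed(1)
x = [(i - 20) / 10 for i in range(41)]
y = [1 + 2 * xi - 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]

for degree in range(1, 5):
    X = poly_design(x, degree)
    print(degree, round(aic(X, y), 2), round(loocv_shortcut(X, y), 4))
```

The `loocv_brute` function is there only to confirm the shortcut; in real work you would use a linear-algebra library rather than the hand-rolled `invert` helper.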

# Great papers to read

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper — something every statistician should read, including the discussion.

I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.

In the last couple of months we have also read the following papers:

- Timmermann (2008) Elusive return predictability
- Diebold (2013) Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests
- Gneiting and Katzfuss (2014) Probabilistic forecasting
- Makridakis and Hibon (1978) Accuracy of forecasting: an empirical investigation

# Past, present, and future of statistical science

This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies.

It can be freely downloaded from the COPSS website, or a hard copy can be purchased on Amazon (for only a little over 10 cents per page, which is not bad compared with other statistics books).

The book consists of 52 chapters spanning 622 pages. The full table of contents shows its scope and the list of authors (a veritable who’s who in statistics).

# Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure of forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}\left(|y_t - \hat{y}_t| / |y_t|\right),$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$.

Armstrong (1985, p. 348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
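A minimal Python sketch of the MAPE definition, with a worked illustration of the kind of asymmetry Armstrong describes. The values 150 and 100 are purely illustrative assumptions: the absolute error is 50 in both cases, but the percentage error is smaller when the forecast lies below the actual value, because the divisor $|y_t|$ is then larger. The function name `mape` is hypothetical, not from any particular package.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error: 100 * mean(|y_t - yhat_t| / |y_t|)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    if any(y == 0 for y in actuals):
        # Division by a zero observation makes the MAPE undefined.
        raise ValueError("MAPE is undefined when an observation is zero")
    return 100 * sum(abs(y - f) / abs(y) for y, f in zip(actuals, forecasts)) / len(actuals)


# Same absolute error of 50, with the roles of actual and forecast swapped.
under = mape([150], [100])  # forecast below the actual: 50/150
over = mape([100], [150])   # forecast above the actual: 50/100
print(f"under-forecast: {under:.1f}%, over-forecast: {over:.1f}%")
# prints "under-forecast: 33.3%, over-forecast: 50.0%"
```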

# My forecasting book now on Amazon

For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can.

The online book will continue to be freely available. The print version of the book is intended to help fund the development of the OTexts platform.

The price is US$45, £27 or €35.

Compare that to $195 for my previous forecasting textbook, $150 for Fildes and Ord, or $182 for Gonzalez-Rivera. No matter how good the books are, the prices are absurdly high.

OTexts is intended to be a different kind of publisher: all our books are online and free, and those in print will be reasonably priced.

The online version will continue to be updated regularly. The print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually and only when the online version has changed enough to warrant a new print edition.

We are planning an offline electronic version as well. I’ll announce it here when it is ready.

# Top papers in the International Journal of Forecasting

Every year or so, Elsevier asks me to nominate five *International Journal of Forecasting* papers from the last two years to highlight in their marketing materials as “Editor’s Choice”. I try to select papers across a broad range of subjects, and I take into account citations and downloads as well as my own impression of the paper. That tends to bias my selection a little towards older papers as they have had more time to accumulate citations. Here are the papers I chose this morning (in the order they appeared):

- Diebold and Yilmaz (2012) Better to give than to receive: Predictive directional measurement of volatility spillovers. *IJF* **28**(1), 57–66.
- Loterman, Brown, Martens, Mues, and Baesens (2012) Benchmarking regression algorithms for loss given default modeling. *IJF* **28**(1), 161–170.
- Soyer and Hogarth (2012) The illusion of predictability: How regression statistics mislead experts. *IJF* **28**(3), 695–711.
- Friedman (2012) Fast sparse regression and classification. *IJF* **28**(3), 722–738.
- Davydenko and Fildes (2013) Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. *IJF* **29**(3), 510–522.

Last time I did this, three of the five papers I chose went on to win awards. (I don’t pick the award winners — that’s a matter for the whole editorial board.) On the other hand, I didn’t pick the paper that got the top award for the period 2010–2011. So perhaps my selection is not such a good guide.