Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.
Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.

The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.

P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)

Multicollinearity has a very different impact if your goal is prediction than if your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.

An ARIMA model has no explanatory use, but is great at short-term prediction.
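The AIC/cross-validation connection is easy to see numerically. Here is a minimal sketch in Python (the simulated data, the `fit_stats` helper, and the two candidate models are my own illustration, not anything from Shmueli's paper): for ordinary least squares, the leave-one-out residuals can be computed from a single fit via the hat matrix, and both criteria end up preferring the same model.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def fit_stats(X, y):
    """Return (AIC, LOO-CV mean squared error) for an OLS fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    # Gaussian AIC up to an additive constant; +1 counts the error variance.
    aic = n * np.log(rss / n) + 2 * (k + 1)
    # For OLS, the leave-one-out residual is e_i / (1 - h_ii),
    # where h_ii are the leverages (diagonal of the hat matrix).
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    loo = np.mean((resid / (1 - h)) ** 2)
    return aic, loo

X1 = np.column_stack([np.ones(n), x1])      # under-specified model
X2 = np.column_stack([np.ones(n), x1, x2])  # correct model
aic1, loo1 = fit_stats(X1, y)
aic2, loo2 = fit_stats(X2, y)
print(aic2 < aic1, loo2 < loo1)  # both criteria prefer the correct model
```

With a strong signal like this, the AIC ranking and the LOO-CV ranking agree; the asymptotic equivalence says they agree increasingly often as the sample grows.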
My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper — something every statistician should read, including the discussion. I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers. In the last couple of months we have also read the following papers:

Timmermann (2008) Elusive return predictability.
Diebold (2013) Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests.
Gneiting and Katzfuss (2014) Probabilistic forecasting.
Makridakis and Hibon (1979) Accuracy of forecasting: An empirical investigation.
This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies. It can be freely downloaded from the COPSS website, or a hard copy can be purchased on Amazon (for only a little over 10c per page, which is not bad compared to other statistics books). The book consists of 52 chapters spanning 622 pages. The full table of contents below shows its scope and the list of authors (a veritable who’s who in statistics).
The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}\left(\frac{|y_t - \hat{y}_t|}{|y_t|}\right),$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
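The asymmetry is easy to demonstrate. A quick sketch in Python (the `mape` function is my own illustration of the definition above, not code from the post): for a positive actual value, a forecast can never under-shoot by more than 100% of the actual, but it can over-shoot without bound, so the measure systematically favours low forecasts.

```python
import numpy as np

def mape(y, yhat):
    """Mean absolute percentage error: 100 * mean(|y_t - yhat_t| / |y_t|)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100 * np.mean(np.abs((y - yhat) / y))

# Actual value of 100: the percentage error of an under-forecast is
# capped (a forecast of 0 gives 100%), but an over-forecast is unbounded.
print(mape([100], [0]))    # 100.0
print(mape([100], [300]))  # 200.0
```

A forecaster judged on MAPE therefore has an incentive to forecast low, which is exactly the bias Armstrong described.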
For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can. Order on Amazon.com Order on Amazon.co.uk Order on Amazon.fr The online book will continue to be freely available. The print version of the book is intended to help fund the development of the OTexts platform. Compare that to US$195 for my previous forecasting textbook, and US$182 for Gonzalez-Rivera. No matter how good the books are, the prices are absurdly high. OTexts is intended to be a different kind of publisher — all our books are online and free, those in print will be reasonably priced. The online version will continue to be updated regularly. The print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually and only when the online version has changed enough to warrant a new print edition. We are planning an offline electronic version as well. I’ll announce it here when it is ready.
Every year or so, Elsevier asks me to nominate five International Journal of Forecasting papers from the last two years to highlight in their marketing materials as “Editor’s Choice”. I try to select papers across a broad range of subjects, and I take into account citations and downloads as well as my own impression of the paper. That tends to bias my selection a little towards older papers, as they have had more time to accumulate citations. Here are the papers I chose this morning (in the order they appeared):

Diebold and Yilmaz (2012) Better to give than to receive: Predictive directional measurement of volatility spillovers. IJF 28(1), 57–66.
Loterman, Brown, Martens, Mues, and Baesens (2012) Benchmarking regression algorithms for loss given default modeling. IJF 28(1), 161–170.
Soyer and Hogarth (2012) The illusion of predictability: How regression statistics mislead experts. IJF 28(3), 695–711.
Friedman (2012) Fast sparse regression and classification. IJF 28(3), 722–738.
Davydenko and Fildes (2013) Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. IJF 29(3), 510–522.

Last time I did this, three of the five papers I chose went on to win awards. (I don’t pick the award winners — that’s a matter for the whole editorial board.)
In two weeks I am presenting a workshop at the University of Granada (Spain) on Automatic Time Series Forecasting. Unlike most of my talks, this one is not intended to be primarily about my own research. Rather, it is to provide a state-of-the-art overview of the topic (at a level suitable for Masters students in Computer Science). I thought I’d provide some historical perspective on the development of automatic time series forecasting, plus give some comments on current best practices.
Hastie, Tibshirani and Friedman’s Elements of Statistical Learning first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine learning, which has its own terminology for statistical concepts. A free downloadable pdf version is available on the website. Recently, a simpler related book appeared, entitled Introduction to Statistical Learning with applications in R by James, Witten, Hastie and Tibshirani. It “is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences”. This would be a great textbook for our new 3rd year subject on Business Analytics. The R code is a welcome addition in showing how to implement the methods. Again, a free downloadable pdf version is available on the website. There is also a new, free book on Statistical foundations of machine learning by Bontempi and Ben Taieb, available on the OTexts platform. This is more of a handbook, and is written by two authors coming from a machine learning background. R code is also provided. Being an OTexts book, it is continually updated and revised, and is freely available.
The publishing platform I set up for my forecasting book has now been extended to cover more books and greater functionality. Check it out at www.otexts.org.