Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx.
I’m tired of reading about tests for structural breaks and here’s why.
A structural break occurs when we see a sudden change in a time series or in the relationship between two time series. Econometricians love papers on structural breaks, and apparently believe in them. Personally, I take a different view of the world. I think a more realistic view is that most things change slowly over time, and only occasionally change suddenly and discontinuously.
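To make the two views concrete, here is a toy simulation. It is not any published structural-break test, and the function names and settings (200 observations, a total level shift of 5) are illustrative assumptions. One series changes level abruptly at the midpoint, the other drifts slowly by the same total amount. A least-squares search for a single break date reports a break in both, because by construction it always returns some split point.

```python
import numpy as np


def simulate_series(n=200, jump=5.0, seed=1):
    """Simulate two series with identical noise and the same total change in
    level: one through a single abrupt break, one through a slow drift."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    noise = rng.normal(scale=1.0, size=n)
    # Abrupt change: the mean jumps from 0 to `jump` at t = n // 2
    break_mean = np.where(t < n // 2, 0.0, jump)
    # Gradual change: the mean moves linearly from 0 to `jump` over the sample
    smooth_mean = jump * t / (n - 1)
    return break_mean + noise, smooth_mean + noise, break_mean, smooth_mean


def best_split(y, min_size=10):
    """Least-squares estimate of a single break date: the split point k that
    minimises the total SSE when y[:k] and y[k:] each get their own mean."""
    n = len(y)
    candidates = range(min_size, n - min_size + 1)
    sse = [
        ((y[:k] - y[:k].mean()) ** 2).sum() + ((y[k:] - y[k:].mean()) ** 2).sum()
        for k in candidates
    ]
    return candidates[int(np.argmin(sse))]


y_break, y_smooth, m_break, m_smooth = simulate_series()
# Largest one-period change in the underlying mean of each series
print(np.abs(np.diff(m_break)).max(), np.abs(np.diff(m_smooth)).max())
# Estimated break dates: the search returns one for both series
print(best_split(y_break), best_split(y_smooth))
```

The abrupt series moves by 5 in a single period, while the smooth series never moves by more than 5/199 per period. Yet both yield a "break date" near the middle of the sample.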
Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) Everyone doing statistics and econometrics should read this paper, as it helps to clarify a distinction that is often blurred. Among other things, the discussion covered the following issues.
- The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or to one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent: if the true model is among the candidates, the BIC selects it with probability approaching one as the sample size grows.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact when your goal is prediction than when your goal is estimation. When predicting, multicollinearity is not really a problem, provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- Missing values in regression need to be handled differently in a predictive context than in an explanatory context. For example, when building an explanatory model, we could just use the observations for which all variables are complete (assuming there is no systematic pattern to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different sets of predictors, to allow for different variables being missing.
- Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
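The AIC point in the first item can be checked numerically for linear regression. The sketch below is a minimal illustration, assuming ordinary least squares with Gaussian errors; the function names and the simulated data are hypothetical. It computes leave-one-out cross-validation without refitting, via the leverage shortcut e_i/(1 − h_ii), alongside the AIC (dropping additive constants). The equivalence is a large-sample result, so the two criteria need not pick the same model in any particular finite sample.

```python
import numpy as np


def loocv_and_aic(X, y):
    """Return (LOO-CV mean squared error, AIC) for an OLS fit of y on X.

    X must already include an intercept column if one is wanted.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
    # Leave-one-out residuals without refitting: e_i / (1 - h_ii)
    cv = np.mean((resid / (1 - h)) ** 2)
    # Gaussian AIC up to an additive constant: k coefficients plus the variance
    aic = n * np.log(resid @ resid / n) + 2 * (k + 1)
    return cv, aic


def brute_force_loocv(X, y):
    """Refit the model n times, each time leaving out one observation."""
    n = len(y)
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sq_errors.append((y[i] - X[i] @ beta) ** 2)
    return np.mean(sq_errors)


# Simulated example: only the first two of four candidate predictors matter
rng = np.random.default_rng(42)
n = 100
Z = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(size=n)
for p in range(1, 5):
    X = np.column_stack([np.ones(n), Z[:, :p]])
    cv, aic = loocv_and_aic(X, y)
    print(f"{p} predictors: LOO-CV MSE = {cv:.3f}, AIC = {aic:.2f}")
```

The brute-force version is included only to confirm that the shortcut gives exactly the same leave-one-out error as refitting the model n times.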
Today’s email question:
I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. auto.arima() works great and I often get something along the lines of ARIMA(0,0,1)(1,1,0) with drift as the model with the lowest AICc.
However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models frequently include orders such as AR(10) and MA(12). I argue that’s overfitting. I don’t see much discussion of this in textbooks, and I’ve never seen such high-order models in a textbook setting. But are they fairly common in practice? What concerns could I raise with him about higher-order models? Any advice you could give would be appreciated.
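One concrete way to see the overfitting concern is to look at the penalty AICc adds to −2 log-likelihood, 2k + 2k(k+1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations used in estimation. The sketch below assumes, hypothetically, 40 quarterly observations. It counts the noise variance as a parameter, and treats "AR(10) MA(12)" as a full ARMA(10,12) with all 22 coefficients estimated and no mean term. The function name is illustrative.

```python
def aicc_penalty(k, n):
    """Penalty AICc adds to -2 log-likelihood: 2k + 2k(k+1)/(n-k-1).

    k: number of estimated parameters (including the noise variance)
    n: number of observations used in estimation
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return 2 * k + 2 * k * (k + 1) / (n - k - 1)


n = 40  # hypothetical: ten years of quarterly data
low = aicc_penalty(4, n)    # MA(1) + seasonal AR(1) + drift + variance
high = aicc_penalty(23, n)  # 10 AR + 12 MA coefficients + variance
print(f"low-order penalty: {low:.2f}, high-order penalty: {high:.2f}")
# prints "low-order penalty: 9.14, high-order penalty: 115.00"
```

Under these assumptions, the high-order model would have to improve −2 log-likelihood by more than 105 to be preferred. Note that AICc values are only comparable between models with the same orders of differencing.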
We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.
My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) “two cultures” paper, something every statistician should read, including the discussion.
I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.
In the last couple of months we have also read the following papers:
- Timmermann (2008) Elusive return predictability
- Diebold (2013) Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests
- Gneiting and Katzfuss (2014) Probabilistic forecasting
- Makridakis and Hibon (1979) Accuracy of forecasting: an empirical investigation
This is the title of a wonderful new book that has just been released, courtesy of the Committee of Presidents of Statistical Societies.
The book consists of 52 chapters spanning 622 pages. The full table of contents below shows its scope and the list of authors (a veritable who’s who in statistics).
I’ve been an editor of the Journal of Statistical Software (JSS) for the last few years, and as a result I tend to get email from people asking me about publishing papers describing R packages in JSS. So for all those wondering, here are some general comments.
There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:
- No Hesitations by Francis Diebold (Professor of Economics, University of Pennsylvania). Diebold needs no introduction to forecasters. He primarily covers forecasting in economics and finance, but also xkcd cartoons, graphics, research issues, etc.
- Econometrics Beat by Dave Giles. Dave is a professor of economics at the University of Victoria (Canada), formerly from my own department at Monash University (Australia), and a native New Zealander. Not a lot on forecasting, but plenty of interesting posts about econometrics and statistics more generally.
- Business forecasting by Clive Jones (a professional forecaster based in Colorado, USA). Originally about sales and new product forecasting, but he now covers a lot of other forecasting topics and has an interesting practitioner perspective.
- Freakonometrics by Arthur Charpentier (an actuary and professor of mathematics at the University of Quebec at Montreal, Canada). This is the most prolific blog on this list. It is wide-ranging, taking in statistics, forecasting, econometrics, actuarial science, R, and anything else that takes his fancy. Some posts are in French.
- No Free Hunch: the Kaggle blog. Some of the most interesting posts are from Kaggle competition winners explaining their methods.
- Energy forecasting by Tao Hong (formerly an energy forecaster at SAS, now a professor at UNC). He mostly covers energy forecasting issues and job postings.
- The official blog of the International Institute of Forecasters (IIF). Conferences, jobs, member profiles, etc.