I’m delighted that Professor Dianne Cook will be joining Monash University in July 2015 as a Professor of Business Analytics. Di is an Australian who has worked in the US for the past 25 years, mostly at Iowa State University. She is moving back to Australia and joining the Department of Econometrics and Business Statistics in the Monash Business School, as part of our initiative in Business Analytics.
Di is a world leader in data visualization, and is well-known for her work on interactive graphics. She is also the academic supervisor of several leading data scientists including Hadley Wickham and Yihui Xie, both of whom work for RStudio.
Di has a great deal of energy and enthusiasm for computational statistics and data visualization, and will play a key role in developing and teaching our new subjects in business analytics.
The Monash Business School is already exceptionally strong in econometrics (ranked 7th in the world on RePEc) and forecasting (ranked 11th on RePEc), and we have recently expanded into actuarial science. With Di joining the department, we will be extending our expertise in the area of data visualization as well.
Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.
His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. The prior research in this area has largely centred around two strategies (recursive and direct), and which one works better in certain circumstances. Recursive forecasting is the standard approach where a model is designed to predict one step ahead, and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting involves using a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and has developed new strategies that are either hybrids of these two strategies, or completely different from either of them. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
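To make the distinction between the two traditional strategies concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not the methods developed in the thesis: the series is simulated from an AR(1) process, each model is a straight-line regression on the last observed value, and the names `fit_line`, `recursive_forecast` and `direct_forecast` are made up for this example.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, horizon):
    """Fit one one-step-ahead model, then iterate it on its own forecasts."""
    a, b = fit_line(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last  # feed the previous forecast back in
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Fit a separate model for each horizon h, each using the last observation."""
    forecasts = []
    for h in range(1, horizon + 1):
        # Regress y[t + h] on y[t] to get a model specific to horizon h.
        a, b = fit_line(series[:-h], series[h:])
        forecasts.append(a + b * series[-1])
    return forecasts


# Simulated AR(1) data (an assumption made purely for illustration).
random.seed(1)
series = [0.0]
for _ in range(299):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

rec = recursive_forecast(series, 5)
direct = direct_forecast(series, 5)
for h, (r, d) in enumerate(zip(rec, direct), start=1):
    print(f"h={h}: recursive={r:.3f}, direct={d:.3f}")
```

The two strategies agree at horizon 1, because they fit the same one-step model, and can differ at longer horizons: the recursive forecasts compound any error in the one-step model, while each direct model is estimated separately for its own horizon.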
Some of the papers to come out of Souhaib’s thesis are already available on his Google Scholar page.
Well done Souhaib, and best wishes for the future.
Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.
For those wanting to hear her speak, read on.
Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.
It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but ok if you’ve no idea what they are.
Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered amongst other things.
- The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact when your goal is prediction than when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- Missing values in regression need to be handled differently in a predictive context than in an explanatory context. For example, when building an explanatory model, you could just use all the data for which you have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
- Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
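The first point above, on the AIC and leave-one-out cross-validation, can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the data are simulated, only two candidate models are compared (a constant mean versus a straight line), the AIC is computed up to an additive constant as n log(RSS/n) + 2k, and the function name `loo_cv_and_aic` is hypothetical. The leave-one-out errors use the standard least-squares shortcut of dividing each residual by one minus its leverage, so no model needs to be refitted.

```python
import math
import random


def loo_cv_and_aic(x, y, use_slope):
    """Return (leave-one-out CV mean squared error, AIC) for a least-squares fit.

    use_slope=False fits a constant mean; use_slope=True fits a straight line.
    """
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if use_slope:
        slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    else:
        slope = 0.0
    intercept = mean_y - slope * mean_x

    rss, press = 0.0, 0.0
    for xi, yi in zip(x, y):
        resid = yi - (intercept + slope * xi)
        # Leverage (diagonal of the hat matrix) for each of the two models.
        leverage = 1 / n + ((xi - mean_x) ** 2 / sxx if use_slope else 0.0)
        rss += resid ** 2
        # Leave-one-out residual without refitting: resid / (1 - leverage).
        press += (resid / (1 - leverage)) ** 2

    k = 2 if use_slope else 1  # number of regression coefficients
    aic = n * math.log(rss / n) + 2 * k  # AIC up to an additive constant
    return press / n, aic


# Simulated data with a clear linear trend (an assumption for illustration).
random.seed(42)
x = [random.uniform(0, 10) for _ in range(100)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

cv_mean, aic_mean = loo_cv_and_aic(x, y, use_slope=False)
cv_line, aic_line = loo_cv_and_aic(x, y, use_slope=True)
print(f"mean only: LOO-CV MSE = {cv_mean:.3f}, AIC = {aic_mean:.1f}")
print(f"line:      LOO-CV MSE = {cv_line:.3f}, AIC = {aic_line:.1f}")
```

With a signal this strong both criteria prefer the straight line. The equivalence is asymptotic, so on small samples or with closely matched models the two criteria need not pick the same model.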
My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper — something every statistician should read, including the discussion.
I select papers that I want every member of my research team to be familiar with. Usually they are classics in forecasting, or they are recent survey papers.
In the last couple of months we have also read the following papers:
We are looking for a new post-doctoral research fellow to work on the project “Macroeconomic Forecasting in a Big Data World”. Details are given at the link below.
This is a two year position, funded by the Australian Research Council, and working with me, George Athanasopoulos, Farshid Vahid and Anastasios Panagiotelis. We are looking for someone with a PhD in econometrics, statistics or machine learning, who is well-trained in computationally intensive methods, and who has a background in at least one of time series analysis, macroeconomic modelling, or Bayesian econometrics.
If you find this blog helpful (or even if you don’t but you’re interested in blogs on research issues and tools), there are a few other blogs about doing research that you might find useful. Here are a few that I read.
I’ve created a bundle so you can subscribe to all of these in one go.
Of course, there are lots of statistics blogs as well, and blogs about other research disciplines. The ones above are those that concentrate on generic research issues.
Journal Clubs are a great way to learn new research ideas and to keep up with the literature. The idea is that a group of people get together every week or so to discuss a paper of joint interest. This can happen within your own research group or department, or virtually online.
There is now a virtual journal club operating in conjunction with CrossValidated.com. The first paper discussed was on text data mining. It appears that the next paper may be on collaborative filtering.
The emphasis is on Open Access papers, preferably with associated software that is freely available. Some of the discussion tends to centre on how to implement the ideas in R.
For those of us in Australia, the timing is tricky. The first discussion took place at 3am local time!
If you can’t make the CrossValidated Journal Club chats, why not start your own local club?
Today I gave a workshop for supervisors of postgraduate students. Mostly I talked about creating a team environment for postgraduate students rather than the traditional model (at least in statistics and econometrics) of each student working in isolation.
The slides are available here in presentation form or in handout form. Actually, these are an edited version of the slides as I accidentally left out a couple of the photographs in the workshop, and I’ve omitted slides that I didn’t end up covering in the workshop.
An important part of my research group is this blog. So if you haven’t been here before, please take a look around.
For those people who attended, feel free to add comments below if you would like to provide feedback on the workshop.