I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

# Seminars in Taiwan

I’m currently visiting Taiwan and I’m giving two seminars while I’m here — one at the National Tsing Hua University in Hsinchu, and the other at Academia Sinica in Taipei. Details are below for those who might be nearby.

# Di Cook is moving to Monash

I’m delighted that Professor Dianne Cook will be joining Monash University in July 2015 as a Professor of Business Analytics. Di is an Australian who has worked in the US for the past 25 years, mostly at Iowa State University. She is moving back to Australia and joining the Department of Econometrics and Business Statistics in the Monash Business School, as part of our initiative in Business Analytics.

Di is a world leader in data visualization, and is well-known for her work on interactive graphics. She is also the academic supervisor of several leading data scientists including Hadley Wickham and Yihui Xie, both of whom work for RStudio.

Di has a great deal of energy and enthusiasm for computational statistics and data visualization, and will play a key role in developing and teaching our new subjects in business analytics.

The Monash Business School is already exceptionally strong in econometrics (ranked 7th in the world on RePEc), and forecasting (ranked 11th on RePEc), and we have recently expanded into actuarial science. With Di joining the department, we will be extending our expertise in the area of data visualization as well.

# New R package for electricity forecasting

Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the “Monash Electricity Forecasting Model”. We have decided to release an R package that implements our model so that other people can easily use it. The package is called “MEFM” and is available on github. We will probably also put it on CRAN eventually.
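For anyone wanting to try it, a GitHub-hosted package can typically be installed with devtools. The repository path below is an assumption for illustration — check the actual repository location before running this.

```r
# Install the MEFM package from GitHub (repository path assumed;
# verify the real location first).
# install.packages("devtools")
devtools::install_github("robjhyndman/MEFM-package")
library(MEFM)
```
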

The model was first described in Hyndman and Fan (2010). We are continually improving it, and the latest version is described in the model documentation, which will be updated from time to time.

The package is being released under a GPL licence, so anyone can use it. All we ask is that our work is properly cited.

Naturally, we are not able to provide free technical support, although we welcome bug reports. We are available to undertake paid consulting work in electricity forecasting.

# A time series classification contest

Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.

The data are measurements from a medical diagnostic machine which takes 1 measurement every second, and after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data is provided. It is not necessary to classify all the test data, but you do need to have relatively high accuracy on what is classified. So you could find a subset of more easily classifiable test time series, and leave the rest of the test data unclassified.

# Am I a data scientist?

Last night I gave a very short talk (less than 5 minutes) at the Melbourne Analytics Charity Christmas Gala, a combined event of the Statistical Society of Australia, Data Science Melbourne, Big Data Analytics and Melbourne Users of R Network.

This is (roughly) what I said.

# Honoring Herman Stekler

The first issue of the *IJF* for 2015 has just been published, and I’m delighted that it includes a special section honoring Herman Stekler. It includes articles covering a range of his forecasting interests, although not all of them (sports forecasting is missing). Herman himself wrote a paper for it looking at “Forecasting—Yesterday, Today and Tomorrow”.

He is in a unique position to write such a paper as he has been doing forecasting research longer than anyone else on the planet — his first published paper on forecasting appeared in 1959. Herman is now 82 years old, and is still very active in research. Only a couple of months ago, he wrote to me with some new research ideas he had been thinking about, asking me for some feedback. He is also an extraordinarily conscientious and careful associate editor of the *IJF* and a delight to work with. He is truly “a scholar and a gentleman” and I am very happy that we can honor Herman in this manner. Thanks to Tara Sinclair, Prakash Loungani and Fred Joutz for putting this tribute together.

We also published an interview with Herman in the *IJF* in 2010 which contains some information about his early years, graduate education and first academic jobs.

# Prediction competitions

Competitions have a long history in forecasting and prediction, and have been instrumental in focusing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community has the annual KDD cup, which has generated attention on a wide range of prediction problems and associated methods. Recent KDD cups are hosted on Kaggle.

In my research group meeting today, we discussed our (limited) experiences in competing in some Kaggle competitions, and we reviewed the following two papers, each describing a prediction competition:

# New Australian data on the HMD

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

The data are continually being revised and updated. Today the Australian data have been updated to 2011. Because lagged death registrations result in undercounts, only data that are likely to be complete are included.

Tim Riffe from the HMD has provided the following information about the update:

- All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
- Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period, counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
- Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.

Some of the data can be read into R using the `hmd.mx` and `hmd.e0` functions from the demography package. Tim has his own package on github that provides a more extensive interface.
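As a minimal sketch of reading the Australian data with the demography package — note that both functions require (free) registration credentials from mortality.org, so the username and password below are placeholders:

```r
library(demography)

# Download Australian mortality rates from the Human Mortality Database.
# hmd.mx() returns a demogdata object with rates by age, year and sex.
aus <- hmd.mx("AUS", username = "your_username",
              password = "your_password", label = "Australia")

# Life expectancy at birth for the same country.
aus.e0 <- hmd.e0("AUS", username = "your_username",
                 password = "your_password")

plot(aus)  # mortality rate curves by year
```
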

# Visualization of probabilistic forecasts

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts”, which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well worth reading.

It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
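A minimal example of this default behaviour, using a built-in monthly series rather than any particular dataset from the post:

```r
library(forecast)

# Fit an ARIMA model and produce forecasts with the default
# 80% and 95% prediction intervals (level = c(80, 95)).
fit <- auto.arima(AirPassengers)
fc <- forecast(fit, h = 24, level = c(80, 95))

# The plot shades the two intervals in different tones,
# the wider 95% band enclosing the 80% band.
plot(fc)
```
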