Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.
The data are measurements from a medical diagnostic machine which takes 1 measurement every second, and after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data is provided. It is not necessary to classify all the test data, but you do need to have relatively high accuracy on what is classified. So you could find a subset of more easily classifiable test time series, and leave the rest of the test data unclassified.
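The idea of classifying only the easier series can be sketched as classification with a reject option. This is a minimal illustration, not the competition's method: it assumes some classifier has already produced a probability of class 1 for each series, and the function names, the example probabilities and the 0.9 threshold are all hypothetical.

```python
def selective_classify(probabilities, threshold=0.9):
    """Label each series 1 or 0 only when the predicted probability is
    confident enough; otherwise return None (leave it unclassified)."""
    labels = []
    for p in probabilities:
        if p >= threshold:
            labels.append(1)
        elif p <= 1 - threshold:
            labels.append(0)
        else:
            labels.append(None)  # abstain on the hard cases
    return labels


def coverage_and_accuracy(labels, truth):
    """Coverage = share of series that were classified;
    accuracy is computed on the classified series only."""
    decided = [(lab, t) for lab, t in zip(labels, truth) if lab is not None]
    coverage = len(decided) / len(labels)
    if not decided:
        return coverage, float("nan")
    accuracy = sum(lab == t for lab, t in decided) / len(decided)
    return coverage, accuracy


# Hypothetical predicted probabilities of class 1 and the true classes.
probs = [0.97, 0.55, 0.08, 0.40, 0.93, 0.02]
truth = [1, 0, 0, 1, 1, 0]

labels = selective_classify(probs, threshold=0.9)
coverage, accuracy = coverage_and_accuracy(labels, truth)
print(labels, coverage, accuracy)
```

Raising the threshold trades coverage for accuracy, so in practice it would be chosen on held-out training data to meet whatever accuracy the competition requires.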
Competitions have a long history in forecasting and prediction, and have been instrumental in focusing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community has the annual KDD Cup, which has generated attention on a wide range of prediction problems and associated methods. Recent KDD Cups are hosted on Kaggle.
In my research group meeting today, we discussed our (limited) experiences in competing in some Kaggle competitions, and we reviewed the following two papers which describe two prediction competitions:
Four tracks: electric load, electricity price, wind power and solar power forecasting.
Probabilistic forecasting: contestants are required to submit 99 quantiles for each step throughout the forecast horizon.
Rolling forecasting: incremental data sets are being released on a weekly basis to forecast the next period of interest.
Prizes for winning teams and institutions: up to 3 teams from each track will be recognized as the winning teams; top institutions with multiple well-performing teams will be recognized as the winning institutions.
Global participation: 200+ people from 40+ countries have already signed up to the GEFCom2014 interest list.
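To make the probabilistic track concrete, here is a small sketch of how a set of 99 quantile forecasts for one time step might be scored. The scoring rule is not stated above; the pinball (quantile) loss is a standard choice for quantile forecasts and is assumed here for illustration. The function names and the example numbers are hypothetical.

```python
def pinball_loss(y, q_forecast, tau):
    """Pinball loss of a single quantile forecast q_forecast at level tau
    (0 < tau < 1) when the observed value is y."""
    if y >= q_forecast:
        return tau * (y - q_forecast)
    return (1 - tau) * (q_forecast - y)


def mean_pinball(y, quantile_forecasts):
    """Average pinball loss over the 99 quantile levels 0.01, ..., 0.99
    for one observation y."""
    taus = [k / 100 for k in range(1, 100)]
    if len(quantile_forecasts) != len(taus):
        raise ValueError("expected exactly 99 quantile forecasts")
    losses = [pinball_loss(y, q, t) for q, t in zip(quantile_forecasts, taus)]
    return sum(losses) / len(losses)


# Hypothetical submission for one step: the quantiles of a uniform
# distribution on [0, 100], i.e. the k% quantile is simply k.
submission = [float(k) for k in range(1, 100)]
observed = 50.0

score = mean_pinball(observed, submission)
print(score)
```

Lower is better. A submission is penalised both for being too wide and for missing the observation, which is what makes this kind of score suitable for comparing full predictive distributions rather than point forecasts.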
We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.
Forecasting competitions are a great way to test new methods and obtain a realistic evaluation of how good they are. So I’m delighted that the IEEE is organizing an energy forecasting competition as outlined by Tao Hong below.
Forecasting Ace is looking for participants to develop improved methods for predicting future events and outcomes. Their goal is to develop methods for aggregating many individual judgments in a manner that yields more accurate predictions than any one person or small group alone could provide. Potential applications of the system include forecasting economic conditions, political changes, technological development and medical breakthroughs.
For data from a single industry, using a global trend (i.e., estimated across all series) can be useful.
Combining forecasts is a good idea. (This lesson seems to be re-learned in every forecasting competition!)
The MASE can be very sensitive to a few series, and to optimize MASE it is worth concentrating on these. (This is actually not a good message for forecasting overall, as we want good forecasts for all series. Maybe we need to find a metric with similar properties to MASE but with a less skewed distribution.)
Outlier removal before forecasting can be effective. (This is an interesting result as outlier removal algorithms used in the M3 competition did not help forecast accuracy.)
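Two of these lessons are easy to illustrate numerically. The sketch below computes the MASE (out-of-sample mean absolute error scaled by the in-sample mean absolute error of the naive forecast), combines two forecasts by simple averaging, and shows how one badly forecast series can dominate the mean MASE. All the numbers are made up for illustration, and the function names are hypothetical; this is not the winners' code.

```python
from statistics import mean, median


def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error. The scale is the in-sample mean
    absolute error of the naive forecast with lag m (m=1: non-seasonal)."""
    scale = mean(abs(train[t] - train[t - m]) for t in range(m, len(train)))
    if scale == 0:
        raise ValueError("training series is constant at lag m; MASE is undefined")
    return mean(abs(a - f) for a, f in zip(actual, forecast)) / scale


def combine(*forecasts):
    """Simple average of several forecasts, horizon by horizon."""
    return [mean(values) for values in zip(*forecasts)]


# Toy series: two methods that err in opposite directions.
train = [10, 12, 11, 13, 12]
actual = [14, 13]
method_a = [12, 12]
method_b = [15, 15]
combined = combine(method_a, method_b)

mase_a = mase(train, actual, method_a)
mase_b = mase(train, actual, method_b)
mase_combined = mase(train, actual, combined)
print(mase_a, mase_b, mase_combined)

# Hypothetical per-series MASE values: one poorly forecast series
# pulls the mean far above the median.
per_series = [0.8, 0.9, 1.1, 6.0]
print(mean(per_series), median(per_series))
```

In this toy case the errors of the two methods offset each other, so the average beats both; that will not happen on every series, but it is the mechanism behind the combination result. The last two lines show why ranking on mean MASE rewards concentrating on a few difficult series.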
Jeremy and Lee receive $500 for their efforts and they have decided to donate their prize money to the Fred Hollows Foundation. $500 will restore vision to 20 people. They will also write up their methods in more detail for the International Journal of Forecasting. I am hopeful that Philip Brierley of team Sali Mali (who did very well in the second stage of the competition) will also write a short explanation of his methods for the IJF.
Thanks to everyone who participated in the competition. Thanks also to Anthony Goldbloom from Kaggle for hosting the competition. Kaggle is a wonderful platform for prediction competitions and I hope it will be used for many more competitions of this type in the future.