A time series classification contest

Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.

The data are measurements from a medical diagnostic machine which takes one measurement every second; after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data is provided. It is not necessary to classify all the test data, but you do need relatively high accuracy on what is classified. So you could find a subset of more easily classifiable test time series, and leave the rest of the test data unclassified.
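
The rules do not prescribe a method, but the option to leave hard cases unclassified amounts to classification with a reject option: predict only where a probabilistic classifier is confident. Here is a minimal sketch of that idea, assuming a hypothetical fitted classifier clf with a scikit-learn-style predict_proba method:

```python
import numpy as np

def classify_with_reject(clf, X_test, threshold=0.9):
    """Return class labels where the classifier is confident,
    and -1 (unclassified) for the remaining series."""
    proba = clf.predict_proba(X_test)   # shape (n_series, 2)
    conf = proba.max(axis=1)            # confidence in the chosen class
    labels = proba.argmax(axis=1)
    labels[conf < threshold] = -1       # abstain on uncertain series
    return labels
```

The threshold trades off coverage against accuracy; sweeping it on a validation set shows how much of the test data can be classified at a given accuracy level.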

Prediction competitions

Competitions have a long history in forecasting and prediction, and have been instrumental in focusing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community have the annual KDD Cup, which has generated attention on a wide range of prediction problems and associated methods. Recent KDD Cups have been hosted on Kaggle.

In my research group meeting today, we discussed our (limited) experiences in competing in some Kaggle competitions, and we reviewed the following two papers which describe two prediction competitions:

  1. Athanasopoulos and Hyndman (IJF 2011). The value of feedback in forecasting competitions. [preprint version]
  2. Roy et al. (2013). The Microsoft Academic Search Dataset and KDD Cup 2013.

GEFCom 2014 energy forecasting competition is underway

GEFCom 2014 is the most advanced energy forecasting competition ever organized, both in terms of the data involved and in terms of the way the forecasts will be evaluated.

So everyone interested in energy forecasting should head over to the competition webpage and start forecasting: www.gefcom.org.

This time, the competition is hosted on CrowdANALYTIX rather than Kaggle.

Highlights of GEFCom2014:

  • An upgraded edition of GEFCom2012.
  • Four tracks: electric load, electricity price, wind power and solar power forecasting.
  • Probabilistic forecasting: contestants are required to submit 99 quantiles for each step throughout the forecast horizon (see the pinball-loss sketch after this list).
  • Rolling forecasting: incremental data sets are being released on a weekly basis to forecast the next period of interest.
  • Prizes for winning teams and institutions: up to three teams from each track will be recognized as winning teams; institutions with multiple well-performing teams will be recognized as winning institutions.
  • Global participation: 200+ people from 40+ countries have already signed up to the GEFCom2014 interest list.
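
On the probabilistic track: quantile forecasts of this kind are typically scored with the pinball (quantile) loss, averaged over all 99 quantiles and all forecast steps. Here is a minimal sketch of that score, my own illustration of the standard formula rather than the official evaluation code:

```python
import numpy as np

def pinball_loss(y, q_forecasts, taus=np.linspace(0.01, 0.99, 99)):
    """Average pinball loss over all observations and quantile levels.

    y           : observed values, shape (n,)
    q_forecasts : quantile forecasts, shape (n, 99); column j is the
                  forecast of the taus[j] quantile
    """
    diff = y[:, None] - q_forecasts                  # broadcast over 99 columns
    loss = np.where(diff >= 0, taus * diff, (taus - 1.0) * diff)
    return loss.mean()
```

Lower is better; a forecaster who issues well-calibrated, sharp quantiles minimizes this score in expectation.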

Tao Hong (the main organizer) has a few tips on his blog that you should read before starting.

New jobs in business analytics at Monash

We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.

Crowd sourcing forecasts

Forecasting Ace is looking for participants to develop improved methods for predicting future events and outcomes. Their goal is to develop methods for aggregating many individual judgments in a manner that yields more accurate predictions than any one person or small group alone could provide. Potential applications of the system include forecasting economic conditions, political changes, technological development and medical breakthroughs.
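
As a toy illustration of the aggregation idea (my own sketch, not Forecasting Ace’s actual method), even a simple trimmed mean of many individual judgments tends to beat most of the individuals it combines:

```python
import numpy as np

def aggregate_judgments(forecasts, trim=0.1):
    """Trimmed mean of individual point forecasts: drop the most extreme
    `trim` fraction from each tail, then average what remains."""
    f = np.sort(np.asarray(forecasts, dtype=float))
    k = int(len(f) * trim)              # judgments to drop per tail
    return f[k:len(f) - k].mean()

# e.g. 500 noisy, heavy-tailed judgments centred on a true value of 42
judgments = 42 + np.random.standard_t(df=3, size=500)
print(aggregate_judgments(judgments))   # close to 42
```

Trimming protects the aggregate from the few wildly wrong judgments that would distort a plain average.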

Tourism forecasting competition ends

And the winners are … Jeremy Howard and Lee C Baker. (See my earlier post for information about the competition.)

Jeremy describes his approach to seasonal time series in a blog post on Kaggle.com. Lee described his approach to annual time series in an earlier post.

A few lessons that come out of this:

  • For data from a single industry, using a global trend (i.e., estimated across all series) can be useful.
  • Combining forecasts is a good idea. (This lesson seems to be re-learned in every forecasting competition!)
  • The MASE can be very sensitive to a few series, and to optimize MASE it is worth concentrating on these. (This is actually not a good message for forecasting overall, as we want good forecasts for all series. Maybe we need to find a metric with similar properties to MASE but with a less skewed distribution; see the sketch after this list.)
  • Outlier removal before forecasting can be effective. (This is an interesting result, as outlier removal algorithms used in the M3 competition did not help forecast accuracy.)
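
For reference, the MASE (Hyndman & Koehler, 2006) scales each out-of-sample error by the in-sample mean absolute error of a naive forecast, so a series that is nearly constant in-sample has a tiny denominator and can dominate the average across series. A minimal sketch, assuming the series are plain numpy arrays:

```python
import numpy as np

def mase(train, actual, forecast, m=1):
    """Mean Absolute Scaled Error.

    train    : in-sample series the model was fitted to
    actual   : out-of-sample observations
    forecast : forecasts for the same period
    m        : seasonal period (m=1 gives the ordinary naive scaling;
               m=12 would suit monthly tourism data)
    """
    # denominator: in-sample MAE of the (seasonal) naive forecast
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale
```

Averaged across thousands of series, the handful with near-zero scale terms produce the heavy right tail noted above, which is why concentrating on those series pays off under this metric.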

Jeremy and Lee receive $500 for their efforts and they have decided to donate their prize money to the Fred Hollows Foundation. $500 will restore vision to 20 people. They will also write up their methods in more detail for the International Journal of Forecasting. I am hopeful that Philip Brierley of team Sali Mali (who did very well in the second stage of the competition) will also write a short explanation of his methods for the IJF.

Thanks to everyone who participated in the competition. Thanks also to Anthony Goldbloom from Kaggle for hosting the competition. Kaggle is a wonderful platform for prediction competitions and I hope it will be used for many more competitions of this type in the future.