Thinking big at Yahoo

I’m speaking in the “Yahoo Labs Big Thinkers” series on Friday 26 June. I hope I can live up to the title!

My talk is on “Exploring the boundaries of predictability: what can we forecast, and when should we give up?” Essentially I will start with some of the ideas in this post, and then discuss the features of hard-to-forecast time series.

So if you’re in the San Francisco Bay Area, please come along. Otherwise, it will be streamed live on the Yahoo Labs website.

Nominations for best International Journal of Forecasting paper, 2012–2013

Every two years, the International Journal of Forecasting awards a prize for the best paper published in a two-year period. It is now time to identify the best paper published in the IJF during 2012 and 2013. There is always a delay of about 18 months after the publication period to allow time for reflection, citations, etc. The prize is US$1000 plus an engraved plaque.

Statistical modelling and analysis of big data

I’m currently attending the one-day workshop on this topic at QUT in Brisbane. This morning I spoke on “Visualizing and forecasting big time series data”. My slides are here.

The talks are being streamed.

OVERVIEW

Big data is now endemic in business, industry, government, environmental management, medical science, social research and so on. One of the commensurate challenges is how to effectively model and analyse these data.

This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.

New R package for electricity forecasting

Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the “Monash Electricity Forecasting Model”. We have decided to release an R package that implements our model so that other people can easily use it. The package is called “MEFM” and is available on GitHub. We will probably also put it on CRAN eventually.

The model was first described in Hyndman and Fan (2010). We are continually improving it, and the latest version is described in the model documentation, which will be updated from time to time.

The package is being released under a GPL licence, so anyone can use it. All we ask is that our work is properly cited.

Naturally, we are not able to provide free technical support, although we welcome bug reports. We are available to undertake paid consulting work in electricity forecasting.

A time series classification contest

Amongst today’s email was one from someone running a private competition to classify time series. Here are the essential details.

The data are measurements from a medical diagnostic machine which takes one measurement every second, and after 32–1000 seconds, the time series must be classified into one of two classes. Some pre-classified training data are provided. It is not necessary to classify all the test data, but you do need to have relatively high accuracy on what is classified. So you could find a subset of more easily classifiable test time series, and leave the rest of the test data unclassified.
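
To make the "classify only the easy ones" idea concrete, here is a minimal Python sketch of a classifier that is allowed to abstain. Everything in it is an assumption made for illustration, not part of the contest: the summary features (mean and standard deviation), the nearest-centroid rule, the `min_margin` threshold and the toy data are all hypothetical, and the toy series are much shorter than the 32–1000 observations described above.

```python
from statistics import mean, pstdev

ABSTAIN = None  # hypothetical marker for "left unclassified"


def features(series):
    """Summarise a variable-length series by its mean and standard deviation."""
    return (mean(series), pstdev(series))


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit(train):
    """Compute one feature centroid per class from (series, label) pairs."""
    model = {}
    for label in (0, 1):
        rows = [features(s) for s, y in train if y == label]
        model[label] = tuple(mean(col) for col in zip(*rows))
    return model


def classify(model, series, min_margin=0.5):
    """Return 0 or 1, or ABSTAIN when the two centroids are almost equally close."""
    f = features(series)
    d0, d1 = distance(f, model[0]), distance(f, model[1])
    total = d0 + d1
    margin = abs(d0 - d1) / total if total > 0 else 0.0
    if margin < min_margin:
        return ABSTAIN
    return 0 if d0 < d1 else 1


def evaluate(model, test, min_margin=0.5):
    """Return (coverage, accuracy), with accuracy computed on classified series only."""
    preds = [(classify(model, s, min_margin), y) for s, y in test]
    kept = [(p, y) for p, y in preds if p is not ABSTAIN]
    coverage = len(kept) / len(preds)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Toy data, invented purely for illustration.
train = [([0, 0, 1, 0], 0), ([0, 1, 0, 0, 0], 0),
         ([5, 6, 5, 6], 1), ([6, 5, 6, 5, 6], 1)]
test = [([0, 0, 0, 1], 0), ([6, 6, 5, 5], 1), ([3, 3, 2, 3, 3, 4], 0)]

model = fit(train)
coverage, accuracy = evaluate(model, test)
```

Raising `min_margin` trades coverage for accuracy: fewer series get a label, but the ones that do are those the model is most sure about.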