Stanford seminar

I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).

I’ll be giving essentially the same talk at UC Davis on Thursday.

Upcoming talks in California

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.

Optimal forecast reconciliation for big time series data

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.

A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for fitting this model that make it practicable in business contexts.

Stanford: 4:30pm, Tuesday 6th October.
UC Davis: 4:10pm, Thursday 8th October.
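
As a rough illustration of the regression formulation in the abstract above, here is a minimal Python sketch for a toy hierarchy with one total and two bottom-level series. It uses ordinary least squares, which is an assumption: the abstract does not say which weighting the optimal method uses, and the fast algorithms mentioned there are not shown. The summing matrix `S`, the function names and the forecast values are all made up for illustration.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reconcile_ols(S, base):
    """Regress the base forecasts on S, then map the fit back through S.

    The design matrix S has one column per bottom-level series, so the
    fitted values S * beta add up across the hierarchy by construction.
    """
    St = transpose(S)
    StS = matmul(St, S)
    Stb = [sum(s * y for s, y in zip(row, base)) for row in St]
    beta = solve(StS, Stb)  # normal equations: (S'S) beta = S' base
    return [sum(s * b for s, b in zip(row, beta)) for row in S]


# Rows: total, series A, series B. Columns: series A, series B.
S = [[1, 1],
     [1, 0],
     [0, 1]]

# Hypothetical base forecasts that do not add up (60 + 50 != 100).
base = [100.0, 60.0, 50.0]

reconciled = reconcile_ols(S, base)
print(reconciled)
```

After reconciliation the two bottom-level values sum to the total. Forming and solving the normal equations directly, as done here, is only workable for tiny hierarchies; with millions of bottom-level series this is the step that standard regression algorithms cannot handle.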

Data Science for Managers (short course)

I am teaching part of a short course on Data Science for Managers from 10–12 October in Melbourne.

Course Overview

The impact of Data Science on modern business is second only to the introduction of computers. And yet, for many businesses the barrier to entry remains too high due to lack of know-how, organisational inertia, difficulties in hiring the right manpower, an apparent need for upfront commitment, and more.

This course is designed to address these barriers, giving you the necessary knowledge and skills to flesh out and manage Data Science functions within your organisation. It takes the anxiety factor out of the Big Data revolution and demonstrates how data-driven decision-making can be integrated into your organisation to harness existing advantages and to create new opportunities.

Assuming minimal prior knowledge, this course provides complete coverage of the key aspects, including data wrangling, modelling and analysis, predictive, descriptive and prescriptive analytics, data management and curation, standards for data storage and analysis, the use of structured, semi-structured and unstructured data as well as of open public data, and the data-analytic value chain, all covered at a fundamental level.

More details available at it.monash.edu/data-science.

Early-bird bookings close in a few days.


Keeping up to date with my research papers

Many people ask me to let them know when I write a new research paper. I can’t do that as there are too many people involved, and it is not scalable.

The solution is simple. Take your pick from the following options. Each is automatic and will let you know whenever I produce a new paper.

  1. Subscribe to the RSS feed on my website using Feedly or some other RSS reader.
  2. Subscribe to new papers via email from FeedBurner.
  3. Go to my Google Scholar page and click “Follow” at the top of the page.

The last method will work for anyone with a Google Scholar page. The Google Scholar option only includes research papers. The first two methods also include any new seminars I give or new software packages I write.

North American seminars: June 2015

For the next few weeks I am travelling in North America and will be giving the following talks.

The Yahoo talk will be streamed live.

I’ll post slides on my main site after each talk.

Thinking big at Yahoo

I’m speaking in the “Yahoo Labs Big Thinkers” series on Friday 26 June. I hope I can live up to the title!

My talk is on “Exploring the boundaries of predictability: what can we forecast, and when should we give up?” Essentially I will start with some of the ideas in this post, and then discuss the features of hard-to-forecast time series.

So if you’re in the San Francisco Bay Area, please come along. Otherwise, it will be streamed live on the Yahoo Labs website.

Statistical modelling and analysis of big data

I’m currently attending the one-day workshop on this topic at QUT in Brisbane. This morning I spoke on “Visualizing and forecasting big time series data”. My slides are here.

The talks are being streamed.


Big data is now endemic in business, industry, government, environmental management, medical science, social research and so on. One of the commensurate challenges is how to effectively model and analyse these data.

This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.