Reproducibility in computational research

Jane Frazier spoke at our research team meeting today on “Reproducibility in computational research”. We had a very stimulating and lively discussion about the issues involved. One interesting idea was that reproducibility is on a scale, and we can all aim to move further along the scale towards making our own research more reproducible. For example:

  • Can you reproduce your results tomorrow on the same computer with the same software installed?
  • Could someone else on a different computer reproduce your results with the same software installed?
  • Could you reproduce your results in 3 years’ time, after some of your software environment may have changed?
  • etc.

Think about what changes you need to make to move one step further along the reproducibility continuum, and do it.

Jane’s slides and handout are below.

Upcoming talks in California

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.

Optimal forecast reconciliation for big time series data

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.

A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for implementing this model that make it practicable in business contexts.
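To make the reconciliation step concrete, here is a toy sketch in R (my own illustration, not code from the talk) using the simplest “OLS” flavour of the method, for a two-level hierarchy with Total = A + B. The summing matrix S, with one column per bottom-level series, plays the role of the design matrix described above.

```r
# Summing matrix for the hierarchy Total = A + B.
# Rows: Total, A, B. Columns: the bottom-level series A and B.
S <- matrix(c(1, 1,
              1, 0,
              0, 1), nrow = 3, byrow = TRUE)

# Incoherent base forecasts for (Total, A, B): note 45 + 50 != 100.
yhat <- c(100, 45, 50)

# OLS reconciliation: project the base forecasts onto the coherent
# subspace, y_tilde = S (S'S)^{-1} S' yhat.
btilde <- solve(t(S) %*% S, t(S) %*% yhat)
ytilde <- S %*% btilde
ytilde  # now ytilde[1] == ytilde[2] + ytilde[3]
```

With millions of bottom-level series, forming and solving the S′S system directly is exactly the step that becomes infeasible, which is where the fast algorithms come in.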

Stanford: 4:30pm, Tuesday 6th October.
UC Davis: 4:10pm, Thursday 8th October.

The bias-variance decomposition

This week, I am teaching my Business Analytics class about the bias-variance trade-off. For some reason, the proof is not contained in either ESL or ISL, even though it is quite simple. I also discovered that the proof currently provided on Wikipedia makes little sense in places.

So I wrote my own for the class. It is longer than necessary to ensure there are no jumps that might confuse students.
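For reference, the result being proved is the standard decomposition (this is the compact version; the classroom write-up fills in every step). Assuming $y = f(x) + \varepsilon$ with $\text{E}[\varepsilon] = 0$, $\text{Var}(\varepsilon) = \sigma^2$, and $\varepsilon$ independent of the fitted model $\hat{f}$, expanding the squared error and noting that the cross terms vanish gives

$$
\text{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(f(x) - \text{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\text{E}\Big[\big(\hat{f}(x) - \text{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
+ \sigma^2.
$$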

Murphy diagrams in R

At the recent International Symposium on Forecasting, held in Riverside, California, Tilmann Gneiting gave a great talk on “Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”. It will be the subject of an IJF invited paper in due course.

One of the things he talked about was the “Murphy diagram” for comparing forecasts, as proposed in Ehm et al. (2015). Here’s how it works for comparing mean forecasts.
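As a minimal sketch of the idea (my own illustrative code and simulated data, not material from the talk or the paper’s own software): for the mean functional, the elementary scoring functions of Ehm et al. are, up to a constant factor, $S_\theta(x, y) = |1\{y < \theta\} - 1\{x < \theta\}|\,|y - \theta|$, and a Murphy diagram plots each forecast’s average elementary score against $\theta$.

```r
# Elementary scoring function for the mean, S_theta(x, y),
# applied elementwise to forecasts x and observations y.
elementary_score <- function(x, y, theta) {
  abs((y < theta) - (x < theta)) * abs(y - theta)
}

set.seed(1)
y  <- rnorm(500)                # observations (simulated)
f1 <- y + rnorm(500, sd = 0.5)  # forecast 1: noisy but unbiased
f2 <- 0.9 * y                   # forecast 2: shrunk towards zero

theta <- seq(min(y) - 1, max(y) + 1, length.out = 200)
s1 <- sapply(theta, function(t) mean(elementary_score(f1, y, t)))
s2 <- sapply(theta, function(t) mean(elementary_score(f2, y, t)))

# Murphy diagram: a forecast dominates wherever its curve is lower.
plot(theta, s1, type = "l", col = "blue", ylim = range(s1, s2),
     xlab = expression(theta), ylab = "Empirical score")
lines(theta, s2, col = "red")
legend("topright", legend = c("Forecast 1", "Forecast 2"),
       col = c("blue", "red"), lty = 1)
```

If one forecast’s curve lies below the other’s for every $\theta$, it is preferred by every consistent scoring function for the mean, which is what makes the diagram useful.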

North American seminars: June 2015

For the next few weeks I am travelling in North America and will be giving the following talks.

The Yahoo talk will be streamed live.

I’ll post slides on my main site after each talk.

Statistical modelling and analysis of big data

I’m currently attending the one-day workshop on this topic at QUT in Brisbane. This morning I spoke on “Visualizing and forecasting big time series data”. My slides are here.

The talks are being streamed.


Big data is now endemic in business, industry, government, environmental management, medical science, social research and so on. One of the commensurate challenges is how to effectively model and analyse these data.

This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.