Stanford seminar

I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).

I’ll be giving essentially the same talk at UC Davis on Thursday.

Reproducibility in computational research

Jane Frazier spoke at our research team meeting today on “Reproducibility in computational research”. We had a very stimulating and lively discussion about the issues involved. One interesting idea was that reproducibility is on a scale, and we can all aim to move further along the scale towards making our own research more reproducible. For example:

  • Can you reproduce your results tomorrow on the same computer with the same software installed?
  • Could someone else on a different computer reproduce your results with the same software installed?
  • Could you reproduce your results in 3 years’ time, after some of your software environment may have changed?
  • etc.

Think about what changes you need to make to move one step further along the reproducibility continuum, and do it.
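
As one small example of a first step along that scale, here is a minimal sketch in Python of recording the software environment and fixing the random seed alongside a result. The function names, the seed value and the stand-in "analysis" are hypothetical and are not taken from Jane's slides; in R the same idea would usually start with `sessionInfo()` and `set.seed()`.

```python
import json
import platform
import random
import sys


def environment_snapshot(seed):
    """Return a dict describing the run, to be saved beside the results."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
    }


def run_analysis(seed):
    """Hypothetical stand-in for a real analysis: draws five random numbers."""
    rng = random.Random(seed)  # a local generator, so the seed fully determines the output
    return [rng.random() for _ in range(5)]


SEED = 2015  # arbitrary illustrative value
snapshot = environment_snapshot(SEED)
first = run_analysis(SEED)
second = run_analysis(SEED)

# Store the snapshot with the results so a later rerun can be compared against it.
print(json.dumps(snapshot, indent=2))
print("Same results on rerun:", first == second)
```

This only addresses the first question in the list. Moving further along the scale also means pinning package versions and sharing the code and data.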

Jane’s slides and handout are below.

Upcoming talks in California

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.

Optimal forecast reconciliation for big time series data

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.

A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for fitting this model that make it practical to use in business contexts.
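
To make the regression idea concrete, here is a minimal sketch in Python of reconciliation by ordinary least squares on a toy hierarchy. The three-series hierarchy, the numbers and the function names are all made up for illustration. Equal-weight least squares is an assumption; the method in the talk may weight the series differently, and problems with huge numbers of series need fast algorithms rather than this naive solve.

```python
# Toy hierarchy (hypothetical): Total = A + B.
# Rows of the design matrix S are the series Total, A, B;
# columns are the bottom-level series A, B.
S = [
    [1.0, 1.0],  # Total
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reconcile_ols(S, base):
    """Regress the base forecasts on S, then map the fitted bottom level back through S."""
    St = [list(col) for col in zip(*S)]  # transpose of S
    StS = [[sum(x * y for x, y in zip(r, c)) for c in St] for r in St]  # S'S
    Sty = [sum(s * y for s, y in zip(r, base)) for r in St]  # S'y
    beta = solve(StS, Sty)  # reconciled bottom-level forecasts
    return [sum(s * b for s, b in zip(row, beta)) for row in S]


# Base forecasts for Total, A, B. They are incoherent: 55 + 40 is not 100.
base = [100.0, 55.0, 40.0]
reconciled = reconcile_ols(S, base)
print(reconciled)
```

After reconciliation the first element equals the sum of the other two, so the forecasts add up across the hierarchy. With millions of bottom-level series the matrix S'S becomes far too large for a dense solve like this one.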

Stanford: 4:30pm, Tuesday 6th October.
UC Davis: 4:10pm, Thursday 8th October.

Mathematical annotations on R plots

I’ve always struggled with using plotmath via the expression function in R for adding mathematical notation to axes or legends. For some reason, the most obvious way to write something never seems to work for me and I end up using trial and error in a loop with far too many iterations.

So I am very happy to see the new latex2exp package available which translates LaTeX expressions into a form suitable for R graphs. This is going to save me time and frustration!

Murphy diagrams in R

At the recent International Symposium on Forecasting, held in Riverside, California, Tillman Gneiting gave a great talk on “Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”. It will be the subject of an IJF invited paper in due course.

One of the things he talked about was the “Murphy diagram” for comparing forecasts, as proposed in Ehm et al. (2015). Here’s how it works for comparing mean forecasts.

Useful tutorials

There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help.

North American seminars: June 2015

For the next few weeks I am travelling in North America and will be giving the following talks.

The Yahoo talk will be streamed live.

I’ll post slides on my main site after each talk.

R vs Autobox vs ForecastPro vs ...

Every now and then a commercial software vendor makes claims on social media about how their software is so much better than the forecast package for R, but no details are provided.

There are lots of reasons why you might select a particular software solution, and R isn’t for everyone. But anyone claiming superiority should at least provide some evidence rather than make unsubstantiated claims.