I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).
Jane Frazier spoke at our research team meeting today on “Reproducibility in computational research”. We had a very stimulating and lively discussion about the issues involved. One interesting idea was that reproducibility lies on a scale, and we can all aim to move further along the scale towards making our own research more reproducible. For example:
- Can you reproduce your results tomorrow on the same computer with the same software installed?
- Could someone else on a different computer reproduce your results with the same software installed?
- Could you reproduce your results in three years’ time, after parts of your software environment have changed?
Think about what changes you need to make to move one step further along the reproducibility continuum, and do it.
Jane’s slides and handout are below.
I will be speaking at the Chinese R conference in Nanchang, to be held on 24–25 October, on “Forecasting Big Time Series Data using R”.
Details (for those who can read Chinese) are at china-r.org.
I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC-Davis.
Optimal forecast reconciliation for big time series data
Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for its products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.
A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for fitting this model that make it practicable to use in business contexts.
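To make the regression view concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method from the talk: it uses ordinary least squares rather than the optimal weighting, and the three-series hierarchy (Total = A + B), the base forecasts, and the helper names `solve` and `reconcile_ols` are all made up for the example. The design matrix `S` has one column per bottom-level series, as described above.

```python
# Sketch: OLS forecast reconciliation for a toy hierarchy, Total = A + B.
# Assumption: plain OLS is used here as a simplification; all names and
# numbers are hypothetical and chosen only for illustration.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reconcile_ols(S, y_hat):
    """Regress the base forecasts on S, then map the fitted bottom-level
    values back up the hierarchy: y_tilde = S (S'S)^{-1} S' y_hat."""
    St = transpose(S)
    beta = solve(matmul(St, S), matmul(St, [[v] for v in y_hat]))
    return [row[0] for row in matmul(S, beta)]

# Rows: Total, A, B. Columns: the bottom-level series A and B.
S = [[1, 1],
     [1, 0],
     [0, 1]]

# Base forecasts that do not add up: 45 + 50 != 100.
y_hat = [100.0, 45.0, 50.0]

y_tilde = reconcile_ols(S, y_hat)
print(y_tilde)  # reconciled forecasts for Total, A, B
```

In this toy case the reconciled forecasts come out at roughly 98.33, 46.67 and 51.67, so the bottom-level values now sum to the total. With only two columns the direct solve is trivial; with millions of columns this kind of dense computation is exactly what stops being feasible.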
I’ve always struggled with using plotmath via the expression function in R for adding mathematical notation to axes or legends. For some reason, the most obvious way to write something never seems to work for me and I end up using trial and error in a loop with far too many iterations.
So I am very happy to see the new latex2exp package available which translates LaTeX expressions into a form suitable for R graphs. This is going to save me time and frustration!
At the recent International Symposium on Forecasting, held in Riverside, California, Tilmann Gneiting gave a great talk on “Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”. It will be the subject of an IJF invited paper in due course.
There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help.
Last week I gave a talk in the Yahoo! Big Thinkers series. The video of the talk is now online and embedded below.
For the next few weeks I am travelling in North America and will be giving the following talks.
- 19 June: Southern California Edison, Rosemead CA.
“Probabilistic forecasting of peak electricity demand”.
- 23 June: International Symposium on Forecasting, Riverside CA.
“MEFM: An R package for long-term probabilistic forecasting of electricity demand”.
- 25 June: Google, Mountain View, CA.
“Automatic algorithms for time series forecasting”.
- 26 June: Yahoo, Sunnyvale, CA.
“Exploring the boundaries of predictability: what can we forecast, and when should we give up?”
- 30 June: Workshop on Frontiers in Functional Data Analysis, Banff, Canada.
“Exploring the feature space of large collections of time series”.
The Yahoo talk will be streamed live.
I’ll post slides on my main site after each talk.
Every now and then a commercial software vendor makes claims on social media about how their software is so much better than the forecast package for R, but no details are provided.
There are lots of reasons why you might select a particular software solution, and R isn’t for everyone. But anyone claiming superiority should at least provide some evidence rather than make unsubstantiated claims.