I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).
I will be speaking at the Chinese R conference in Nanchang, to be held on 24–25 October, on “Forecasting Big Time Series Data using R”.
Details (for those who can read Chinese) are at china-r.org.
I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.
Optimal forecast reconciliation for big time series data
Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.
A common constraint is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. Adjusting forecasts so that they satisfy this constraint is known as forecast reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model in which the design matrix has one column for each series at the most disaggregated level. For problems involving huge numbers of series, this model is impossible to estimate using standard regression algorithms. I will also discuss some fast algorithms for the model that make it practical to implement in business contexts.
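To make the regression idea concrete, here is a minimal sketch of the simplest (ordinary least squares, identity-weighted) version of reconciliation for a toy two-level hierarchy. The design matrix `S` (a "summing matrix") has one column per bottom-level series and one row per series in the hierarchy; the base forecasts are regressed on `S`, and the fitted values are coherent by construction. The function names and forecast values are hypothetical, and the weighting used in the talk may differ; this dense approach is exactly what becomes infeasible with millions of series.

```python
import numpy as np


def summing_matrix(n_bottom: int) -> np.ndarray:
    """Hypothetical helper: a two-level hierarchy with one total above n_bottom series.

    Row 0 sums all bottom-level series; the remaining rows are the identity,
    so each bottom-level series maps to itself.
    """
    total_row = np.ones((1, n_bottom))
    identity = np.eye(n_bottom)
    return np.vstack([total_row, identity])


def reconcile_ols(base_forecasts: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Regress the base forecasts on the columns of S and return the fitted values.

    The fitted values S @ beta are coherent: every aggregate equals the sum
    of the bottom-level series beneath it.
    """
    # lstsq solves the least-squares problem without explicitly inverting S'S,
    # which can be numerically ill-conditioned for large hierarchies.
    beta, *_ = np.linalg.lstsq(S, base_forecasts, rcond=None)
    return S @ beta


S = summing_matrix(2)
base = np.array([10.0, 6.0, 5.0])  # hypothetical base forecasts for Total, A, B (6 + 5 != 10)
reconciled = reconcile_ols(base, S)
print(reconciled)  # approximately [10.333, 5.667, 4.667]
```

The incoherent base forecasts (6 + 5 = 11, not 10) are pulled to a compromise in which the total equals the sum of its parts. For very large hierarchies one would store `S` as a sparse matrix and exploit its structure rather than forming it densely.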
I am teaching part of a short course on Data Science for Managers from 10 to 12 October in Melbourne.
The impact of Data Science on modern business is second only to the introduction of computers. And yet, for many businesses the barrier to entry remains too high due to lack of know-how, organisational inertia, difficulties in hiring the right staff, an apparent need for upfront commitment, and more.
This course is designed to address these barriers. It gives participants the knowledge and skills needed to establish and manage Data Science functions within their organisations, takes the anxiety out of the Big Data revolution, and demonstrates how data-driven decision-making can be integrated into an organisation to harness existing advantages and create new opportunities.
Assuming minimal prior knowledge, this course covers all the key aspects at a fundamental level, including data wrangling, modelling and analysis, predictive, descriptive and prescriptive analytics, data management and curation, standards for data storage and analysis, the use of structured, semi-structured and unstructured data as well as open public data, and the data-analytic value chain.
More details available at it.monash.edu/data-science.
Early-bird bookings close in a few days.
Last week I gave a talk in the Yahoo! Big Thinkers series. The video of the talk is now online and embedded below.
Many people ask me to let them know when I write a new research paper. I can’t do that: there are too many people involved, and it doesn’t scale.
The solution is simple. Take your pick from the following options. Each is automatic and will let you know whenever I produce a new paper.
- Subscribe to the RSS feed on my website using Feedly or some other RSS reader.
- Subscribe to new papers via email from FeedBurner.
- Go to my Google Scholar page and click “Follow” at the top of the page.
The last method will work for anyone with a Google Scholar page. The Google Scholar option only includes research papers; the first two methods also include any new seminars I give or new software packages I write.
For the next few weeks I am travelling in North America and will be giving the following talks.
- 19 June: Southern California Edison, Rosemead, CA.
“Probabilistic forecasting of peak electricity demand”.
- 23 June: International Symposium on Forecasting, Riverside, CA.
“MEFM: An R package for long-term probabilistic forecasting of electricity demand”.
- 25 June: Google, Mountain View, CA.
“Automatic algorithms for time series forecasting”.
- 26 June: Yahoo, Sunnyvale, CA.
“Exploring the boundaries of predictability: what can we forecast, and when should we give up?”
- 30 June: Workshop on Frontiers in Functional Data Analysis, Banff, Canada.
“Exploring the feature space of large collections of time series”.
The Yahoo talk will be streamed live.
I’ll post slides on my main site after each talk.
I’m speaking in the “Yahoo Labs Big Thinkers” series on Friday 26 June. I hope I can live up to the title!
My talk is on “Exploring the boundaries of predictability: what can we forecast, and when should we give up?” Essentially I will start with some of the ideas in this post, and then discuss the features of hard-to-forecast time series.
So if you’re in the San Francisco Bay area, please come along. Otherwise, it will be streamed live on the Yahoo Labs website.
Big data is now pervasive in business, industry, government, environmental management, medical science, social research and so on. One of the accompanying challenges is how to model and analyse these data effectively.
This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.
I’m currently visiting Taiwan and I’m giving two seminars while I’m here: one at National Tsing Hua University in Hsinchu, and the other at Academia Sinica in Taipei. Details are below for those who might be nearby.