In just over three weeks, the inaugural MeDaScIn event will take place. This is an initiative to grow the talent pool of local data scientists and to promote Melbourne as a world city of excellence in Data Science.
The main event takes place on Friday 6th May, with lots of interesting sounding titles and speakers from business and government. I’m the only academic speaker on the program, giving the closing talk on “Automatic FoRecasting”. Earlier in the day I am running a forecasting workshop where I will discuss forecasting issues and answer questions for about 90 minutes. There are still a few places left for the main event, and for the workshops. Book soon if you want to attend.
All the details are here.
I often see figures with two sets of prediction intervals plotted on the same graph using different line types to distinguish them. The results are almost always unreadable. A better way to do this is to use semi-transparent shaded regions. Here is an example showing two sets of forecasts for the Nile River flow.
f1 = forecast(auto.arima(Nile, lambda=0), h=20, level=95)
f2 = forecast(ets(Nile), h=20, level=95)
plot(f1, shadecol=rgb(0,0,1,.4), flwd=1,
main="Forecasts of Nile River flow",
xlab="Year", ylab="Billions of cubic metres")
The blue region shows 95% prediction intervals for the ARIMA forecasts, while the red region shows 95% prediction intervals for the ETS forecasts. Where they overlap, the colors blend to make purple. In this case, the point forecasts are quite close, but the prediction intervals are relatively different.
The GEFCom competitions have been a great success in generating good research on forecasting methods for electricity demand, and in enabling a comprehensive comparative evaluation of various methods. But they have only considered price forecasting in a simplified setting. So I’m happy to see this challenge is being taken up as part of the European Energy Market Conference for 2016, to be held from 6-9 June at the University of Porto in Portugal. Continue reading →
The github page for the forecast package currently shows the following information
Note the downloads figure: 264K/month. I know the package is popular, but that seems crazy. Also, the downloads figure on github only counts the downloads from the RStudio mirror, and ignores downloads from the other 125 mirrors around the world. Continue reading →
I’ve been having discussions with colleagues and university administration about the best way for universities to manage home-grown software.
The traditional business model for software is that we build software and sell it to everyone willing to pay. Very often, that leads to a software company spin-off that has little or nothing to do with the university that nurtured the development. Think MATLAB, S-Plus, Minitab, SAS and SPSS, all of which grew out of universities or research institutions. This model has repeatedly been shown to stifle research development, channel funds away from the institutions where the software was born, and add to research costs for everyone.
I argue that the open-source model is a much better approach both for research development and for university funding. Under the open-source model, we build software, and make it available for anyone to use and adapt under an appropriate licence. This approach has many benefits that are not always appreciated by university administrators. Continue reading →
This is a new competition being organized by EuroStat. The first phase involves nowcasting economic indicators at national and European level including unemployment, HICP, Tourism and Retail Trade and some of their variants.
The main goal of the competition is to discover promising methodologies and data sources that could, now or in the future, be used to improve the production of official statistics in the European Statistical System.
The organizers seem to have been encouraged by the success of Kaggle and other data science competition platforms. Unfortunately, they have chosen not to give any prizes other than an invitation to give a conference presentation or poster, which hardly seems likely to attract many good participants.
The deadline for registration is 10 January 2016. The duration of the competition is roughly a year (including about a month for evaluation).
See the call for participation for more information.
I prepared the following notes for a consulting client, and I thought they might be of interest to some other people too.
Let $y_t$ denote the value of the time series at time $t$, and suppose we wish to fit a trend with correlated errors of the form
$$ y_t = f(t) + n_t, $$
where $f(t)$ represents the possibly nonlinear trend and $n_t$ is an autocorrelated error process. Continue reading →
It is a while since I last updated the CRAN version of the forecast package, so I uploaded the latest version (6.2) today. The github version remains the most up-to-date version and is already two commits ahead of the CRAN version.
This update is mostly bug fixes and additional error traps. The full ChangeLog is listed below. Continue reading →
I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).
I’ll be giving essentially the same talk at UC Davis on Thursday. Continue reading →