Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.
I do not normally post job adverts, but this was very specifically targeted to “applied time series candidates” so I thought it might be of sufficient interest to readers of this blog.
Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
There are at least four sources of uncertainty in forecasting using time series models:
- The random error term;
- The parameter estimates;
- The choice of model for the historical data;
- The continuation of the historical data generating process into the future.
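The coverage calculation described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the procedure from the paper: the series are simulated AR(1) data rather than real series, the model is an AR(1) fitted with arima() rather than ETS, and the number of series, training length and hold-out length are arbitrary. Because the fitted model matches the simulated process, only the parameter-estimation uncertainty is left out of the intervals here. With real data, the last two sources in the list would also contribute.

```r
# Sketch: empirical coverage of nominal 95% prediction intervals on hold-out data.
# All settings below are illustrative assumptions.
set.seed(1)
n_series <- 200  # number of simulated series
n_train  <- 40   # observations used for fitting
h        <- 10   # hold-out observations per series

inside <- matrix(NA, nrow = n_series, ncol = h)
for (i in seq_len(n_series)) {
  # Simulated AR(1) series standing in for real data
  y     <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_train + h))
  train <- y[1:n_train]
  test  <- y[(n_train + 1):(n_train + h)]

  fit <- arima(train, order = c(1, 0, 0), method = "ML")
  pr  <- predict(fit, n.ahead = h)

  # These intervals treat the estimated parameters as if they were known,
  # so they reflect the random error term only
  lower <- as.numeric(pr$pred) - qnorm(0.975) * as.numeric(pr$se)
  upper <- as.numeric(pr$pred) + qnorm(0.975) * as.numeric(pr$se)

  inside[i, ] <- test >= lower & test <= upper
}

coverage            <- mean(inside)      # overall share of hold-out points covered
coverage_by_horizon <- colMeans(inside)  # share covered at each forecast horizon
```

Comparing coverage with the nominal 0.95 gives a rough measure of how much uncertainty the intervals are missing.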
The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)
The base forecasts can be generated using any method, with ETS models and ARIMA models provided as options in the forecast.gts() function. As ETS models do not allow for regressors, you will need to choose ARIMA models if you want to include regressors.
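To make the reconciliation step concrete, here is a minimal base R sketch for a toy hierarchy with one total and two bottom-level series. It uses an ordinary least squares projection, which is one simple way to reconcile forecasts and is not necessarily what the hts package does by default. The series names and the base forecast values are made up for illustration.

```r
# Summing matrix for a toy hierarchy: Total = A + B
S <- rbind(Total = c(1, 1),
           A     = c(1, 0),
           B     = c(0, 1))

# Made-up base forecasts that do not add up (60 + 40 is not 105)
base <- c(Total = 105, A = 60, B = 40)

# OLS reconciliation: find the bottom-level values whose aggregates are
# closest (in least squares) to the base forecasts, then re-aggregate
bottom     <- solve(crossprod(S), crossprod(S, base))
reconciled <- drop(S %*% bottom)
```

After this step the reconciled total equals the sum of the reconciled bottom-level forecasts, which is the aggregation constraint that the base forecasts violated.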
Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.
His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. The prior research in this area has largely centred around two strategies (recursive and direct), and which one works better in certain circumstances. Recursive forecasting is the standard approach where a model is designed to predict one step ahead, and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting involves using a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and has developed new strategies that are either hybrids of these two strategies, or completely different from either of them. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
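The difference between the two traditional strategies can be sketched in base R. This is a deliberately simple illustration and not taken from the thesis: it assumes a linear model with a single lag fitted by lm(), simulated AR(1) data, and a horizon of five steps, whereas the thesis concerns machine learning models.

```r
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))  # simulated data (assumption)
n <- length(y)
H <- 5  # number of steps ahead to forecast

# Recursive strategy: fit ONE one-step-ahead model, then iterate it,
# feeding each forecast back in as the next input
one_step  <- lm(target ~ lag1,
                data = data.frame(target = y[2:n], lag1 = y[1:(n - 1)]))
recursive <- numeric(H)
last      <- y[n]
for (h in seq_len(H)) {
  last         <- unname(predict(one_step, newdata = data.frame(lag1 = last)))
  recursive[h] <- last
}

# Direct strategy: fit a SEPARATE model for each horizon h,
# each one predicting y[t + h] from the observed y[t]
direct <- numeric(H)
for (h in seq_len(H)) {
  d         <- data.frame(target = y[(1 + h):n], lag1 = y[1:(n - h)])
  fit_h     <- lm(target ~ lag1, data = d)
  direct[h] <- unname(predict(fit_h, newdata = data.frame(lag1 = y[n])))
}
```

The two strategies agree at the first horizon by construction, because both use the same one-step model there. They can differ at longer horizons.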
Some of the papers to come out of Souhaib’s thesis are already available on his Google Scholar page.
Well done Souhaib, and best wishes for the future.
Although the Guardian claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.
However, today I learned a little more about the ABS unemployment data problems, including what may be the explanation for the fluctuations. This explanation was offered by Westpac’s chief economist, Bill Evans (see here for a video of him explaining the issue).
It’s not every day that seasonal adjustment makes the front page of the newspapers, but it has today with the ABS saying that the recent seasonally adjusted unemployment data would be revised.
I was interviewed about the underlying concepts for the Guardian in this piece.
Further comment from me about users paying for the ABS data is here.
I keep telling students that there are lots of jobs in data science (including statistics), and they often tell me they can’t find them advertised. As usual, you do have to do some networking, and one of the best ways of doing it is via a Data Science Meetup. Many cities now have them including Melbourne, Sydney, London, etc. It is the perfect opportunity to meet with local employers, many of which are hiring due to the huge expansion in the use of data analysis in business (aka business analytics).
At the end of each Melbourne meetup, some employers have been advertising their current analytic job openings to the audience.
Now the local organizers are going to extend the opportunity to allow job-searchers to give a 90-second pitch to employers. Details are provided on the message board.
The International Institute of Forecasters sponsors workshops every year, each of which focuses on a specific theme. The purpose of these workshops is to facilitate small, informal meetings where experts in a particular field of forecasting can discuss forecasting problems, research, and solutions. Over the years, our workshops have covered topics including Predicting Rare Events, ICT Forecasting and, most recently, Singular Spectrum Analysis. Often these workshops are associated with a special issue of the International Journal of Forecasting.
If you are already hosting a workshop on a forecasting topic and need support from the IIF, or if you are interested in organising and hosting a new workshop, please contact George Athanasopoulos.
A list of past workshops and workshop guidelines are provided on the IIF website.
I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, tbats() is unlikely to ever include covariates as explained here. It won’t actually complain if you include an xreg argument, but it will ignore it.
When I want to include covariates in a time series model, I tend to use auto.arima() with covariates included via the xreg argument. If the time series has multiple seasonal periods, I use Fourier terms as additional covariates. See my post on forecasting daily data for some discussion of this model. Note that fourierf() now handles msts objects, so it is very simple to do this.
For example, if holiday contains some dummy variables associated with public holidays and holidayf contains the corresponding variables for the first 100 forecast periods, then the following code can be used:
    y <- msts(x, seasonal.periods=c(7,365.25))
    z <- fourier(y, K=c(5,5))
    zf <- fourierf(y, K=c(5,5), h=100)
    fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
    fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)
The main disadvantage of the ARIMA approach is that the seasonality is forced to be periodic, whereas a TBATS model allows for dynamic seasonality.
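To show what the Fourier-term covariates in the code above amount to, here is a base R sketch that builds them by hand and passes them to arima() through xreg. It is only an illustration under stated assumptions: fourier_terms() is a hypothetical helper written for this example (not a function from the forecast package), the data are simulated, the AR(1) error structure is fixed rather than selected automatically, and the numbers of harmonics (3 for the weekly period, 5 for the annual one) are arbitrary choices.

```r
# Hypothetical helper: K pairs of sine and cosine columns for one seasonal period
fourier_terms <- function(t, period, K) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  }))
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "-", period)
  out
}

set.seed(123)
n <- 730   # two years of simulated daily data (assumption)
h <- 100   # forecast horizon
t_hist <- seq_len(n)
t_fut  <- n + seq_len(h)

# Simulated series with weekly and annual patterns plus AR(1) noise
y <- 10 + 3 * sin(2 * pi * t_hist / 7) + 5 * cos(2 * pi * t_hist / 365.25) +
  as.numeric(arima.sim(model = list(ar = 0.5), n = n))

# Fourier covariates for the historical and the forecast periods
z  <- cbind(fourier_terms(t_hist, 7, 3), fourier_terms(t_hist, 365.25, 5))
zf <- cbind(fourier_terms(t_fut, 7, 3), fourier_terms(t_fut, 365.25, 5))

# Regression on the Fourier terms with AR(1) errors
fit <- arima(y, order = c(1, 0, 0), xreg = z, method = "ML")
fc  <- predict(fit, n.ahead = h, newxreg = zf)
```

Because the sine and cosine columns are fixed functions of time, the fitted seasonal pattern repeats exactly from one cycle to the next, which is the periodicity restriction mentioned above.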