I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in Feedly. Anyone can now subscribe to all of the same journals, or to a subset of them, in Feedly. Continue reading →

# Seminars in Taiwan

I’m currently visiting Taiwan and I’m giving two seminars while I’m here — one at the National Tsing Hua University in Hsinchu, and the other at Academia Sinica in Taipei. Details are below for those who might be nearby. Continue reading →

# Statistical modelling and analysis of big data

There is a one day workshop on this topic on 23 February 2015 at QUT in Brisbane. I will be speaking on “Visualizing and forecasting big time series data”.

### OVERVIEW

Big data is now endemic in business, industry, government, environmental management, medical science, social research and so on. One of the commensurate challenges is how to effectively model and analyse these data.

This workshop will bring together national and international experts in statistical modelling and analysis of big data, to share their experiences, approaches and opinions about future directions in this field.

The workshop programme will commence at 8.30am and close at 5pm. Registration is free, but numbers are strictly limited, so please register when you receive your invitation by email. Morning and afternoon tea will be provided; participants will need to purchase their own lunch.

Further details will be made available in early January. Continue reading →

# Am I a data scientist?

Last night I gave a very short talk (less than 5 minutes) at the Melbourne Analytics Charity Christmas Gala, a combined event of the Statistical Society of Australia, Data Science Melbourne, Big Data Analytics and Melbourne Users of R Network.

This is (roughly) what I said. Continue reading →

# Prediction competitions

Competitions have a long history in forecasting and prediction, and have been instrumental in forcing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community has the annual KDD Cup, which has generated attention on a wide range of prediction problems and associated methods. Recent KDD Cups have been hosted on Kaggle.

In my research group meeting today, we discussed our (limited) experiences in competing in some Kaggle competitions, and we reviewed the following two papers which describe two prediction competitions:

# Visualization of probabilistic forecasts

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts”, which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well worth reading.

It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this: Continue reading →
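
As a rough illustration of what showing two intervals at once involves, the sketch below computes an 80% and a 95% interval around a single point forecast. It is written in Python rather than R, it assumes normally distributed forecast errors, and the function name and the numbers are hypothetical. It is not a description of how the forecast package computes its intervals.

```python
from statistics import NormalDist


def normal_prediction_interval(point_forecast, forecast_sd, level):
    """Return (lower, upper) for a central interval under a normality assumption.

    `level` is the nominal coverage as a fraction, e.g. 0.80 or 0.95.
    """
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided interval: put (1 - level) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point_forecast - z * forecast_sd, point_forecast + z * forecast_sd


# Hypothetical point forecast and forecast standard deviation.
for level in (0.80, 0.95):
    lower, upper = normal_prediction_interval(100.0, 10.0, level)
    print(f"{level:.0%} interval: [{lower:.2f}, {upper:.2f}]")
```

The 80% interval always sits inside the 95% interval, which is what lets a plot shade the two as nested bands.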

# Seasonal periods

I get questions about this almost every week. Here is an example from a recent comment on this blog:

> I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?
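
The arithmetic behind the question can be checked directly. The sketch below (Python rather than R, with hypothetical helper names) simply counts how many observations fall in each candidate seasonal period for data recorded every second or every minute. It does not say which period suits a particular data set; that depends on the patterns actually present in the data.

```python
# Length of each candidate seasonal period, in seconds.
SECONDS_PER_PERIOD = {
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
}


def observations_per_period(sampling_interval_seconds, period):
    """Number of observations in one seasonal period at the given sampling interval."""
    period_seconds = SECONDS_PER_PERIOD[period]
    if period_seconds % sampling_interval_seconds != 0:
        raise ValueError("period is not a whole number of sampling intervals")
    return period_seconds // sampling_interval_seconds


for label, interval in (("seconds", 1), ("minutes", 60)):
    for period in ("hour", "day", "week"):
        count = observations_per_period(interval, period)
        print(f"{label} data, {period} period: {count}")
    # Total series length for the 180 days mentioned in the question.
    print(f"{label} data, 180 days: {180 * observations_per_period(interval, 'day')}")
```

With a daily period this reproduces the 86,400 and 1440 quoted in the question. The hourly and weekly counts show that a daily period is only one of several candidates.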

# ABS seasonal adjustment update

Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.

Now that the ABS has put out a statement about the problem, I thought it would be useful to explain the underlying methodology for those who are interested. Continue reading →

# Jobs at Amazon

I do not normally post job adverts, but this was very specifically targeted to “applied time series candidates” so I thought it might be of sufficient interest to readers of this blog. Continue reading →

# Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
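
Measuring actual coverage on a hold-out sample amounts to counting how often the held-out observations fall inside their intervals. A minimal Python sketch follows; the function name is hypothetical and the numbers are made up purely for illustration, not taken from the paper.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Fraction of hold-out observations that fall inside their prediction intervals."""
    if not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("all inputs must have the same length")
    if not actuals:
        raise ValueError("at least one observation is required")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)


# Made-up hold-out values and interval limits, for illustration only.
actuals = [10.2, 11.5, 9.1, 14.0, 10.8]
lowers = [9.0, 9.5, 9.5, 10.0, 10.0]
uppers = [12.0, 12.5, 12.5, 13.0, 13.0]
print(empirical_coverage(actuals, lowers, uppers))
```

Comparing this fraction with the nominal level (95%, say) over many series and horizons gives the kind of coverage percentage quoted above.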

There are at least four sources of uncertainty in forecasting using time series models:

- The random error term;
- The parameter estimates;
- The choice of model for the historical data;
- The continuation of the historical data generating process into the future.
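
The effect of ignoring the second source can be seen in a toy simulation. The sketch below is not a time series model: it assumes independent normal observations with an unknown mean and variance, and all names are hypothetical. It compares a "naive" 95% interval that allows only for the random error with one that also allows for the estimation of the parameters. The last two sources in the list are not addressed at all.

```python
import math
import random
import statistics

N_OBS = 5        # deliberately tiny sample so parameter uncertainty matters
Z_975 = 1.96     # 97.5% quantile of the standard normal
T_975_DF4 = 2.776  # 97.5% quantile of Student's t with N_OBS - 1 = 4 df


def simulate_coverage(n_sims=20000, seed=1):
    """Return (naive_coverage, adjusted_coverage) for nominal 95% intervals."""
    rng = random.Random(seed)
    naive_hits = 0
    adjusted_hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
        future = rng.gauss(0.0, 1.0)
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)
        # Naive: treat the estimated mean and sd as if they were known.
        naive_half_width = Z_975 * sd
        # Adjusted: t quantile plus the extra variance from estimating the mean.
        adjusted_half_width = T_975_DF4 * sd * math.sqrt(1 + 1 / N_OBS)
        naive_hits += abs(future - mean) <= naive_half_width
        adjusted_hits += abs(future - mean) <= adjusted_half_width
    return naive_hits / n_sims, adjusted_hits / n_sims


naive, adjusted = simulate_coverage()
print(f"naive coverage:    {naive:.3f}")
print(f"adjusted coverage: {adjusted:.3f}")
```

In this setting the naive interval covers well below the nominal 95%, while the adjusted interval comes out close to it. Real time series models add the model-choice and structural-change sources, which this sketch does not attempt to capture.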