The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high-quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.
The data are continually being revised and updated. Today the Australian data have been updated to 2011. The time lag arises because late death registrations lead to undercounts in the most recent years, so only data that are likely to be complete are included.
Tim Riffe from the HMD has provided the following information about the update:
- All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
- Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period, counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
- Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.
Some of the data can be read into R using functions such as hmd.e0 from the demography package. Tim has his own package on GitHub that provides a more extensive interface.
This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts”, which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval.
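To make the idea of showing two intervals at once concrete, here is a minimal numerical sketch. It assumes normally distributed forecast errors, and the point forecast and standard deviation are made-up values; the function name `prediction_interval` is hypothetical and this is not how the forecast package computes its intervals.

```python
from statistics import NormalDist


def prediction_interval(point_forecast, forecast_sd, level):
    """Return (lower, upper) for a central interval, assuming normal errors."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided interval: put (1 - level) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * forecast_sd
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical one-step forecast: mean 100, standard deviation 10.
for level in (0.80, 0.95):
    lower, upper = prediction_interval(100.0, 10.0, level)
    print(f"{level:.0%} interval: [{lower:.1f}, {upper:.1f}]")
```

The 80% interval always sits inside the 95% interval, which is why plotting both gives a rough sense of the shape of the forecast distribution rather than a single pair of bounds.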
Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The International Journal of Forecasting produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:
- Electricity price forecasting: A review of the state-of-the-art with a look into the future by Rafał Weron.
- The challenges of pre-launch forecasting of adoption time series for new durable products by Paul Goodwin, Sheik Meeran, and Karima Dyussekeneva.
Both tackle very important topics in forecasting. Weron’s paper contains a comprehensive survey of work on electricity price forecasting, coherently bringing together a large body of diverse research. At 50 pages, I think it is the longest paper I have ever approved. Goodwin, Meeran and Dyussekeneva review research on new product forecasting, a problem every company that produces goods or services has faced: when there are no historical data available, how do you forecast the sales of your product?
We have a few other review papers in progress, so keep an eye out for them in future issues.
I get questions about choosing the seasonal period (the “frequency” argument of ts() in R) almost every week. Here is an example from a recent comment on this blog:
I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?
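The arithmetic in the question is easy to check. The sketch below simply counts how many observations fall in some candidate seasonal periods for data recorded every second or every minute. The helper name `observations_per_period` and the list of candidate periods are assumptions for illustration; which period actually matters depends on the data, and this is not a recommendation for any particular value.

```python
def observations_per_period(sampling_interval_seconds, period_seconds):
    """Number of observations in one seasonal period for regularly spaced data."""
    count, remainder = divmod(period_seconds, sampling_interval_seconds)
    if remainder:
        raise ValueError("period is not a whole number of sampling intervals")
    return count


# Candidate seasonal periods, expressed in seconds.
PERIOD_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}

for label, interval in (("one observation per second", 1),
                        ("one observation per minute", 60)):
    candidates = {
        name: observations_per_period(interval, length)
        for name, length in PERIOD_SECONDS.items()
        if length > interval  # a period must span more than one observation
    }
    print(label, candidates)
```

With a daily pattern, this gives 86,400 for the seconds data and 1,440 for the minutes data, matching the figures in the question, but an hourly or weekly pattern would give different values.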
Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.
Now that the ABS has put out a statement about the problem, I thought it would be useful to explain the underlying methodology for those who are interested.
I do not normally post job adverts, but this was very specifically targeted to “applied time series candidates” so I thought it might be of sufficient interest to readers of this blog.
Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
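The coverage calculation described above is simple to sketch: count how often the hold-out observations fall inside their intervals. The function name `empirical_coverage` and the numbers below are made up for illustration; they are not taken from the paper.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Fraction of hold-out observations lying inside their prediction intervals."""
    if not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("all inputs must have the same length")
    if not actuals:
        raise ValueError("at least one observation is required")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)


# Hypothetical hold-out values with their nominal 95% interval bounds.
actuals = [10.2, 11.5, 9.1, 12.8, 10.0]
lowers = [9.0, 9.0, 9.5, 9.0, 9.0]
uppers = [12.0, 12.0, 12.5, 12.0, 12.0]

print(f"Empirical coverage: {empirical_coverage(actuals, lowers, uppers):.0%}")
```

If the intervals were well calibrated, this fraction would be close to the nominal level when computed over many hold-out samples; a value well below it is the symptom described above.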
There are at least four sources of uncertainty in forecasting using time series models:
- The random error term;
- The parameter estimates;
- The choice of model for the historical data;
- The continuation of the historical data generating process into the future.
The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)
The base forecasts can be generated using any method, with ETS models and ARIMA models provided as options in the forecast.gts() function. As ETS models do not allow for regressors, you will need to choose ARIMA models if you want to include regressors.
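To illustrate what reconciliation means, here is a minimal sketch for the simplest possible hierarchy: one total and its bottom-level series. It uses an ordinary least squares projection of the base forecasts onto the set of coherent forecasts, which is one simple reconciliation rule and is an assumption here, not a description of what the hts package does. The function name `reconcile_two_level` and the numbers are hypothetical.

```python
def reconcile_two_level(total_forecast, bottom_forecasts):
    """OLS reconciliation for a hierarchy with one total and n bottom series.

    The base forecasts are projected onto the set of forecasts where the
    total equals the sum of the bottom series. For this hierarchy the
    projection spreads the discrepancy equally over all n + 1 series.
    """
    n = len(bottom_forecasts)
    if n == 0:
        raise ValueError("at least one bottom-level series is required")
    discrepancy = total_forecast - sum(bottom_forecasts)
    adjustment = discrepancy / (n + 1)
    reconciled_bottom = [b + adjustment for b in bottom_forecasts]
    reconciled_total = total_forecast - adjustment
    return reconciled_total, reconciled_bottom


# Hypothetical base forecasts: the total (100) disagrees with 40 + 50 = 90.
total, bottom = reconcile_two_level(100.0, [40.0, 50.0])
print(f"total = {total:.2f}, bottom = {[round(b, 2) for b in bottom]}")
```

After reconciliation the total equals the sum of the bottom-level forecasts, so the aggregation constraint holds even though the base forecasts were produced independently.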
Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.
His thesis was on “Machine learning strategies for multi-step-ahead time series forecasting” and is now available online. The prior research in this area has largely centred around two strategies (recursive and direct), and which one works better in certain circumstances. Recursive forecasting is the standard approach where a model is designed to predict one step ahead, and is then iterated to obtain multi-step-ahead forecasts. Direct forecasting involves using a separate forecasting model for each forecast horizon. Souhaib took a very different perspective from the prior research and has developed new strategies that are either hybrids of these two strategies, or completely different from either of them. The resulting forecasts are often significantly better than those obtained using the more traditional approaches.
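The two traditional strategies can be sketched with a deliberately simple model: a straight-line regression of a future value on the current value. This is only an illustration of the recursive and direct ideas, not a method from the thesis; the function names and the example series are made up.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the model y = a + b * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two matching (x, y) pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, horizon):
    """Fit a one-step-ahead model, then iterate it `horizon` times."""
    a, b = fit_line(series[:-1], series[1:])
    value = series[-1]
    for _ in range(horizon):
        value = a + b * value
    return value


def direct_forecast(series, horizon):
    """Fit a separate model that predicts `horizon` steps ahead in one go."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    a, b = fit_line(series[:-horizon], series[horizon:])
    return a + b * series[-1]


# Hypothetical noise-free series in which each value is half the previous one.
series = [64.0, 32.0, 16.0, 8.0, 4.0, 2.0, 1.0]
print(recursive_forecast(series, 2), direct_forecast(series, 2))
```

On this noise-free series the two strategies agree. With noisy data or a misspecified model they generally differ: the recursive forecast reuses one model and feeds its own predictions back in, while the direct forecast fits a new model for each horizon.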
Some of the papers to come out of Souhaib’s thesis are already available on his Google scholar page.
Well done Souhaib, and best wishes for the future.
Although the Guardian claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.
However, today I learned a little more about the ABS unemployment data problems, including what may be the explanation for the fluctuations. This explanation was offered by Westpac’s chief economist, Bill Evans (see here for a video of him explaining the issue).