The data used in the tourism forecasting competition, discussed in Athanasopoulos et al (2011), have been made available in the Tcomp package for R. The objects are of the same format as for Mcomp package containing data from the M1 and M3 competitions.
I’m currently in the Netherlands for a few weeks, and I’ll be giving a seminar at the Data Science Centre in Eindhoven next Wednesday afternoon on “Visualization of big time series data”. Details follow. Continue reading →
A common problem is to forecast the aggregate of several time periods of data, using a model fitted to the disaggregated data. For example, you may have monthly data but wish to forecast the total for the next year. Or you may have weekly data, and want to forecast the total for the next four weeks.
If the point forecasts are means, then adding them up will give a good estimate of the total. But prediction intervals are more tricky due to the correlations between forecast errors.
I’ve pushed a minor update to the forecast package to CRAN. Some highlights are listed here.
I have a new R package available to do temporal hierarchical forecasting, based on my paper with George Athanasopoulos, Nikolaos Kourentzes and Fotios Petropoulos. (Guess the odd guy out there!)
It is called “thief” – an acronym for Temporal HIErarchical Forecasting. The idea is to take a seasonal time series, and compute all possible temporal aggregations that result in an integer number of observations per year. For example, a quarterly time series is aggregated to biannual and annual; while a monthly time series is aggregated to 2-monthly, quarterly, 4-monthly, biannual and annual. Each of the resulting time series are forecast, and then the forecasts are reconciled using the hierarchical reconciliation algorithm described in our paper.
It turns out that this tends to give better forecasts, even though no new information has been added, especially for time series with long seasonal periods. It also allows different types of forecasts for different forecast horizons to be combined in a consistent manner.
A few years ago, I wrote a paper with George Athanasopoulos and others about a tourism forecasting competition. We originally made the data available as an online supplement to the paper, but that has unfortunately since disappeared although the paper itself is still available.
So I am posting the data here in case anyone wants to use it for replicating our results, or for other research purposes. The data are split into monthly, quarterly and yearly data. There are 366 monthly series, 427 quarterly series and 518 yearly series. Each group of series is further split into training data and test data. Further information is provided in the paper.
If you use the data in a publication, please cite the IJF paper as the source, along with a link to this blog post.
Version 7 of the forecast package was released on CRAN about a month ago, but I'm only just getting around to posting about the new features.
The most visible feature was the introduction of ggplot2 graphics. I first wrote the forecast package before ggplot2 existed, and so only base graphics were available. But I figured it was time to modernize and use the nice features available from ggplot2. The following examples illustrate the main new graphical functionality.
For illustration purposes, I'm using the male and female monthly deaths from lung diseases in the UK.