Refereed papers

Rainbow plots, bagplots and boxplots for functional data

Rob J Hyndman and Han Lin Shang
Journal of Computational and Graphical Statistics (2010), 19(1), 29-45.

Abstract: We propose new tools for visualizing large numbers of functional data in the form of smooth curves or surfaces. The proposed tools include functional versions of the bagplot and boxplot, and make use of the first two robust principal component scores, Tukey’s data depth and highest density regions.

By-products of our graphical displays are outlier detection methods for functional data. We compare these new outlier detection methods with existing methods for detecting outliers in functional data and show that our methods are better able to identify the outliers.
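The outlier-detection idea in this abstract can be sketched as follows. Each curve is reduced to its first two principal component scores, a bivariate kernel density of those scores is estimated, and curves whose scores fall outside a high-coverage highest density region are flagged. This is a simplified illustration, not the authors' implementation. It uses ordinary rather than robust PCA and a hand-rolled Gaussian kernel with a rule-of-thumb bandwidth (an assumption), and the helper names are hypothetical.

```python
import numpy as np


def pc_scores(curves, k=2):
    """Scores of each curve (row) on the first k principal components.
    Ordinary PCA via SVD; the paper uses a robust PCA instead."""
    centred = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def kde_at_points(points, h):
    """Bivariate Gaussian kernel density estimate evaluated at each point."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2) / h ** 2
    return np.exp(-0.5 * sq_dist).mean(axis=1) / (2 * np.pi * h ** 2)


def hdr_outliers(curves, coverage=0.99, h=None):
    """Indices of curves whose scores lie outside the `coverage` HDR."""
    scores = pc_scores(curves)
    if h is None:
        # Rule-of-thumb bandwidth (assumption, not the paper's selector).
        h = scores.std() * len(scores) ** (-1 / 6)
    density = kde_at_points(scores, h)
    # HDR boundary: the density level exceeded by `coverage` of the sample.
    threshold = np.quantile(density, 1 - coverage)
    return np.flatnonzero(density < threshold)


# Usage: 50 noisy sine curves, with curve 7 shifted upwards.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
curves = (np.sin(2 * np.pi * t)
          + rng.normal(0, 0.1, (50, 1))
          + rng.normal(0, 0.05, (50, 50)))
curves[7] += 3.0
print(hdr_outliers(curves))
```

Robust PCA would stop a single extreme curve from dominating the components. The sketch omits it to stay short.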

Keywords: Highest density regions, Robust principal component analysis, Kernel density estimation, Outlier detection, Tukey’s halfspace depth.

Online paper

Working paper

R package

Using functional data analysis models to estimate future time trends of age-specific breast cancer mortality for the United States and England-Wales

Bircan Erbas1, Muhammad Akram2, Dorota M Gertig3, Dallas English4,5, John L. Hopper5, Anne M Kavanagh6 and Rob J Hyndman2
Journal of Epidemiology (2010), 20(2), 159-165.
  1. School of Public Health, La Trobe University, Bundoora, 3086, Australia.
  2. Business and Economic Forecasting Unit, Monash University, Clayton, 3800, Australia.
  3. Victoria Cytology Service Inc, Carlton, 3053, Australia.
  4. Cancer Epidemiology Centre, The Cancer Council Victoria, Carlton, 3053, Australia.
  5. Centre for MEGA Epidemiology, The University of Melbourne, Parkville, 3053, Australia.
  6. Key Centre for Women’s Health in Society, School of Population Health, The University of Melbourne, Parkville, 3053, Australia.
Abstract

Background: Mortality/incidence predictions are used for planning public health resources and need to accurately reflect age-related changes through time. We present a new forecasting model to estimate future trends in age-related breast cancer mortality for the United States and England-Wales.

Material and methods: We use functional data analysis techniques to model breast cancer mortality-age relationships in the United States from 1950 to 2001 and England-Wales from 1950 to 2003, and estimate 20-year predictions using a new forecasting method.

Results: In the United States, trends for women aged 45–54 years have continued to decline since 1980. In contrast, trends in women aged 60–84 years increased in the 1980s and declined in the 1990s. For England-Wales, trends for women aged 45–74 years increased slightly before 1980 but declined thereafter. The greatest age-related changes in both countries occurred during the 1990s. For both the United States and England-Wales, trends are expected to decline and then stabilize, with the greatest decline in women aged 60–70 years. Forecasts suggest relatively stable trends for women over 75 years.

Conclusions: Predictions of age-related changes in mortality/incidence can be used for planning and targeting programs for specific age groups. Currently, these models are being extended to incorporate other variables that may influence age-related changes in mortality/incidence trends. In their current form, these models will be most useful for modelling and projecting future trends of diseases, such as lethal cancers, where there has been very little advancement in treatment and cohort effects are minimal.

Keywords: breast cancer, forecasting, functional-data-analysis models, mortality trends

Online paper

Detecting trend and seasonal changes in satellite image time series

Jan Verbesselt1, Rob J Hyndman2, Glenn Newnham1 and Darius Culvenor1
Remote Sensing of Environment (2010), 114(1), 106-115.
  1. Remote sensing team, CSIRO Sustainable Ecosystems, Private Bag 10, Melbourne VIC 3169, Australia
  2. Department of Econometrics and Business Statistics, Monash University, Melbourne VIC 3800, Australia
Abstract

A wealth of remotely sensed time series covering large areas is now available to the earth science community. Change detection methods often cannot detect land cover changes within time series that are heavily influenced by seasonal climatic variations. Detecting change within the trend and seasonal components of time series enables the detection of different types of changes. Changes occurring in the trend component indicate disturbances (e.g., insect attack), while changes occurring in the seasonal component indicate phenological changes (e.g., change in land cover type). An approach is proposed for automated change detection in time series by detecting and characterizing Breaks For Additive Seasonal and Trend (BFAST). BFAST integrates the decomposition of time series into trend, seasonal, and remainder components with methods for detecting significant change within time series. BFAST iteratively estimates the time and number of changes, and characterizes change by its magnitude and direction. We tested BFAST by simulating 16-day composites of Normalized Difference Vegetation Index (NDVI) time series with varying amounts of seasonality and noise, and by adding abrupt changes at different times and magnitudes. This revealed that BFAST can robustly detect change with different magnitudes (>0.1 NDVI) within time series with different noise levels (0.01–0.07 σ) and seasonal amplitudes (0.1–0.5 NDVI). Additionally, BFAST was applied to 16-day NDVI MODIS (Moderate Resolution Imaging Spectroradiometer) composites for a forested study area in south-eastern Australia. This showed that BFAST is able to detect and characterize spatial and temporal changes in a forested landscape. BFAST was developed as a generic change detection approach and can be applied to time series without the need to normalize for specific land cover types, select a reference period, or define a threshold or change trajectory.
The method can be used to detect and characterize changes within time series or can be integrated within monitoring frameworks and used as an alarm system to flag when and where significant changes occur.
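BFAST dates breaks by least squares within the trend component. The sketch below is a much simplified illustration of that single step. It is not the BFAST procedure, which also fits a seasonal model, tests for structural change and allows multiple breaks. The sketch finds the one break point in a deseasonalised series that minimises the combined residual sum of squares of two separate linear trends. The function names and the minimum segment length are hypothetical.

```python
def ols_rss(xs, ys):
    """Residual sum of squares of an ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def best_single_break(ys, min_seg=3):
    """Index b at which a trend break (segments ys[:b], ys[b:]) fits best."""
    xs = list(range(len(ys)))
    best_b, best_rss = None, float("inf")
    for b in range(min_seg, len(ys) - min_seg + 1):
        rss = ols_rss(xs[:b], ys[:b]) + ols_rss(xs[b:], ys[b:])
        if rss < best_rss:
            best_b, best_rss = b, rss
    return best_b


# Usage: a rising trend with an abrupt drop (a "disturbance") at t = 20.
series = [0.1 * t for t in range(20)] + [0.1 * t - 2 for t in range(20, 40)]
print(best_single_break(series))
```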

Online paper

Density forecasting for long-term peak electricity demand

Rob J Hyndman and Shu Fan
IEEE Transactions on Power Systems (2010), to appear.

Abstract: Long-term electricity demand forecasting plays an important role in planning for future generation facilities and transmission augmentation. In a long-term context, planners must adopt a probabilistic view of potential peak demand levels. Density forecasts (providing estimates of the full probability distributions of the possible future values of the demand) are therefore more helpful than point forecasts, and are necessary for utilities to evaluate and hedge the financial risk accrued by demand variability and forecasting uncertainty. This paper proposes a new methodology to forecast the density of long-term peak electricity demand.

Peak electricity demand in a given season is subject to a range of uncertainties, including underlying population growth, changing technology, economic conditions, prevailing weather conditions (and the timing of those conditions), as well as the general randomness inherent in individual usage. It is also subject to some known calendar effects due to the time of day, day of week, time of year, and public holidays.

We describe a comprehensive forecasting solution in this paper. First, we use semi-parametric additive models to estimate the relationships between demand and the driver variables, including temperatures, calendar effects and some demographic and economic variables. Then we forecast the demand distributions using a mixture of temperature simulation, assumed future economic scenarios, and residual bootstrapping. The temperature simulation is implemented through a new seasonal bootstrapping method with variable blocks.

The proposed methodology has been used to forecast the probability distribution of annual and weekly peak electricity demand for South Australia since 2007. We evaluate the performance of the methodology by comparing the forecast results with the actual demand in the summer of 2007/08.
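The temperature simulation in this abstract relies on a seasonal bootstrap with variable block lengths. The following is a simplified sketch of that idea under several assumptions. Each historical season is stored as an equal-length list, and block lengths are drawn uniformly between two bounds. Each block is copied from a randomly chosen year at the same position within the season, so the seasonal pattern is preserved and serial dependence within each block is kept. The parameter names are hypothetical and the paper's exact scheme may differ.

```python
import random


def seasonal_block_bootstrap(history, min_block, max_block, rng):
    """Simulate one season from `history`, a list of equal-length seasons.

    Blocks of random length are copied from randomly chosen years, always
    from the same within-season positions they are placed at."""
    season_len = len(history[0])
    simulated = []
    pos = 0
    while pos < season_len:
        block = rng.randint(min_block, max_block)  # variable block length
        year = rng.choice(history)                 # source year for this block
        simulated.extend(year[pos:pos + block])    # slicing truncates at season end
        pos += block
    return simulated


# Usage: three historical "seasons" of 10 daily temperatures each.
history = [[20 + y + 0.1 * d for d in range(10)] for y in range(3)]
sim = seasonal_block_bootstrap(history, min_block=2, max_block=4, rng=random.Random(1))
print(sim)
```

Repeating the simulation many times gives an ensemble of temperature seasons. Under the approach described above, these would be fed through the fitted demand model to build a distribution of peak demand.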

Keywords: Long-term demand forecasting, density forecast, time series, simulation.

Online article

The vector innovations structural time series framework: a simple approach to multivariate forecasting

Ashton de Silva1, Rob J Hyndman2 and Ralph D Snyder2
Statistical Modelling (2010), to appear.
  1. School of Economics, Finance and Marketing, RMIT, VIC 3000, Australia.
  2. Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.

Abstract: The vector innovations structural time series framework is proposed as a way of modelling a set of related time series. Like all multivariate approaches, the aim is to exploit potential inter-series dependencies to improve the fit and forecasts. The model is based around an unobserved vector of components representing features such as the level and slope of each time series. Equations that describe the evolution of these components through time are used to represent the inter-temporal dependencies. The approach is illustrated on a bivariate data set comprising Australian exchange rates of the UK pound and US dollar. The forecasting accuracy of the new modelling framework is compared to other common uni- and multivariate approaches in an experiment using time series from a large macroeconomic database.
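As a hedged illustration of the innovations form described here, consider the simplest special case, a vector local level model: y_t = l_{t-1} + e_t and l_t = l_{t-1} + A e_t. Here l_t is the vector of levels and the gain matrix A lets an innovation in one series update the level of another. A is assumed known rather than estimated. The sketch computes one-step-ahead forecasts. The function name and the choice of this particular model are assumptions.

```python
import numpy as np


def local_level_forecasts(y, A, l0):
    """One-step-ahead forecasts from a vector local level model in innovations form.

    y  : (T, m) array of observations
    A  : (m, m) gain matrix linking innovations to level updates
    l0 : (m,) initial level vector
    Returns a (T + 1, m) array: row t forecasts y[t]; row T is the next period."""
    level = np.asarray(l0, dtype=float)
    forecasts = [level.copy()]
    for obs in y:
        innovation = obs - level          # e_t = y_t - l_{t-1}
        level = level + A @ innovation    # l_t = l_{t-1} + A e_t
        forecasts.append(level.copy())
    return np.array(forecasts)


# Usage: two related series; off-diagonal entries of A share information.
y = np.array([[1.0, 2.0], [1.5, 2.5], [1.2, 2.1]])
A = np.array([[0.5, 0.1], [0.0, 0.5]])
print(local_level_forecasts(y, A, l0=y[0]))
```

With A equal to the identity the model reduces to a random walk in each series, and with A equal to zero the levels never move. These two extremes are a useful sanity check.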

Keywords: vector innovations structural time series, state space model, multivariate time series, exponential smoothing, forecast comparison, vector autoregression.

Download pdf file

Exponential smoothing and non-negative data

Md. Akram1, Rob J. Hyndman1 and J. Keith Ord2
Australian and New Zealand Journal of Statistics (2009), 51(4), 415-432.
  1. Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
  2. McDonough School of Business, Georgetown University, Washington, DC 20057, USA.

Abstract: The most common forecasting methods in business are based on exponential smoothing and the most common time series in business are inherently non-negative. It is therefore of interest to consider the properties of the potential stochastic models underlying exponential smoothing when applied to non-negative data. We explore exponential smoothing state space models for non-negative data under various assumptions about the innovations, or error, process.

We first demonstrate that prediction distributions from some commonly used state space models may have an infinite variance beyond a certain forecasting horizon. For multiplicative error models, which do not have this flaw, we show that sample paths will converge almost surely to zero even when the error distribution is non-Gaussian. We propose a new model with similar properties to exponential smoothing, but which does not have these problems, and we develop some distributional properties for our new model.

We then explore the implications of our results for inference, and compare the short-term forecasting performance of the various models using data on the weekly sales of over three hundred items of costume jewelry.

The main findings of the research are that the Gaussian approximation is adequate for estimation and one-step-ahead forecasting. However, as the forecasting horizon increases, the approximate prediction intervals become increasingly problematic. When the model is to be used for simulation purposes, a suitably specified scheme must be employed.
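The almost-sure convergence result for multiplicative error models can be illustrated by simulation. The sketch below uses the local level model with multiplicative errors, y_t = l_{t-1}(1 + e_t) and l_t = l_{t-1}(1 + alpha e_t). It assumes errors of the form e_t = exp(z_t) - 1 with z_t ~ N(-sigma^2/2, sigma^2). Each e_t then exceeds -1 and has mean zero, so every simulated value stays positive. Yet by Jensen's inequality E[log(1 + alpha e_t)] < 0, so log l_t drifts downwards and the level collapses towards zero. The error distribution and parameter values are assumptions chosen for illustration, and this is not the new model proposed in the paper.

```python
import math
import random


def simulate_mnn(level0, alpha, sigma, n, seed=0):
    """Simulate n steps of a local level model with multiplicative errors."""
    rng = random.Random(seed)
    level = level0
    ys = []
    for _ in range(n):
        # e_t = exp(z) - 1 with z ~ N(-sigma^2/2, sigma^2): e_t > -1 and E[e_t] = 0
        e = math.exp(rng.gauss(-sigma ** 2 / 2, sigma)) - 1
        ys.append(level * (1 + e))   # observation y_t = l_{t-1} (1 + e_t)
        level *= 1 + alpha * e       # level update l_t = l_{t-1} (1 + alpha e_t)
    return ys, level


ys, final_level = simulate_mnn(level0=100.0, alpha=0.5, sigma=0.5, n=5000)
print(min(ys) > 0, final_level)
```

Even though each error has mean zero, the final level is many orders of magnitude below its starting value of 100.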

Keywords: forecasting; time series; exponential smoothing; positive-valued processes; seasonality; state space models.

Online paper

Forecasting functional time series

Rob J Hyndman and Han Lin Shang
Journal of the Korean Statistical Society (2009), 38(3), 199-221. (With discussion)

Abstract: We propose forecasting functional time series using weighted functional principal component regression and weighted functional partial least squares regression. These approaches allow for smooth functions, assign higher weights to more recent data, and provide a modeling scheme that is easily adapted to allow for constraints and other information. We illustrate our approaches using age-specific French female mortality rates from 1816 to 2006 and age-specific Australian fertility rates from 1921 to 2006, and show that these weighted methods improve forecast accuracy in comparison to their unweighted counterparts. We also propose two new bootstrap methods to construct prediction intervals, and evaluate and compare their empirical coverage probabilities.
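A minimal sketch of the weighting idea assumes geometrically decaying weights w_t = kappa(1 - kappa)^(n - t), normalised here to sum to one, on curves observed at times t = 1, ..., n. The weighted mean curve is removed, and the principal components come from an SVD of the weight-scaled, centred data matrix. The sketch does not show how the resulting score series would be forecast (for example with a univariate time series model), nor the paper's bootstrap intervals. `kappa` and the function names are illustrative assumptions.

```python
import numpy as np


def geometric_weights(n, kappa):
    """w_t = kappa * (1 - kappa)**(n - t), t = 1..n, normalised to sum to one."""
    w = kappa * (1 - kappa) ** np.arange(n - 1, -1, -1)
    return w / w.sum()


def weighted_fpca(curves, kappa=0.1, k=2):
    """Weighted functional PCA of curves stored as rows (one row per year).

    Returns the weighted mean curve, the first k basis functions (rows)
    and the (n, k) matrix of scores."""
    w = geometric_weights(curves.shape[0], kappa)
    mean = w @ curves                               # weighted mean curve
    centred = curves - mean
    _, _, vt = np.linalg.svd(np.sqrt(w)[:, None] * centred, full_matrices=False)
    basis = vt[:k]                                  # orthonormal basis functions
    scores = centred @ basis.T                      # projections of each curve
    return mean, basis, scores


rng = np.random.default_rng(1)
curves = rng.normal(size=(20, 30))   # stand-in for 20 years of age-specific rates
mean, basis, scores = weighted_fpca(curves, kappa=0.2, k=3)
print(basis.shape, scores.shape)
```

Larger values of `kappa` concentrate the weight on the most recent curves. As `kappa` tends to zero the weights become equal, which corresponds to the unweighted counterpart.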

Keywords: Demographic forecasting; Functional data; Functional partial least squares; Functional principal components; Functional time series.

MSC primary: 62G08; 62H25; secondary: 62G09; 62J07; 62P05.

Online article

Monitoring processes with changing variances

J. Keith Ord, Anne B. Koehler, Ralph D. Snyder and Rob J. Hyndman
International Journal of Forecasting (2009), 25(3), 518-525.

Abstract: Statistical process control (SPC) has evolved beyond its classical applications in manufacturing to monitoring economic and social phenomena. This extension requires consideration of autocorrelated and possibly non-stationary time series. Less attention has been paid to the possibility that the variance of the process may also change over time. In this paper we use the innovations state space modeling framework to develop conditionally heteroscedastic models. We provide examples to show that the incorrect use of homoscedastic models may lead to erroneous decisions about the nature of the process. The framework is extended to include count data, and we also introduce a new type of chart, the P-value chart, to accommodate the changes in distributional form from one period to the next.
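To illustrate the P-value chart idea for counts, suppose the model supplies a one-step-ahead Poisson mean for each period. The Poisson assumption, the upper-tail P-value and the 0.01 threshold are all assumptions here, and the state space model that would produce the means is not shown. Each observation is converted into the probability of a count at least that large. These P-values share a common [0, 1] scale, so they can be charted against a single control limit even when the predictive distribution changes from period to period. The function names are hypothetical.

```python
import math


def poisson_upper_pvalue(y, mean):
    """P(Y >= y) for Y ~ Poisson(mean), computed as 1 - P(Y <= y - 1)."""
    term = math.exp(-mean)   # P(Y = 0)
    cdf = 0.0
    for k in range(y):       # accumulate P(Y <= y - 1) term by term
        cdf += term
        term *= mean / (k + 1)
    return max(0.0, 1.0 - cdf)


def pvalue_chart(counts, means, alpha=0.01):
    """Return each period's P-value and whether it signals (P < alpha)."""
    pvals = [poisson_upper_pvalue(y, m) for y, m in zip(counts, means)]
    return pvals, [p < alpha for p in pvals]


# Usage: the predicted mean changes over time; only the last count is unusual.
pvals, signals = pvalue_chart(counts=[2, 6, 15], means=[2.0, 5.0, 2.0])
print(pvals, signals)
```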

Keywords: control charts, count data, GARCH, heteroscedasticity, innovations, state space, statistical process control

Online paper

Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series

Xiaozhe Wang1, Kate A. Smith-Miles1 and Rob J. Hyndman2
Neurocomputing (2009), 72, 2581–2594.
  1. Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia.
  2. Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.

Abstract: This paper proposes a new method of interval estimation for the long run response (or elasticity) parameter from a general linear dynamic model. We employ the bias-corrected bootstrap, in which small sample biases associated with the parameter estimators are adjusted in two stages of the bootstrap. As a means of bias-correction, we use alternative analytic and bootstrap methods. To take atypical properties of the long run elasticity estimator into account, the highest density region (HDR) method is adopted for the construction of confidence intervals. From an extensive Monte Carlo experiment, we found that the HDR confidence interval based on indirect analytic bias-correction performs better than other alternatives, providing tighter intervals with excellent coverage properties. Two case studies (demand for oil and demand for beef) illustrate the results of the Monte Carlo experiment with respect to the superior performance of the confidence interval based on indirect analytic bias-correction.

Keywords: ARDL model, bias-correction, bootstrapping, Highest density region, long run elasticity.

Online article

Hierarchical forecasts for Australian domestic tourism

George Athanasopoulos1, Roman A. Ahmed1 and Rob J. Hyndman1
International Journal of Forecasting (2009), 25(1), 146-166.
  1. Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.

Abstract: In this paper we explore the hierarchical nature of tourism demand time series and produce short-term forecasts for Australian domestic tourism. The data and forecasts are organized in a hierarchy based on disaggregating the data for different geographical regions and for different purposes of travel. We consider five approaches to hierarchical forecasting: two variations of the top-down approach, the bottom-up method, a newly proposed top-down approach where top-level forecasts are disaggregated according to forecasted proportions of lower level series, and a recently proposed optimal combination approach. Our forecast performance evaluation shows that the top-down approach based on forecast proportions and the optimal combination method perform best for the tourism hierarchies we consider. By applying these methods, we produce detailed forecasts for the Australian domestic tourism market.
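For a two-level hierarchy (a total split into several bottom-level series), most of the approaches compared here can be sketched as below. The two historical top-down variants are the average of historical proportions and the proportion of historical averages. The forecast-proportions version uses the ratio of each bottom-level base forecast to their sum, which covers only the two-level special case. The optimal combination approach is omitted, and the function names are hypothetical.

```python
def bottom_up(bottom_forecasts):
    """Top-level forecast as the sum of the bottom-level forecasts."""
    return sum(bottom_forecasts)


def avg_historical_proportions(history):
    """history[j][t] is series j at time t; p_j = mean over t of y_jt / total_t."""
    T = len(history[0])
    totals = [sum(series[t] for series in history) for t in range(T)]
    return [sum(s[t] / totals[t] for t in range(T)) / T for s in history]


def proportions_of_historical_averages(history):
    """p_j = (mean of series j) / (mean of the total series)."""
    means = [sum(s) / len(s) for s in history]
    return [m / sum(means) for m in means]


def forecast_proportions(bottom_forecasts):
    """Two-level case: p_j = bottom forecast j / sum of bottom forecasts."""
    total = sum(bottom_forecasts)
    return [f / total for f in bottom_forecasts]


def top_down(top_forecast, proportions):
    """Disaggregate a top-level forecast using the given proportions."""
    return [top_forecast * p for p in proportions]


# Usage: two regions observed over two periods.
history = [[1, 4], [3, 4]]
print(top_down(16.0, avg_historical_proportions(history)))
print(top_down(16.0, forecast_proportions([3.0, 5.0])))
```

The forecast-proportions variant lets the split change with the lower-level forecasts, whereas the historical variants fix it from past data.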

Keywords: Australia, exponential smoothing, hierarchical forecasting, innovations state space models, optimal combination forecasts, top-down method, tourism demand.

Online article