Demographic forecasting using functional data analysis

Seminar given at the University of Wollongong, 8 September 2010.
Also given to the Statistical Society of Australia, Victorian Branch, 28 September 2010.

Abstract: Functional time series are curves that are observed sequentially in time. In demography, such data arise as the curves formed by annual death rates as a function of age or annual fertility rates as a function of age. I will discuss methods for describing, modelling and forecasting such functional time series data. Challenges include:

  • developing useful graphical tools (I will illustrate a functional version of the boxplot);
  • dealing with outliers (e.g., death rates have outliers in years of wars or epidemics);
  • cohort effects (how can we identify and allow for these in the forecasts);
  • synergy between groups (e.g., we expect male and female mortality rates to evolve in a similar way in the future);
  • deriving prediction intervals for forecasts;
  • how to combine the mortality and fertility forecasts to obtain forecasts of the total population.

I will illustrate the ideas using data from Australia and France.

Slides (12Mb).

Phenological change detection while accounting for abrupt and gradual trends in satellite image time series

Jan Verbesselt¹, Rob J Hyndman², Achim Zeileis³, Darius Culvenor¹
  1. Remote sensing team, CSIRO Sustainable Ecosystems, Private Bag 10, Melbourne VIC 3169, Australia
  2. Department of Econometrics and Business Statistics, Monash University, Melbourne VIC 3800, Australia
  3. Institute for Statistics, Leopold-Franzens-Universität Innsbruck, 6020 Innsbruck, Austria
Remote Sensing of Environment, to appear.

Abstract
A challenge in phenology studies is understanding what constitutes significant phenological change amidst background variation (e.g. noise) and ecosystem disturbances (e.g. fires). The majority of phenological studies have focussed on extracting critical points in the seasonal growth cycle (e.g. start-of-spring), without exploiting the full temporal detail. Moreover, the high degree of phenological variability between years demonstrates the necessity of distinguishing long-term phenological change from temporal variability. Here, we evaluate the phenological change detection ability of a method for detecting Breaks For Additive Seasonal and Trend (BFAST). BFAST integrates the decomposition of time series into trend, seasonal, and noise components with methods for detecting change within time series. BFAST detects significant phenological changes by exploiting the full time series, without needing to derive phenological metrics. The times and numbers of trend and phenological changes are iteratively estimated by fitting piecewise robust linear models, whose parameters are used to characterize change by its magnitude and direction. We tested BFAST by simulating 16-day Normalized Difference Vegetation Index (NDVI) time series with varying amounts of seasonality and noise, containing abrupt disturbances (e.g. fires) and long-term phenological changes. This revealed that BFAST is able to accurately detect the number and timing of phenological changes within time series while accounting for disturbances (e.g. fires) and noise. The simulation study also showed that phenological change detection is influenced by the signal-to-noise ratio of the time series. Application of the method to 16-day NDVI MODIS images from 2000 until 2009 for a forested study area in south-eastern Australia confirmed these results.
Phenological change is more easily detected in grasslands, where the seasonal amplitude is larger than 0.3 NDVI, than in evergreen forests, where the seasonal amplitude is approximately 0.1 NDVI, when noise levels are the same. BFAST presents a novel approach for the detection of significant long-term phenological changes within full time series, which is necessary to study spatio-temporal patterns in land cover phenology and to distinguish change from interannual variability in a global change context. The method can be applied to other disciplines dealing with seasonal time series data, such as biology, hydrology, and climatology, to detect and characterize change within time series. The methods described in this study are available in the BFAST package for R (R Development Core Team, 2009).
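BFAST itself is an R package, and its structural-change tests and iterative estimation are not reproduced here. As a rough illustration of the underlying idea (a harmonic seasonal term fitted alongside a piecewise linear trend), the Python sketch below simulates a hypothetical 16-day NDVI series, assuming 23 composites per year, with one abrupt drop, and searches for the single trend break that minimises the residual sum of squares. All helper names and parameter values are illustrative assumptions.

```python
import numpy as np

PERIOD = 23  # assumed: 23 sixteen-day composites per year

def seasonal_trend_design(t, order=2):
    """Intercept, linear trend and `order` harmonic pairs for the seasonal cycle."""
    cols = [np.ones_like(t), t]
    for j in range(1, order + 1):
        cols += [np.sin(2 * np.pi * j * t / PERIOD), np.cos(2 * np.pi * j * t / PERIOD)]
    return np.column_stack(cols)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def single_trend_break(y, min_seg=PERIOD):
    """Return the break index (level and slope change) giving the smallest RSS."""
    t = np.arange(len(y), dtype=float)
    base = seasonal_trend_design(t)
    best_tau, best_rss = None, np.inf
    for tau in range(min_seg, len(y) - min_seg):
        after = (t >= tau).astype(float)
        # Extra columns: level shift and slope change after the candidate break
        X = np.column_stack([base, after, after * (t - tau)])
        r = rss(X, y)
        if r < best_rss:
            best_tau, best_rss = tau, r
    return best_tau

# Hypothetical NDVI: 9 years, seasonal cycle, abrupt drop (e.g. a fire) at index 120, noise
rng = np.random.default_rng(42)
n, true_break = 9 * PERIOD, 120
t = np.arange(n, dtype=float)
ndvi = 0.6 + 0.15 * np.sin(2 * np.pi * t / PERIOD) - 0.2 * (t >= true_break) + rng.normal(0, 0.02, n)
tau = single_trend_break(ndvi)
```

Unlike BFAST, this sketch assumes exactly one break, uses ordinary rather than robust least squares, and does not iterate between estimating seasonal and trend breaks.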

Keywords: seasonal change, phenology, change detection, time series, disturbance, climate change, remote sensing, NDVI, MODIS.

Working paper

Online paper

demography: Forecasting mortality, fertility, migration and population data

The demography package for R contains functions for various demographic analyses. It provides facilities for demographic statistics, modelling and forecasting. In particular, it implements lifetable calculations; Lee-Carter modelling and variants; functional data analysis of mortality rates, fertility rates and net migration numbers; and stochastic population forecasting.
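For readers unfamiliar with Lee-Carter modelling, the Python sketch below (not the demography package's API, which is in R) fits the classic form log m(x,t) = a(x) + b(x)k(t) to a hypothetical matrix of log death rates using a singular value decomposition, normalises so that b sums to one, and forecasts k(t) as a random walk with drift. It omits the second-stage re-estimation of k(t) from observed deaths used in the original Lee-Carter method; the data and function names are assumptions for illustration.

```python
import numpy as np

def lee_carter(log_m):
    """Fit log m[x, t] = a[x] + b[x] * k[t]; rows are ages, columns are years."""
    a = log_m.mean(axis=1)                      # average age pattern over the years
    centred = log_m - a[:, None]
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    b, k = U[:, 0], s[0] * Vt[0]                # first singular component
    scale = b.sum()
    # Normalise so that sum(b) = 1; sum(k) = 0 already holds because rows were centred
    return a, b / scale, k * scale

def forecast_k(k, h):
    """Random walk with drift forecast of the mortality index k[t], h steps ahead."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return k[-1] + drift * np.arange(1, h + 1)

# Hypothetical log death rates: 10 age groups, 51 years, steadily improving mortality
rng = np.random.default_rng(1)
ages = np.arange(0, 91, 10)
years = np.arange(1950, 2001)
true_k = -0.5 * (years - years.mean())
log_m = (-9 + 0.08 * ages)[:, None] + np.outer(np.full(ages.size, 0.1), true_k)
log_m += rng.normal(0, 0.01, log_m.shape)

a, b, k = lee_carter(log_m)
k_future = forecast_k(k, h=10)
```

Forecast log rates for h years ahead are then a + b * k_future[h - 1].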

The package has been in development for years, but I’ve finally posted it to CRAN.

Examples

Note: Some of these may no longer work with the updated package. I’ll fix them soon.

Extra data are available in the addb package.


Short-term load forecasting based on a semi-parametric additive model

Shu Fan and Rob J Hyndman

Abstract
Short-term load forecasting is an essential instrument in power system planning, operation and control. Many operating decisions are based on load forecasts, such as dispatch scheduling of generating capacity, reliability analysis, and maintenance planning for the generators. Overestimation of electricity demand causes conservative operation, which leads to the start-up of too many units or excessive energy purchase, thereby supplying an unnecessary level of reserve. Conversely, underestimation may result in risky operation, with insufficient preparation of spinning reserve, causing the system to operate in a region vulnerable to disturbances.

In this paper, semi-parametric additive models are proposed to estimate the relationships between demand and the driver variables. Specifically, the inputs for these models are calendar variables, lagged actual demand observations and historical and forecast temperature traces for one or more sites in the target power system. In addition to point forecasts, prediction intervals are also estimated using a modified bootstrap method suitable for the complex seasonality seen in electricity demand data. The proposed methodology has been used to forecast the half-hourly electricity demand for up to seven days ahead for power systems in the Australian National Electricity Market. The performance of the methodology is validated via out-of-sample experiments with real data from the power system, as well as through on-site implementation by the system operator.
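The Python sketch below is a heavily simplified stand-in for the approach described above, under several assumptions: the data are simulated, the smooth temperature effect is approximated by a linear spline (hinge) basis fitted by least squares rather than a penalised smooth, lagged demand is omitted, and prediction intervals come from an ordinary bootstrap of whole days of residuals rather than the paper's modified bootstrap. All names and values are hypothetical.

```python
import numpy as np

PERIODS = 48  # half-hours per day

def design(period, weekend, temp, knots=(15.0, 20.0, 25.0)):
    """Half-hour dummies + weekend dummy + linear spline (hinge) basis in temperature."""
    hinges = [np.maximum(temp - kn, 0.0) for kn in knots]
    return np.column_stack([np.eye(PERIODS)[period], weekend, temp, *hinges])

# Hypothetical data: 8 weeks of half-hourly demand (MW) driven by time of day and temperature
rng = np.random.default_rng(0)
days = 56
period = np.tile(np.arange(PERIODS), days)
day = np.repeat(np.arange(days), PERIODS)
weekend = (day % 7 >= 5).astype(float)
temp = 20 + 6 * np.sin(2 * np.pi * (period - 18) / PERIODS) + np.repeat(rng.normal(0, 2, days), PERIODS)
demand = (1500 + 400 * np.sin(2 * np.pi * (period - 12) / PERIODS) - 200 * weekend
          + 15 * (temp - 20) ** 2 + rng.normal(0, 30, period.size))

# Fit on the first 7 weeks; forecast the last week, treating temperature forecasts as exact
train, test = day < 49, day >= 49
X = design(period, weekend, temp)
beta, *_ = np.linalg.lstsq(X[train], demand[train], rcond=None)
point = X[test] @ beta

# Resample whole days of residuals so that within-day dependence is preserved
resid = (demand[train] - X[train] @ beta).reshape(-1, PERIODS)   # one row per training day
n_test_days = int(test.sum()) // PERIODS
sims = np.array([point + resid[rng.integers(0, len(resid), n_test_days)].ravel()
                 for _ in range(1000)])
lower, upper = np.quantile(sims, [0.05, 0.95], axis=0)
coverage = np.mean((demand[test] >= lower) & (demand[test] <= upper))
```

Resampling entire days, rather than individual half-hours, is one simple way to keep the intraday correlation of forecast errors in the simulated paths.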

Download paper

The price elasticity of electricity demand in South Australia

Shu Fan and Rob J Hyndman

Business and Economic Forecasting Unit, Monash University, Clayton, Victoria 3800, Australia

Abstract
In this paper, the price elasticity of electricity demand, representing the sensitivity of customer demand to the price of electricity, has been estimated for South Australia. We first review the scholarly literature on electricity price elasticity for different regions and systems. Then we perform an empirical evaluation of the historical South Australian price elasticity, focussing on the relationship between price and demand quantiles at each half-hour of the day.

This work attempts to determine whether there is any variation in price sensitivity with the time of day or quantile, and to estimate the form of any relationships that might exist in South Australia.
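Under a constant-elasticity assumption, the elasticity is the slope of log demand on log price. The Python sketch below estimates that slope separately for each half-hour of the day on simulated data in which price is treated as exogenous (a strong assumption for real electricity markets). It models mean demand only; studying demand quantiles, as the paper does, would replace ordinary least squares with quantile regression (for example, `QuantReg` in statsmodels). Names and values are hypothetical.

```python
import numpy as np

def elasticity_by_period(log_price, log_demand, period, n_periods=48):
    """OLS slope of log demand on log price within each half-hour (constant-elasticity model)."""
    est = np.empty(n_periods)
    for p in range(n_periods):
        mask = period == p
        slope, _intercept = np.polyfit(log_price[mask], log_demand[mask], 1)
        est[p] = slope
    return est

# Hypothetical data: one year of half-hourly observations, more price-sensitive in the daytime
rng = np.random.default_rng(7)
n_days, n_periods = 365, 48
period = np.tile(np.arange(n_periods), n_days)
true_e = np.where((period >= 14) & (period < 40), -0.3, -0.1)
log_price = rng.normal(np.log(50), 0.5, period.size)
log_demand = 7.0 + true_e * log_price + rng.normal(0, 0.05, period.size)

est = elasticity_by_period(log_price, log_demand, period)
```

An estimate of -0.3 means that a 10% price rise is associated with roughly a 3% fall in demand.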

Keywords: Electricity demand; Price elasticity.

Download paper

Exploratory graphics for functional data

Han Lin Shang and Rob J Hyndman

Department of Econometrics and Business Statistics, Monash University, Clayton, Australia

Interface 2010: Computing Science and Statistics, Seattle, Washington, June 16–19, 2010

Abstract
We survey some graphical tools for visualizing large sets of functional data represented by smooth curves. These graphical tools include the phase-plane plot, singular value decomposition plot, rainbow plot, functional variants of the bagplot and the highest density region boxplot. The latter two techniques utilize the first two robust principal component scores, Tukey’s halfspace location depth and highest density regions.

The computer code and datasets are collected in the rainbow package for R, which is available at the Comprehensive R Archive Network (CRAN).
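The Python sketch below illustrates the idea behind the highest density region boxplot under simplifying assumptions: it uses ordinary rather than robust principal component scores, a bivariate Gaussian product kernel with a Scott-type bandwidth, and flags curves whose estimated density at their own scores falls below the (1 - coverage) quantile of all such densities. It is not the rainbow package's implementation, and the simulated curves and function names are hypothetical.

```python
import numpy as np

def pc_scores(curves, k=2):
    """Scores on the first k (non-robust) principal components; rows are curves."""
    centred = curves - curves.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    return U[:, :k] * s[:k]

def kde_at_points(scores, bandwidth):
    """Bivariate Gaussian product-kernel density evaluated at each observed score."""
    diff = (scores[:, None, :] - scores[None, :, :]) / bandwidth
    kern = np.exp(-0.5 * (diff ** 2).sum(axis=2)) / (2 * np.pi * np.prod(bandwidth))
    return kern.mean(axis=1)

def hdr_outliers(curves, coverage=0.99):
    """Indices of curves lying outside the estimated `coverage` highest density region."""
    scores = pc_scores(curves)
    bw = scores.std(axis=0, ddof=1) * len(scores) ** (-1 / 6)   # Scott's rule, d = 2
    dens = kde_at_points(scores, bw)
    return np.flatnonzero(dens < np.quantile(dens, 1 - coverage))

# Hypothetical curves: 100 noisy sine curves, three made atypical
rng = np.random.default_rng(3)
x = np.linspace(0, 2 * np.pi, 50)
level, amp = rng.normal(0, 0.1, (2, 100))
curves = (np.sin(x) + level[:, None] + amp[:, None] * np.cos(x)
          + rng.normal(0, 0.02, (100, x.size)))
curves[10] += 1.0          # shifted up
curves[50] -= 1.0          # shifted down
curves[80] += np.cos(x)    # different shape
flagged = hdr_outliers(curves, coverage=0.97)
```

Because the threshold is a quantile of the sample densities, the number of curves flagged is tied to the chosen coverage; the point of the method is that the flagged curves are those in the sparsest part of the score space.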

Keywords: Highest density regions, Kernel density estimation, Robust principal component analysis, Singular value decomposition, Tukey’s halfspace location depth.

Download article (10Mb)

A form of religion

An exhortation/sermon given at Dandenong Bible Education Centre on 25 July 2010.

Audio

Short-term load forecasting based on a semi-parametric additive model

Shu Fan and Rob J Hyndman
20th Australasian Universities Power Engineering Conference

5–8 December 2010, University of Canterbury, Christchurch, New Zealand

Abstract
Short-term load forecasting is an essential instrument in power system planning, operation and control. Many operating decisions are based on load forecasts, such as dispatch scheduling of generating capacity, reliability analysis, and maintenance planning for the generators. Overestimation of electricity demand causes conservative operation, which leads to the start-up of too many units or excessive energy purchase, thereby supplying an unnecessary level of reserve. Conversely, underestimation may result in risky operation, with insufficient preparation of spinning reserve, causing the system to operate in a region vulnerable to disturbances.

In this paper, semi-parametric additive models are proposed to estimate the relationships between demand and the driver variables. Specifically, the inputs for these models are calendar variables, lagged actual demand observations and historical and forecast temperature traces for one or more sites in the target power system. The proposed methodology has been used to forecast the half-hourly electricity demand for up to seven days ahead for power systems in the Australian National Electricity Market. The performance of the methodology is validated via out-of-sample experiments with real data from the power system, as well as through on-site operation by the system operator.

Investigating the influence of synoptic-scale circulation on air quality using self-organizing maps and generalized additive modelling

John L Pearceᵃ, Jason Beringerᵃ, Neville Nichollsᵃ, Rob J Hyndmanᵇ, Petteri Uotilaᵃ, and Nigel J Tapperᵃ

a School of Geography and Environmental Science, Monash University, Melbourne, Australia
b Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia

Abstract
The influence of synoptic-scale circulations on air quality is an area of increasing interest to air quality management with regard to future climate change. This study presents an analysis in which the dominant synoptic ‘types’ over the region of Melbourne, Australia are determined and linked to regional air quality. First, a self-organising map (SOM) is used to generate a time series of synoptic charts that classify the annual daily circulation affecting Melbourne into 20 different synoptic types. SOM results are then employed within the framework of a generalized additive model (GAM) to identify links between synoptic-scale circulations and observed changes in air pollutant concentrations. The GAMs estimate shifts in pollutant concentrations under each synoptic type after controlling for long-term trends, seasonality, weekly emissions, spatial variation, and temporal persistence. Results showed the aggregate impact of synoptic circulations in the models to be quite modest, as only 5.1% of the daily variance in O₃, 4.7% in PM₁₀, and 7.1% in NO₂ was explained by shifts in synoptic circulations. Further analysis of the partial residual plots identified that, despite a modest response at the aggregate level, individual synoptic categories had differential effects on air pollutants. In particular, increases of up to 40% in NO₂ and PM₁₀ and 30% in O₃ occur when synoptic conditions result in a north-easterly gradient wind over the Melbourne area. Additionally, NO₂ and PM₁₀ levels showed increases of up to 40% when a strong high-pressure system was centered directly over the Melbourne area. In sum, the unified approach of SOM and GAM proved to be a complementary suite of tools capable of identifying the entire range of synoptic circulation patterns over a particular region and quantifying how they influence local air quality.
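The first step of this study, classifying daily circulation into 20 synoptic types, can be illustrated with a minimal self-organising map written in Python. The sketch assumes each day is represented by a flattened vector of gridded pressure anomalies (simulated here), uses a 4 by 5 map to give 20 nodes, and uses linearly decaying learning-rate and neighbourhood schedules chosen purely for illustration. The GAM stage is not shown, and the function names are hypothetical.

```python
import numpy as np

def train_som(data, rows=4, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Online self-organising map; returns one weight vector per map node."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # (row, col) position of each node on the map, used by the neighbourhood function
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = data[rng.choice(n, rows * cols, replace=False)].copy()  # start from random days
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 0.5     # neighbourhood shrinks but stays positive
            x = data[i]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            g = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            weights += lr * g[:, None] * (x - weights)
            step += 1
    return weights

def classify(data, weights):
    """Assign each day to its nearest node, i.e. its synoptic type (0 to 19)."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical data: 365 days of 5 x 5 pressure-anomaly grids drawn around 4 regimes
rng = np.random.default_rng(5)
regimes = rng.normal(0, 1, (4, 25))
days_data = regimes[rng.integers(0, 4, 365)] + rng.normal(0, 0.3, (365, 25))

weights = train_som(days_data)
labels = classify(days_data, weights)
quant_err = np.mean(((days_data - weights[labels]) ** 2).sum(axis=1))
spread = np.mean(((days_data - days_data.mean(axis=0)) ** 2).sum(axis=1))
```

The resulting `labels` series plays the role of the daily synoptic-type time series that would then enter a regression model as a categorical predictor.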

Keywords: air pollution, generalized additive models, self-organizing maps, and synoptic meteorology.

Working paper

Quantifying the influence of local meteorology on air quality using generalized additive modelling

John L Pearceᵃ, Jason Beringerᵃ, Neville Nichollsᵃ, Rob J Hyndmanᵇ and Nigel J Tapperᵃ

a School of Geography and Environmental Science, Monash University, Melbourne, Australia
b Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia

Abstract
Quantifying the observed relationships between local meteorology and air pollution provides air quality managers with a knowledge base that may prove useful in understanding how climate change could affect air quality. This paper presents the estimated response of ozone (O₃), particulate matter ≤ 10 μm (PM₁₀), and nitrogen dioxide (NO₂) to individual local meteorological variables in Melbourne, Australia over the period 1999 to 2006. The relationships have been quantified after controlling for long-term trends, seasonality, weekly emissions, spatial variation, and temporal persistence using the framework of generalized additive modelling (GAM). The nature of the response of each pollutant to individual meteorological variables is presented using partial residual plots described on a percentage scale as marginal effects. The aggregate impact of local meteorology in the models was found to explain 26.3% of the variance in O₃, 21.1% in PM₁₀, and 26.7% in NO₂. High temperatures resulted in the strongest positive response for all pollutants, with a 150% increase above the mean for O₃ and PM₁₀ and a 120% increase for NO₂. Other variables, such as boundary layer height, winds, water vapour pressure, radiation, precipitation and mean sea-level pressure, display some importance for one or more of the pollutants, but their impact in the models was less pronounced. Overall, this analysis presents a solid foundation for understanding the importance of local meteorology as a driver of regional air pollution in Melbourne in a framework that can be applied in other regions. Additionally, these results can be used to corroborate findings from studies using numerical air quality models.
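Because the responses are reported on a percentage scale, a model for log concentrations gives a natural illustration: a difference of d in a log-scale effect corresponds to a 100(exp(d) - 1)% change in concentration. The Python sketch below fits log ozone on simulated data with day-of-week dummies and a linear-spline temperature term by least squares (a simplified stand-in for a GAM smooth), then expresses the temperature effect as a percentage relative to its sample average. The data, knot location and coefficients are hypothetical.

```python
import numpy as np

KNOT = 25.0  # hypothetical temperature (°C) above which the ozone response steepens

def design(temp, dow):
    """Intercept + day-of-week dummies (weekly emissions cycle) + linear spline in temperature."""
    dow_dummies = np.eye(7)[dow][:, 1:]          # day 0 is the baseline
    return np.column_stack([np.ones_like(temp), dow_dummies, temp, np.maximum(temp - KNOT, 0.0)])

# Hypothetical data: eight years of daily log ozone with a steeper response on hot days
rng = np.random.default_rng(11)
n = 8 * 365
temp = rng.normal(18, 6, n)
dow = np.arange(n) % 7
log_o3 = (3.0 + 0.02 * temp + 0.08 * np.maximum(temp - KNOT, 0.0)
          - 0.05 * (dow >= 5) + rng.normal(0, 0.1, n))

beta, *_ = np.linalg.lstsq(design(temp, dow), log_o3, rcond=None)

def temp_effect(t):
    """Fitted temperature term on the log scale (columns 7 and 8 of the design)."""
    return beta[7] * t + beta[8] * np.maximum(t - KNOT, 0.0)

# Express the effect as a percentage change relative to the sample-average temperature effect
grid = np.linspace(5, 40, 36)
pct = 100 * (np.exp(temp_effect(grid) - temp_effect(temp).mean()) - 1)
```

Centring on the average fitted effect makes 0% correspond to a typical day, so values such as "150% above the mean" can be read directly off the curve.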

Keywords: air pollution, climate change, generalized additive models, and meteorology.

Working paper