This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are those in which the forecasts for males and females (or other sub-groups) do not diverge over time. (Essentially, we require the difference between the groups to be stationary.) When we wrote the 2008 paper, we did not know how to constrain the forecasts to be coherent in a functional data context, so this was not discussed. Our later 2013 paper provided a way of imposing coherence. This blog post shows how to implement both ideas using R.
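To make the stationarity requirement concrete, here is a minimal base R sketch on simulated data. It is not the demography package workflow and not the functional data method of the papers. It is a univariate toy in which every series, parameter value and variable name is an assumption made for illustration. The two log rates are split into a shared component (their average) and a gap component (half their difference). The shared component is forecast with a random walk with drift, the gap with a stationary AR(1), and the group forecasts are then rebuilt from the two.

```r
# Toy illustration of coherence: forecast a shared trend with a
# nonstationary model and the between-group gap with a stationary one.
# All data are simulated; nothing here comes from the original post.
set.seed(123)
n <- 60   # years of (simulated) history
h <- 50   # forecast horizon

trend <- cumsum(rnorm(n, mean = -0.02, sd = 0.01))            # shared decline
gap   <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.02))
log_male   <- -4 + trend + 0.25 + gap
log_female <- -4 + trend - 0.25 - gap

# Shared component and gap component on the log scale
log_prod  <- (log_male + log_female) / 2
log_ratio <- (log_male - log_female) / 2

# Shared component: random walk with drift
drift   <- mean(diff(log_prod))
fc_prod <- log_prod[n] + drift * seq_len(h)

# Gap component: stationary AR(1), so its forecasts revert to a constant
fit_ratio <- arima(log_ratio, order = c(1, 0, 0))
fc_ratio  <- as.numeric(predict(fit_ratio, n.ahead = h)$pred)

# Rebuild the group forecasts; their log difference is 2 * fc_ratio
fc_male   <- exp(fc_prod + fc_ratio)
fc_female <- exp(fc_prod - fc_ratio)
```

Because the gap is forecast with a stationary model, the forecast log difference between the two groups settles towards a constant instead of drifting apart, which is the sense of “coherent” described above.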

## Posts Tagged ‘R’:

## Plotting the characteristic roots for ARIMA models

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
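The functions from the post are not reproduced in this excerpt. As a rough sketch of the idea in base R, the helper below (`inverse_roots`, a name invented here) takes the expanded AR or MA coefficients stored in a fitted `stats::arima` object, finds the roots of the corresponding polynomial with `polyroot()`, and inverts them. The simulated series and the model order are assumptions for illustration.

```r
# Inverse characteristic roots of a fitted ARIMA model (base R sketch).
# `inverse_roots` is a hypothetical helper, not the code from the post.
inverse_roots <- function(coefs, type = c("AR", "MA")) {
  type <- match.arg(type)
  nz <- which(abs(coefs) > 1e-8)
  if (length(nz) == 0) return(complex(0))   # no AR/MA part
  coefs <- coefs[seq_len(max(nz))]          # drop zero padding at the end
  # AR polynomial: 1 - phi_1 z - ...   MA polynomial: 1 + theta_1 z + ...
  poly <- if (type == "AR") c(1, -coefs) else c(1, coefs)
  1 / polyroot(poly)
}

set.seed(1)
x   <- arima.sim(list(ar = c(0.5, -0.3), ma = 0.4), n = 300)
fit <- arima(x, order = c(2, 0, 1))

# fit$model$phi and fit$model$theta hold the expanded coefficients,
# so seasonal terms (if any) are already multiplied in.
ar_inv <- inverse_roots(fit$model$phi, "AR")
ma_inv <- inverse_roots(fit$model$theta, "MA")

# Plot the inverse roots against the unit circle
theta <- seq(0, 2 * pi, length.out = 200)
plot(cos(theta), sin(theta), type = "l", asp = 1,
     xlab = "Real", ylab = "Imaginary", main = "Inverse roots")
points(Re(ar_inv), Im(ar_inv), pch = 19, col = "blue")  # AR
points(Re(ma_inv), Im(ma_inv), pch = 17, col = "red")   # MA
```

Inverse AR roots inside the unit circle indicate a stationary fitted model, and inverse MA roots inside it indicate an invertible one. Roots close to the circle suggest the model is near the boundary.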

## Variations on rolling forecasts

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
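As one example of the kind of computation meant here, the sketch below produces one-step rolling forecasts with the model re-estimated on an expanding window, using only base R. The simulated series, the AR(1) order and the split point `n_train` are assumptions for illustration. The post itself describes several variations that this excerpt does not show.

```r
# One-step rolling forecasts with re-estimation on an expanding window.
# Simulated data and an AR(1) model are used purely for illustration.
set.seed(42)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 120))
n_train <- 100                       # size of the first training window
n_test  <- length(y) - n_train

errors <- numeric(n_test)
for (i in seq_len(n_test)) {
  train <- y[seq_len(n_train + i - 1)]        # all data up to the origin
  fit   <- arima(train, order = c(1, 0, 0))   # re-estimate each time
  fc    <- predict(fit, n.ahead = 1)$pred[1]  # one-step-ahead forecast
  errors[i] <- y[n_train + i] - fc
}

rmse <- sqrt(mean(errors^2))   # out-of-sample accuracy summary
```

Swapping the `arima()` call for another univariate model, or changing `n.ahead`, adapts the same loop to other settings.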

## Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to slab and spike regression. Not enough to be able to use them, but ok if you’ve no idea what they are.

## Specifying complicated groups of time series in hts

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily. All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies. This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies. Any number of separate hierarchies can be combined in this way. Non-hierarchical factors such as sex can be treated as single-level hierarchies.
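The deaths-by-sex-and-state case can be sketched in base R without the hts package. The code below builds the summing matrix for the cross-product of the two single-level hierarchies and uses it to compute every aggregate from the bottom-level series. The data are simulated, and the state labels and the helper `indicator` are assumptions for illustration. The hts package has its own interface for specifying groups, which is not shown here.

```r
# Aggregation structure for series split by sex and state (base R sketch).
# Simulated counts; `indicator` is a hypothetical helper.
sex   <- rep(c("female", "male"), each = 3)
state <- rep(c("NSW", "VIC", "QLD"), times = 2)

set.seed(1)
bottom <- matrix(rpois(10 * 6, lambda = 20), nrow = 10,
                 dimnames = list(NULL, paste(sex, state, sep = "_")))

# One row per level of a factor, with 1 where a bottom series belongs to it
indicator <- function(f) {
  f <- factor(f)
  t(sapply(levels(f), function(l) as.numeric(f == l)))
}

bottom_rows <- diag(ncol(bottom))
rownames(bottom_rows) <- colnames(bottom)

# Summing matrix: total, each sex, each state, then the bottom series
S <- rbind(Total = 1, indicator(sex), indicator(state), bottom_rows)

# Every series in the grouped structure, one column per aggregate
all_series <- bottom %*% t(S)
```

With two sexes and three states this gives 1 + 2 + 3 + 6 = 12 series. Adding a further grouping factor only adds another `indicator()` block to `S`.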

## European talks. June-July 2014

For the next month I am travelling in Europe and will be giving the following talks.

- 17 June. Challenges in forecasting peak electricity demand. Energy Forum, Sierre, Valais/Wallis, Switzerland.
- 20 June. Common functional principal component models for mortality forecasting. International Workshop on Functional and Operatorial Statistics, Stresa, Italy.
- 24–25 June. Functional time series with applications in demography. Humboldt University, Berlin.
- 1 July. Fast computation of reconciled forecasts in hierarchical and grouped time series. International Symposium on Forecasting, Rotterdam, Netherlands.

## ARIMA models with long lags

Today’s email question:

> I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. auto.arima() works great and I often get something along the lines of ARIMA(0,0,1)(1,1,0)[12] with drift as the model with the lowest AICc. However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models include AR(10) MA(12)s etc. rather frequently. I argue that’s overfitting. I don’t see a great deal of discussion in textbooks about this, and I’ve never seen such higher-order models in a textbook setting. But are they fairly common in practice? What concerns could I raise with him about higher-order models? Any advice you could give would be appreciated.

## New jobs in business analytics at Monash

We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.

## Publishing an R package in the Journal of Statistical Software

I’ve been an editor of JSS for the last few years, and as a result I tend to get email from people asking me about publishing papers describing R packages in JSS. So for all those wondering, here are some general comments.

## Seven forecasting blogs

There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:

- No Hesitations by Francis Diebold (Professor of Economics, University of Pennsylvania). Diebold needs no introduction to forecasters. He primarily covers forecasting in economics and finance, but also xkcd cartoons, graphics, research issues, etc.
- Econometrics Beat by Dave Giles. Dave is a professor of economics at the University of Victoria (Canada), formerly from my own department at Monash University (Australia), and a native New Zealander. Not a lot on forecasting, but plenty of interesting posts about econometrics and statistics more generally.
- Business forecasting by Clive Jones (a professional forecaster based in Colorado, USA). Originally about sales and new product forecasting, but he now covers a lot of other forecasting topics and has an interesting practitioner perspective.
- Freakonometrics by Arthur Charpentier (an actuary and professor of mathematics at the University of Quebec at Montréal, Canada). This is the most prolific blog on this list. Wide ranging and taking in statistics, forecasting, econometrics, actuarial science, R, and anything else that takes his fancy. Sometimes in French.
- No free hunch: the Kaggle blog. Some of the most interesting posts are from Kaggle competition winners explaining their methods.
- Energy forecasting by Tao Hong (formerly an energy forecaster for