This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are those in which males and females (or other sub-groups) do not diverge over time. (Essentially, we require the differences between the groups to be stationary.) When we wrote the 2008 paper, we did not know how to constrain the forecasts to be coherent in a functional data context, so the issue was not discussed. My 2013 paper provided a way of imposing coherence. This blog post shows how to implement both ideas using R.
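The product-ratio approach of the 2013 paper is implemented in the coherentfdm() function. Here is a minimal sketch using the French mortality data bundled with the package as a stand-in (the post itself uses Australian data, which would normally be downloaded from the Human Mortality Database); the year range is chosen to fit the bundled data:

```r
library(demography)

# fr.mort ships with the package; substitute Australian mortality data
# from the Human Mortality Database to reproduce the post.
mort <- extract.years(fr.mort, 1950:2006)
smort <- smooth.demogdata(mort)

# Independent functional data models (Hyndman & Booth 2008):
fit_m <- fdm(smort, series = "male")
fc_m  <- forecast(fit_m, h = 50)

# Coherent product-ratio model (Hyndman, Booth & Yasmeen 2013);
# male and female forecasts will not diverge:
fit_c <- coherentfdm(smort)
fc_c  <- forecast(fit_c, h = 50)
plot(fc_c$male)
```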
When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
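The idea behind those functions can be sketched as follows. For a fitted ARIMA model, the (seasonally expanded) AR coefficients are available in fit$model$phi, and the inverse roots of the AR polynomial should lie inside the unit circle for stationarity. This is a simplified sketch, not the full functions from the post (the example model is arbitrary):

```r
library(forecast)

# Illustrative model; any object from arima() or Arima() works.
fit <- Arima(WWWusage, order = c(3, 1, 0))

# AR polynomial is 1 - phi_1 z - ... - phi_p z^p, so coefficients in
# increasing order of z are c(1, -phi). Inverse roots are 1/roots.
phi <- fit$model$phi
inv_ar_roots <- if (length(phi) > 0) 1 / polyroot(c(1, -phi)) else complex(0)

# Plot against the unit circle: stationarity requires |inverse root| < 1.
plot(inv_ar_roots, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "Real", ylab = "Imaginary", pch = 19)
lines(complex(argument = seq(0, 2 * pi, length.out = 200)))
```

The MA inverse roots can be handled the same way using fit$model$theta with polyroot(c(1, theta)).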
I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world. Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.
Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
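One of the simplest approaches can be sketched as follows: refit the chosen model to the test data with its coefficients held fixed, so that fitted() returns rolling one-step forecasts without re-estimation. The series and split point are illustrative:

```r
library(forecast)

# AirPassengers as a stand-in series; split into training and test periods.
train <- window(AirPassengers, end = c(1956, 12))
test  <- window(AirPassengers, start = c(1957, 1))

fit <- auto.arima(train)

# Passing model = fit reuses the estimated coefficients on the new data
# without re-estimating them; fitted() then gives the rolling
# one-step-ahead forecasts over the test set.
refit <- Arima(test, model = fit)
fc1 <- fitted(refit)

accuracy(fc1, test)
```

Re-estimating the model at every forecast origin instead requires a loop over expanding (or sliding) windows, calling auto.arima() at each origin; it is slower but allows the coefficients, and even the model order, to adapt over time.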
Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike and slab regression. Not enough to be able to use them, but ok if you’ve no idea what they are.
With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily. All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is then a cross-product of the two hierarchies. This framework covers even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can treat sex and state as two very simple hierarchies with only one level each, and then forecast the aggregates of all combinations of the two. Any number of separate hierarchies can be combined in this way, with non-hierarchical factors such as sex treated as single-level hierarchies.
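The deaths-by-sex-and-state example can be set up with the gts() function, where each row of the groups matrix assigns every bottom-level series to a level of one factor. A minimal sketch with simulated data (the series names and group labels are my own):

```r
library(hts)

# Simulated bottom-level series: 2 sexes x 3 states = 6 series.
set.seed(1)
y <- ts(matrix(rnorm(120, mean = 100, sd = 5), ncol = 6),
        start = 2000, frequency = 1)
colnames(y) <- c("MA", "MB", "MC", "FA", "FB", "FC")

# Row "Sex" splits the six series by sex, row "State" by state.
groups <- rbind(Sex   = c(1, 1, 1, 2, 2, 2),
                State = c(1, 2, 3, 1, 2, 3))
deaths <- gts(y, groups = groups)

# Reconciled forecasts of all aggregates of all combinations.
fc <- forecast(deaths, h = 10, fmethod = "ets")
```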
For the next month I am travelling in Europe and will be giving the following talks.
17 June: Challenges in forecasting peak electricity demand. Energy Forum, Sierre, Valais/Wallis, Switzerland.
20 June: Common functional principal component models for mortality forecasting. International Workshop on Functional and Operatorial Statistics, Stresa, Italy.
24–25 June: Functional time series with applications in demography. Humboldt University, Berlin.
1 July: Fast computation of reconciled forecasts in hierarchical and grouped time series. International Symposium on Forecasting, Rotterdam, Netherlands.
Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx.
I’m tired of reading about tests for structural breaks, and here’s why. A structural break occurs when we see a sudden change in a time series or in a relationship between two time series. Econometricians love papers on structural breaks, and apparently believe in them. Personally, I take a different view: I think it is more realistic that most things change slowly over time, and only occasionally with sudden discontinuous change.
Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, Statistical Science, 25(3), 289–310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read, as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered, amongst other things.
The AIC is better suited to model selection for prediction, as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
An ARIMA model has no explanatory use, but is great at short-term prediction. How to
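The AIC/cross-validation connection for regression can be seen directly with the CV() function in the forecast package, which reports the leave-one-out CV statistic alongside the information criteria for a linear model. A small sketch (the model and data are purely illustrative):

```r
library(forecast)

# CV() returns the leave-one-out cross-validation statistic together
# with AIC, AICc, BIC and adjusted R^2 for a model fitted with lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)
CV(fit)
```

When comparing several candidate models this way, the rankings by AIC and by leave-one-out CV will typically agree closely, reflecting their asymptotic equivalence.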