Today I read a paper that had been submitted to the IJF which included the following figure along with several similar plots. I haven’t seen anything this bad for a long time. In fact, I think I would find it very difficult to reproduce using R, or even Excel (which is particularly adept at bad graphics). A few years ago I produced “Twenty rules for good graphics”. I think I need to add a couple of additional rules:

- Represent time changes using lines.
- Never use fill patterns such as cross-hatching.

(My original rule #20 said “Avoid pie charts.”) It would have been relatively simple to show these data as six lines on a plot of GDP against time. That would have made it obvious that the European GDP was shrinking, the GDP of Asia/Oceania was increasing, while the other regions of the world were fairly stable. At least I think that is what is happening, but it is very hard to tell from such graphical obfuscation.
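The suggested fix — six lines on a plot of GDP against time — might be sketched as follows. The data behind the figure are not included in the post, so the series below are simulated purely for illustration; the region names and values are hypothetical.

```r
# Hypothetical sketch: simulate six regional GDP-share series,
# since the actual data from the paper are not available.
set.seed(123)
years <- 2000:2012
regions <- c("Europe", "Asia/Oceania", "North America",
             "South America", "Africa", "Middle East")
gdp <- sapply(seq_along(regions), function(i)
  50 + cumsum(rnorm(length(years), mean = 0.3 * (i - 3), sd = 0.5)))
colnames(gdp) <- regions

# One line per region makes the trends over time immediately visible,
# with no need for cross-hatching or side-by-side bars.
matplot(years, gdp, type = "l", lty = 1, col = 1:6,
        xlab = "Year", ylab = "GDP share (%)",
        main = "GDP by region over time")
legend("topleft", legend = regions, col = 1:6, lty = 1, cex = 0.8)
```

`matplot()` draws one line per column, which is usually all that is needed to replace a cluttered bar chart of time changes.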

## Posts Tagged ‘R’

## Forecasting with R in WA

On 23–25 September, I will be running a 3-day workshop in Perth on “Forecasting: principles and practice”, mostly based on my book of the same name. Workshop participants will be assumed to be familiar with basic statistical tools such as multiple regression, but no knowledge of time series or forecasting will be assumed. Some prior experience in R is highly desirable.

Venue: The University Club, University of Western Australia, Nedlands WA.

- Day 1: Forecasting tools, seasonality and trends, exponential smoothing.
- Day 2: State space models, stationarity, transformations, differencing, ARIMA models.
- Day 3: Time series cross-validation, dynamic regression, hierarchical forecasting, nonlinear models.

The course will involve a mixture of lectures and practical sessions using R. Each participant must bring their own laptop with R installed, along with the fpp package and its dependencies. For costs and enrolment details, go to http://www.cas.maths.uwa.edu.au/courses/forecasting.
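Getting set up before the workshop is a one-liner: the fpp package is on CRAN and pulls in the other packages used in the book as dependencies.

```r
# Install fpp and its dependencies from CRAN before the workshop.
install.packages("fpp", dependencies = TRUE)

# Loading fpp also attaches the forecast package and the data sets
# used in the exercises.
library(fpp)
```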

## GEFCom 2014 energy forecasting competition is underway

GEFCom 2014 is the most advanced energy forecasting competition ever organized, both in terms of the data involved and in terms of the way the forecasts will be evaluated. So everyone interested in energy forecasting should head over to the competition webpage and start forecasting: www.gefcom.org. This time, the competition is hosted on CrowdANALYTIX rather than Kaggle.

Highlights of GEFCom2014:

- An upgraded edition of GEFCom2012.
- Four tracks: electric load, electricity price, wind power and solar power forecasting.
- Probabilistic forecasting: contestants are required to submit 99 quantiles for each step throughout the forecast horizon.
- Rolling forecasting: incremental data sets are being released on a weekly basis to forecast the next period of interest.
- Prizes for winning teams and institutions: up to 3 teams from each track will be recognized as winning teams; top institutions with multiple well-performing teams will be recognized as winning institutions.
- Global participation: 200+ people from 40+ countries have already signed up for the GEFCom2014 interest list.

Tao Hong (the main organizer) has a few tips on his blog that you should read before starting.
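One generic way to produce the required 99 quantiles per forecast step is to simulate many future sample paths from a fitted model and take empirical quantiles at each horizon. The sketch below uses an ARIMA model on a built-in series purely for illustration — the GEFCom data and the competition's required format are different.

```r
# Sketch: 99 quantiles per forecast step via simulation.
# Illustrative only: ldeaths is a stand-in, not competition data.
library(forecast)

fit <- auto.arima(ldeaths)
h <- 12       # forecast horizon
nsim <- 2000  # number of simulated future sample paths

# Each call to simulate() draws one future path of length h;
# sims is an h x nsim matrix of simulated futures.
sims <- replicate(nsim, simulate(fit, nsim = h))

# Empirical 1%, 2%, ..., 99% quantiles at each step: a 99 x h matrix.
quantiles <- apply(sims, 1, quantile, probs = (1:99) / 100)
dim(quantiles)
```

More simulated paths give smoother quantile estimates at the cost of computation time.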

## Visit of Di Cook

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details. For those wanting to hear her speak, read on.

## Minimal reproducible examples

I occasionally get emails from people thinking they have found a bug in one of my R packages, and I usually have to reply asking them to provide a minimal reproducible example (MRE). This post provides instructions on how to create an MRE.
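The general shape of an MRE is the same in any bug report: self-generated (or built-in) data, a fixed random seed, the smallest code that triggers the problem, and your session information. A generic sketch — the specific package and function here are placeholders, not taken from the post:

```r
# A minimal reproducible example has four ingredients:

# 1. The packages you are using, loaded explicitly.
library(forecast)

# 2. Data the reader can regenerate: a seed plus simulated
#    (or built-in) data, not a private file.
set.seed(42)
y <- ts(rnorm(60), frequency = 12)

# 3. The smallest code that triggers the problem.
fit <- auto.arima(y)
forecast(fit, h = 12)

# 4. Your session details (R version, package versions, OS),
#    pasted into the report.
sessionInfo()
```

Anyone can paste this into a fresh R session and see exactly what you see, which is what makes a bug report actionable.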

## Coherent population forecasting using R

This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years. In demography, “coherent” forecasts are those where males and females (or other subgroups) do not diverge over time. (Essentially, we require the difference between the groups to be stationary.) When we wrote the 2008 paper, we did not know how to constrain the forecasts to be coherent in a functional data context, and so this was not discussed. My later 2013 paper provided a way of imposing coherence. This blog post shows how to implement both ideas using R.
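The coherent workflow in the demography package can be sketched as below. The post uses Australian data; since those require a download from the Human Mortality Database, this sketch substitutes the French mortality data (`fr.mort`) bundled with the package, and the exact component names of the forecast object may vary with the data.

```r
# Sketch of coherent mortality forecasting with the demography package.
# fr.mort (bundled with the package) stands in for the Australian data.
library(demography)

# Smooth the raw mortality rates across ages before modelling.
smoothed <- smooth.demogdata(fr.mort)

# Coherent functional demographic model (Hyndman, Booth & Yasmeen 2013):
# models the product (geometric mean) and the male/female ratios,
# so the sex-specific forecasts cannot diverge over time.
fit <- coherentfdm(smoothed)

# Forecast 50 years ahead and plot the male component.
fc <- forecast(fit, h = 50)
plot(fc$male)
```

The non-coherent alternative from the 2008 paper is to fit `fdm()` to each group separately; the coherent model differs in forecasting the product and ratio functions instead of each group independently.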

## Plotting the characteristic roots for ARIMA models

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
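The original post's functions are not reproduced here, but the computation can be sketched as follows for the AR part: extract the (seasonally expanded) AR coefficients from a fitted model, find the roots of the characteristic polynomial with `polyroot()`, and plot their inverses against the unit circle. (Recent versions of the forecast package also offer built-in plotting of a fitted Arima object.)

```r
# Sketch: compute and plot inverse AR characteristic roots.
library(forecast)

fit <- Arima(WWWusage, order = c(3, 1, 0))

# phi holds the AR coefficients, with any seasonal terms expanded.
ar_coef <- fit$model$phi

# Roots of the characteristic polynomial 1 - phi1*z - ... - phip*z^p.
roots <- polyroot(c(1, -ar_coef))

# For a stationary model the inverse roots lie inside the unit circle.
inv_roots <- 1 / roots
plot(inv_roots, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "Real", ylab = "Imaginary", main = "Inverse AR roots")
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)  # unit circle
```

The MA roots are handled the same way using `fit$model$theta` and the polynomial `1 + theta1*z + ... + thetaq*z^q`, with inverse roots inside the unit circle indicating invertibility.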

## Variations on rolling forecasts

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.
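The simplest variation — re-estimating the model at every forecast origin and recording one-step errors — can be sketched like this. The series, model order, and starting origin below are illustrative choices, not taken from the post.

```r
# Sketch: one-step rolling forecasts with re-estimation at each origin.
library(forecast)

y <- WWWusage           # stand-in series for illustration
n <- length(y)
start <- 80             # first forecast origin (illustrative)

err <- numeric(n - start)
for (i in start:(n - 1)) {
  # Re-fit the model using data up to time i only...
  fit <- Arima(window(y, end = i), order = c(3, 1, 0))
  # ...then forecast one step ahead and record the error.
  fc <- forecast(fit, h = 1)
  err[i - start + 1] <- y[i + 1] - fc$mean[1]
}

sqrt(mean(err^2))       # rolling one-step RMSE
```

A cheaper variation keeps the parameters fixed and only updates the data at each origin, e.g. by refitting with `Arima(..., model = fit)`, which applies an existing model to new data without re-estimation.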

## Varian on big data

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but ok if you’ve no idea what they are.

## Specifying complicated groups of time series in hts

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily. All aggregation structures can be represented as hierarchies or as cross-products of hierarchies.

For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies.

This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies. Any number of separate hierarchies can be combined in this way, and non-hierarchical factors such as sex can be treated as single-level hierarchies.
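The deaths-by-sex-and-state example can be sketched with `gts()`, which takes a matrix of group labels for the bottom-level series. The data and labels below are simulated placeholders, not from the post.

```r
# Sketch: a grouped time series of deaths by sex and state.
# The bottom-level data here are simulated purely for illustration.
library(hts)

set.seed(1)
bts <- ts(matrix(rpois(6 * 24, lambda = 50), ncol = 6), frequency = 12)
colnames(bts) <- c("F-NSW", "F-VIC", "F-QLD", "M-NSW", "M-VIC", "M-QLD")

# Two single-level "hierarchies": sex and state. Their cross-product
# defines the grouping structure over the six bottom-level series.
sex   <- rep(c("F", "M"), each = 3)
state <- rep(c("NSW", "VIC", "QLD"), times = 2)
y <- gts(bts, groups = rbind(sex, state))

# All aggregates: the total, by sex, by state, and each sex x state cell.
aggts(y)

# Coherent forecasts across every level of aggregation.
fc <- forecast(y, h = 12)
```

Each additional factor becomes another row of the `groups` matrix, so combining any number of separate hierarchies is just a matter of stacking label vectors.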