Today I read a paper that had been submitted to the IJF which included the following figure
along with several similar plots. I haven’t seen anything this bad for a long time. In fact, I think I would find it very difficult to reproduce using R, or even Excel (which is particularly adept at bad graphics).
A few years ago I produced “Twenty rules for good graphics”. I think I need to add a couple of additional rules:
- Represent time changes using lines.
- Never use fill patterns such as cross-hatching.
(My original rule #20 said Avoid pie charts.)
It would have been relatively simple to show these data as six lines on a plot of GDP against time. That would have made it obvious that the European GDP was shrinking, the GDP of Asia/Oceania was increasing, while other regions of the world were fairly stable. At least I think that is what is happening, but it is very hard to tell from such graphical obfuscation.
Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.
For those wanting to hear her speak, read on.
This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used.
Here are some thoughts and highlights of the conference, in no particular order.
When I want to insert figures generated in R into a LaTeX document, it looks better if I first remove the white space around the figure. Unfortunately, R does not make this easy as the graphs are generated to look good on a screen, not in a document.
There are two things that can be done to fix this problem.
Today I was writing a report which included 20 figures, with the names
demandplot20.pdf, and all with similar captions. Clearly a loop was required. After all, LaTeX is a programming language, so we should be able to take advantage of its capabilities.
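The LaTeX loop itself is not shown in this excerpt. As a different route to the same result (not the LaTeX-only approach the post has in mind), here is a minimal Python sketch that writes out the repetitive figure environments. The file stem `demandplot`, the numbering from 1 to 20, the caption wording and the label format are all assumptions made for illustration.

```python
def figure_block(index, stem="demandplot", caption_template="Demand plot {i}."):
    """Return LaTeX source for one figure environment.

    The stem, caption template and label format are hypothetical;
    adjust them to match the real file names and captions.
    """
    name = f"{stem}{index}"
    caption = caption_template.format(i=index)
    return "\n".join([
        r"\begin{figure}",
        r"  \centering",
        r"  \includegraphics[width=\textwidth]{" + name + "}",
        r"  \caption{" + caption + "}",
        r"  \label{fig:" + name + "}",
        r"\end{figure}",
    ])


def all_figures(count=20, **kwargs):
    """Return the figure environments for figures 1..count, separated by blank lines."""
    return "\n\n".join(figure_block(i, **kwargs) for i in range(1, count + 1))
```

Writing the result of `all_figures()` to a file such as `figures.tex` and pulling it into the report with `\input{figures}` keeps the main document short, at the cost of an extra build step.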
The Australian Young Statisticians Conference (Feb 2013) is organizing a communication competition. They invite all early-career statisticians (studying, or within 5 years of graduation) to produce a short (3–5 minute) video for the ABS YSC2013 Video Competition, or a static infographic for the ABS YSC2013 Infographic Competition.
Both competitions have a 1st prize of $500, and 2nd prize of $250.
Entries close 16th November, and winners will be notified by mid-December.
Details available at: ysc2013.com/program/competitions/
I’m a speaker at the conference, so hopefully I will get to see some of the great entries!
For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog where he is starting a series on data visualization based on the principles developed by Tufte and Cleveland (with R examples).
The classics are also worth reading, and remain relevant despite the 20 or 30 years that have elapsed since they appeared.
I like to use animated plots in my talks on functional time series, partly because it is the only way to really see what is going on with changes in the shapes of curves over time, and also because audiences love them! Here is how it is done.
One of the things I repeatedly include in referee reports, and in my responses to authors who have submitted papers to the International Journal of Forecasting, is a set of comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own taken from things I’ve written for authors.
The following “rules” are in no particular order.
- Use vector graphics such as eps or pdf. These scale properly and do not look fuzzy when enlarged. Do not use jpeg, bmp or png files, as these will look fuzzy when enlarged or, if saved at very high resolutions, will be enormous files. Jpegs in particular are designed for photographs, not statistical graphics.
- Use readable fonts. For graphics I prefer sans-serif fonts such as Helvetica or Arial. Make sure the font size is readable after the figure is scaled to whatever size it will be printed.
- Avoid cluttered legends. Where possible, add labels directly to the elements of the plot rather than use a legend at all. If this won’t work, then keep the legend from obscuring the plotted data, and make it small and neat.
- If you must use a legend, move it inside the plot, in a blank area.
- No dark shaded backgrounds. Excel always adds a nasty dark gray background by default, and I’m always asking authors to remove it. Graphics print much better with a white background. The ggplot package for R also uses a gray background (although it is lighter than the Excel default). I don’t mind the ggplot version so much as it is used effectively with white grid lines. Nevertheless, even the light gray background doesn’t lend itself to printing/photocopying. White is better.
- Avoid dark, dominating grid lines (such as those produced in Excel by default). Grid lines can be useful, but they should be in the background (light gray on white or white on light gray).
- Keep the axis limits sensible. You don’t have to include a zero (even if Excel wants you to). The defaults in R work well. The basic idea is to avoid lots of white space around the plotted data.
- Make sure the axes are scaled properly. Another Excel problem is that the horizontal axis is sometimes treated categorically instead of numerically. If you are plotting a continuous numerical variable, then the horizontal axis should be properly scaled for the numerical variable.
- Do not forget to specify units.
- Tick intervals should be at nice round numbers.
- Axes should be properly labelled.
- Use linewidths big enough to read. 1pt lines tend to disappear if plots are shrunk.
- Avoid overlapping text on plotting characters or lines.
- Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio.
- Plots should be self-explanatory, so include detailed captions.
- Use a sensible aspect ratio. I think width:height of about 1.6 works well for most plots.
- Prepare graphics in the final aspect ratio to be used in the publication. Distorted fonts look awful.
- Use points not lines if element order is not relevant.
- When preparing plots that are meant to be compared, use the same scale for all of them. Even better, combine plots into a single graph if they are related.
- Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. I promise all my students an instant fail if I ever see anything so appalling.
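Two of the rules above (round tick intervals, and a width:height ratio of about 1.6) are easy to express in code. The following is a minimal Python sketch, not taken from any plotting package: the function names, the choice of steps (1, 2 or 5 times a power of ten) and the default of roughly five ticks are assumptions made for illustration.

```python
import math


def nice_tick_step(lo, hi, target_ticks=5):
    """Return a round tick step (1, 2 or 5 times a power of ten) for [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / target_ticks            # ideal but probably ugly step
    magnitude = 10 ** math.floor(math.log10(raw))
    fraction = raw / magnitude                # lies in [1, 10)
    for nice in (1, 2, 5, 10):
        # Small tolerance guards against floating-point noise.
        if fraction <= nice * (1 + 1e-9):
            return nice * magnitude
    return 10 * magnitude


def nice_ticks(lo, hi, target_ticks=5):
    """Return tick positions at multiples of the round step that lie inside [lo, hi]."""
    step = nice_tick_step(lo, hi, target_ticks)
    first = math.ceil(lo / step - 1e-9)
    last = math.floor(hi / step + 1e-9)
    return [k * step for k in range(first, last + 1)]


def figure_size(width, aspect=1.6):
    """Return (width, height) for a given width and width:height ratio."""
    return width, width / aspect
```

For example, `nice_ticks(3, 47)` gives ticks at 10, 20, 30 and 40 rather than at awkward values such as 3, 11.8, 20.6, and `figure_size(8)` gives a height of about 5 for a width of 8. In practice the defaults in R already choose sensible ticks, so a helper like this matters mainly when the axis is being set by hand.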
The classic books on graphics are:
These are both highly recommended.