A new candidate for worst figure

Today I read a paper that had been submitted to the IJF, which included the following figure


along with several similar plots. I haven’t seen anything this bad for a long time. In fact, I think it would be very difficult to reproduce using R, or even Excel (which is particularly adept at bad graphics).

A few years ago I produced “Twenty rules for good graphics”. I think I need to add a couple of additional rules:

  • Represent time changes using lines.
  • Never use fill patterns such as cross-hatching.

(My original rule #20 was “Avoid pie-charts”.)

It would have been relatively simple to show these data as six lines on a plot of GDP against time. That would have made it obvious that European GDP was shrinking, the GDP of Asia/Oceania was increasing, and other regions of the world were fairly stable. At least, I think that is what is happening, but it is very hard to tell from such graphical obfuscation.
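As a rough illustration of that alternative, here is a minimal sketch in Python (standard library only) that writes such a chart as a vector SVG file: one line per series, a plain white background, gray axes, and each line labelled directly at its right-hand end instead of through a legend. The function name `line_chart_svg`, its parameters and the colour list are my own hypothetical choices, not anything from the paper, and no data from the paper is included; you pass in the years and a dictionary of GDP series yourself.

```python
from xml.sax.saxutils import escape

# A few distinguishable colours (hypothetical choice); cycled if there are more series.
COLOURS = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]


def line_chart_svg(years, series, x_label="Year", y_label="GDP",
                   width=640, height=400, margin=60, label_space=110):
    """Return an SVG line chart with one line per series.

    years  -- increasing x values (e.g. calendar years); at least two of them
    series -- dict mapping a name (e.g. a region) to y values, one per year
    Assumes the y values are not all identical (otherwise the scale is undefined).
    """
    all_y = [y for ys in series.values() for y in ys]
    y_min, y_max = min(all_y), max(all_y)
    x_min, x_max = years[0], years[-1]
    plot_w = width - 2 * margin - label_space  # leave room for the direct labels
    plot_h = height - 2 * margin
    bottom = margin + plot_h
    font = 'font-family="Helvetica, Arial, sans-serif" font-size="12"'

    def sx(x):
        return margin + (x - x_min) / (x_max - x_min) * plot_w

    def sy(y):
        # SVG y coordinates grow downwards, so flip the vertical scale.
        return margin + (y_max - y) / (y_max - y_min) * plot_h

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        '<rect width="100%" height="100%" fill="white"/>',  # plain white background
        # Axes drawn in gray so they stay in the background.
        f'<line x1="{margin}" y1="{bottom}" x2="{margin + plot_w}" y2="{bottom}" stroke="gray"/>',
        f'<line x1="{margin}" y1="{margin}" x2="{margin}" y2="{bottom}" stroke="gray"/>',
        # Axis titles.
        f'<text x="{margin + plot_w / 2}" y="{height - 15}" text-anchor="middle" {font}>{escape(x_label)}</text>',
        f'<text x="15" y="{margin + plot_h / 2}" text-anchor="middle" '
        f'transform="rotate(-90 15 {margin + plot_h / 2})" {font}>{escape(y_label)}</text>',
        # Only the first and last years are labelled, for brevity.
        f'<text x="{margin}" y="{bottom + 18}" text-anchor="middle" {font}>{x_min}</text>',
        f'<text x="{margin + plot_w}" y="{bottom + 18}" text-anchor="middle" {font}>{x_max}</text>',
    ]
    for i, (name, ys) in enumerate(series.items()):
        colour = COLOURS[i % len(COLOURS)]
        points = " ".join(f"{sx(x):.1f},{sy(y):.1f}" for x, y in zip(years, ys))
        parts.append(f'<polyline points="{points}" fill="none" stroke="{colour}" stroke-width="2"/>')
        # Direct label just to the right of the line's last point, instead of a legend.
        parts.append(f'<text x="{sx(years[-1]) + 6:.1f}" y="{sy(ys[-1]) + 4:.1f}" '
                     f'fill="{colour}" {font}>{escape(name)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the result to a `.svg` file gives a graphic that stays sharp at any size. The sketch does nothing to stop two labels colliding when lines end at similar values; in practice you would nudge those by hand, or let a proper plotting system such as R handle it.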

Visit of Di Cook

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.

For those wanting to hear her speak, read on.

Reflections on UseR! 2013

This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that, unlike most conferences I attend, they are not really about research. They provide a place for people to discuss and exchange ideas on how R can be used.

Here are some thoughts and highlights of the conference, in no particular order.

The Young Stats Communication Challenge

The Australian Young Statisticians Conference (Feb 2013) is organizing a communication competition. They invite all early-career statisticians (studying, or within 5 years of graduation) to produce a short (3–5 minute) video for the ABS YSC2013 Video Competition, or a static infographic for the ABS YSC2013 Infographic Competition.

Both competitions have a 1st prize of $500 and a 2nd prize of $250.

Entries close 16th November, and winners will be notified by mid-December.

Details available at: ysc2013.com/program/competitions/

I’m a speaker at the conference, so hopefully I will get to see some of the great entries!


Data visualization

For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog, where he is starting a series on data visualization based on the principles developed by Tufte and Cleveland (with R examples).

The classics themselves are also worth reading, and remain relevant despite the 20 or 30 years that have elapsed since they appeared.

Data visualization videos

Probably everyone has seen Hans Rosling’s famous TED talk by now. If not, here it is:

I recently came across a couple of other exceptional talks on data visualization:

Hans Rosling again: “Let my dataset change your mindset”. If only all statistics lecturers were this dynamic!

David McCandless: “The beauty of data visualization”. Not as exciting as Hans, but some great examples.

And here’s an hour-length documentary hosted by Hans Rosling called “The Joy of Stats”.

Twenty rules for good graphics

In referee reports, and in my responses to authors who have submitted papers to the International Journal of Forecasting, I repeatedly include comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own, taken from things I’ve written for authors.

The following “rules” are in no particular order.

  1. Use vector graphics such as eps or pdf. These scale properly and do not look fuzzy when enlarged. Do not use jpeg, bmp or png files, as these will look fuzzy when enlarged or, if saved at very high resolutions, will be enormous files. Jpegs in particular are designed for photographs, not statistical graphics.
  2. Use readable fonts. For graphics I prefer sans-serif fonts such as Helvetica or Arial. Make sure the font size is readable after the figure is scaled to whatever size it will be printed.
  3. Avoid cluttered legends. Where possible, add labels directly to the elements of the plot rather than use a legend at all. If this won’t work, then keep the legend from obscuring the plotted data, and make it small and neat.
  4. If you must use a legend, move it inside the plot, in a blank area.
  5. No dark shaded backgrounds. Excel always adds a nasty dark gray background by default, and I’m always asking authors to remove it. Graphics print much better with a white background. The ggplot2 package for R also uses a gray background (although it is lighter than the Excel default). I don’t mind the ggplot2 version so much, as it is used effectively with white grid lines. Nevertheless, even the light gray background doesn’t lend itself to printing or photocopying. White is better.
  6. Avoid dark, dominating grid lines (such as those produced in Excel by default). Grid lines can be useful, but they should be in the background (light gray on white or white on light gray).
  7. Keep the axis limits sensible. You don’t have to include a zero (even if Excel wants you to). The defaults in R work well. The basic idea is to avoid lots of white space around the plotted data.
  8. Make sure the axes are scaled properly. Another Excel problem is that the horizontal axis is sometimes treated categorically instead of numerically. If you are plotting a continuous numerical variable, then the horizontal axis should be properly scaled for that variable.
  9. Do not forget to specify units.
  10. Tick intervals should be at nice round numbers.
  11. Axes should be properly labelled.
  12. Use linewidths big enough to read. 1pt lines tend to disappear if plots are shrunk.
  13. Avoid overlapping text on plotting characters or lines.
  14. Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio.
  15. Plots should be self-explanatory, so include detailed captions.
  16. Use a sensible aspect ratio. I think width:height of about 1.6 works well for most plots.
  17. Prepare graphics in the final aspect ratio to be used in the publication. Distorted fonts look awful.
  18. Use points, not lines, if element order is not relevant.
  19. When preparing plots that are meant to be compared, use the same scale for all of them. Even better, combine plots into a single graph if they are related.
  20. Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. I promise all my students an instant fail if I ever see anything so appalling.
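Rules 7 and 10 can be automated. The sketch below is a Python version of the well-known “nice numbers” labelling approach described by Paul Heckbert in Graphics Gems (1990). It picks a tick step of 1, 2 or 5 times a power of ten, then extends the axis only to the nearest ticks just outside the data, so zero is not forced in. This is not what R’s `pretty()` function does internally (that uses a related but different algorithm), and the function names here are hypothetical.

```python
import math


def nice_number(x, round_result):
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) close to x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # lies in [1, 10)
    if round_result:
        # Round to the nearest nice value.
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        # Take the smallest nice value that is at least x.
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def nice_ticks(data_min, data_max, n_ticks=5):
    """Tick positions at round numbers whose ends just enclose the data."""
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")
    span = nice_number(data_max - data_min, False)
    step = nice_number(span / (n_ticks - 1), True)
    lo = math.floor(data_min / step) * step  # axis starts at the tick just below the data
    hi = math.ceil(data_max / step) * step   # and ends at the tick just above it
    count = int(round((hi - lo) / step))
    # Rounding tidies floating-point noise such as 0.6000000000000001.
    return [round(lo + i * step, 10) for i in range(count + 1)]


print(nice_ticks(97.3, 143.8))  # [90, 100, 110, 120, 130, 140, 150]
```

For plots that are meant to be compared (rule 19), one option is to call `nice_ticks` once with the smallest and largest values across all of the plots and reuse the same limits and ticks for every panel.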

The classic books on graphics are:

These are both highly recommended.