Twenty rules for good graphics

One of the things I repeatedly include in referee reports, and in my responses to authors who have submitted papers to the International Journal of Forecasting, is a set of comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own, taken from things I’ve written for authors.

The fol­low­ing “rules” are in no par­tic­u­lar order.

  1. Use vector graphics such as eps or pdf. These scale properly and do not look fuzzy when enlarged. Do not use jpeg, bmp or png files, as these will look fuzzy when enlarged or, if saved at very high resolutions, will be enormous files. Jpegs in particular are designed for photographs, not statistical graphics.
  2. Use readable fonts. For graphics I prefer sans-serif fonts such as Helvetica or Arial. Make sure the font size is readable after the figure is scaled to whatever size it will be printed at.
  3. Avoid clut­tered leg­ends. Where pos­si­ble, add labels directly to the ele­ments of the plot rather than use a leg­end at all. If this won’t work, then keep the leg­end from obscur­ing the plot­ted data, and make it small and neat.
  4. If you must use a leg­end, move it inside the plot, in a blank area.
  5. No dark shaded backgrounds. Excel always adds a nasty dark gray background by default, and I’m always asking authors to remove it. Graphics print much better with a white background. The ggplot2 package for R also uses a gray background (although it is lighter than the Excel default). I don’t mind the ggplot2 version so much, as it is used effectively with white grid lines. Nevertheless, even the light gray background doesn’t lend itself to printing or photocopying. White is better.
  6. Avoid dark, dom­i­nat­ing grid lines (such as those pro­duced in Excel by default). Grid lines can be use­ful, but they should be in the back­ground (light gray on white or white on light gray).
  7. Keep the axis lim­its sen­si­ble. You don’t have to include a zero (even if Excel wants you to). The defaults in R work well. The basic idea is to avoid lots of white space around the plot­ted data.
  8. Make sure the axes are scaled prop­erly. Another Excel prob­lem is that the hor­i­zon­tal axis is some­times treated cat­e­gor­i­cally instead of numer­i­cally. If you are plot­ting a con­tin­u­ous numer­i­cal vari­able, then the hor­i­zon­tal axis should be prop­erly scaled for the numer­i­cal variable.
  9. Do not for­get to spec­ify units.
  10. Tick inter­vals should be at nice round numbers.
  11. Axes should be prop­erly labelled.
  12. Use linewidths big enough to see. Lines of 1pt tend to disappear if plots are shrunk.
  13. Avoid over­lap­ping text on plot­ting char­ac­ters or lines.
  14. Fol­low Tufte’s prin­ci­ples by remov­ing chart junk and keep­ing a high data-​​ink ratio.
  15. Plots should be self-​​explanatory, so include detailed captions.
  16. Use a sen­si­ble aspect ratio. I think width:height of about 1.6 works well for most plots.
  17. Pre­pare graph­ics in the final aspect ratio to be used in the pub­li­ca­tion. Dis­torted fonts look awful.
  18. Use points not lines if ele­ment order is not relevant.
  19. When prepar­ing plots that are meant to be com­pared, use the same scale for all of them. Even bet­ter, com­bine plots into a sin­gle graph if they are related.
  20. Avoid pie-​​charts. Espe­cially 3d pie-​​charts. Espe­cially 3d pie-​​charts with explod­ing wedges. I promise all my stu­dents an instant fail if I ever see any­thing so appalling.
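
Several of the rules above (1, 2, 3, 5, 6, 9, 11, 12 and 16) can be sketched in a few lines of base R. This is only a minimal illustration, not a recommended template: the two series, the variable names, the axis labels and the 4-inch height are all made up for the example.

```r
# Hypothetical data: two noisy monthly series, invented for illustration only.
set.seed(1)
month <- 1:48
a <- 100 + 0.8 * month + rnorm(48, sd = 3)
b <- 80 + 0.3 * month + rnorm(48, sd = 3)

# Rule 16: width:height of about 1.6 (the height is an assumed print size).
fig_height <- 4                      # inches
fig_width <- 1.6 * fig_height

# Rules 1 and 2: vector output with a sans-serif font.
out_file <- file.path(tempdir(), "two-series.pdf")
pdf(out_file, width = fig_width, height = fig_height, family = "Helvetica")

par(las = 1,                         # horizontal tick labels
    bg = "white",                    # rule 5: white background
    mar = c(4, 4.5, 1, 6))           # wide right margin for the direct labels

# Rules 9 and 11: labelled axes, with units.
plot(month, a, type = "n", ylim = range(a, b),
     xlab = "Month", ylab = "Sales (thousands of units)")

grid(col = "gray85", lty = "solid")  # rule 6: light grid lines in the background
lines(month, a, lwd = 2)             # rule 12: lines heavier than 1pt
lines(month, b, lwd = 2, lty = "dashed")

# Rule 3: label the lines directly (in the right margin) instead of a legend.
text(48, a[48], "Product A", pos = 4, xpd = NA)
text(48, b[48], "Product B", pos = 4, xpd = NA)

invisible(dev.off())
```

The default axis limits and tick positions are left untouched here, in line with rules 7 and 10.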

The classic books on graphics are those by Tufte and Cleveland.

Both are highly recommended.

  • Steve P

    Thanks for the tips. I was already doing most of these things, but it’s good to know that you agree.

  • Petr

    Rule #1: vector graphics are good only if you do not work with an extremely large number of data points. A figure with millions of lines can easily be as large as hundreds of MB when ps or pdf format is used. Also, rendering such an image can take a long time. Using png is, IMO, a good choice then.

    • Rob J Hyndman

      Hi Petr. Thanks for your com­ment. I agree — I also use png when I have more than about 10000 points or lines.
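
One way to apply that rule of thumb in base R is sketched below. The helper names `choose_format` and `open_plot_device` are hypothetical (they are not from any package), the data are simulated, and the cut-off of 10000 is simply the rough figure mentioned above rather than a tested value.

```r
# Hypothetical helper: pick a file format from the number of plotted points.
choose_format <- function(n_points, threshold = 10000) {
  if (n_points > threshold) "png" else "pdf"
}

# Hypothetical helper: open the matching graphics device and return the file name.
open_plot_device <- function(n_points, basename, width = 6.4, height = 4) {
  fmt <- choose_format(n_points)
  file <- paste0(basename, ".", fmt)
  if (fmt == "png") {
    # Raster output at a resolution high enough for print.
    png(file, width = width, height = height, units = "in", res = 300)
  } else {
    # Vector output for plots with few elements.
    pdf(file, width = width, height = height)
  }
  file
}

# Simulated data, large enough to trigger the png branch.
set.seed(1)
n <- 50000
x <- rnorm(n)
y <- x + rnorm(n)

out_file <- open_plot_device(n, file.path(tempdir(), "scatter"))
par(las = 1)
plot(x, y, pch = ".", xlab = "x (arbitrary units)", ylab = "y (arbitrary units)")
invisible(dev.off())
```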

  • Ben Bolker

    * Make as much text horizontal as possible (par(las=1), and possibly
    swapping x- and y-axes so that long labels sit along the y axis where
    they can be spelled out horizontally and not overlap)

    * elim­i­nate redun­dant sets of axis labels for small mul­ti­ple plots (à la lat­tice and
    ggplot) if possible

    * the ‘clas­sic books’ aren’t show­ing up for me. Tufte and Cleveland?
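
The first suggestion in this comment might look something like the following in base R. It is a rough sketch only: the category names and counts are invented, and the left margin of 13 lines is just a guess that happens to fit these labels.

```r
# Hypothetical counts with long category names, invented for illustration.
counts <- c("Exponential smoothing" = 42,
            "ARIMA with regressors" = 35,
            "Neural network autoregression" = 18)

out_file <- file.path(tempdir(), "horizontal-labels.pdf")
pdf(out_file, width = 6.4, height = 3)

# las = 1 keeps all axis text horizontal; the wide left margin makes
# room for the long labels, which now sit along the y axis.
par(las = 1, mar = c(4, 13, 1, 1))

# horiz = TRUE swaps the axes so the categories run down the y axis.
mid <- barplot(counts, horiz = TRUE, xlab = "Number of papers (count)")

invisible(dev.off())
```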

    • Rob J Hyndman

      The books will be invis­i­ble if you have an adblocker. Yes, Tufte and Cleve­land are the clas­sics (imo).

  • Car­los

    I don’t under­stand # 15. Should it be some­thing like “Plots should be self-​​explanatory, so avoid exces­sively detailed captions”?

    There should also be rule # 0: Have some­thing to say. Too many charts are pointless.

    • Rob J Hyndman

      My point in #15 is to avoid the sit­u­a­tion where you have to search all the sur­round­ing text try­ing to fig­ure out what the plot means, or what the vari­ables are. This should be con­tained in the plot itself. If the plot is suf­fi­ciently self-​​explanatory that it doesn’t need much of a cap­tion, that’s great. But some­times expla­na­tion is required, and then it is bet­ter to put it in the cap­tion rather than require the reader to find the infor­ma­tion some­where else.

      I agree with your rule #0!

  • Naomi B Robbins

    Some of your advice contradicts that of Cleveland. For example, Cleveland says, “Avoid putting notes and keys inside the scale line rectangle.” Creating More Effective Graphs includes an example where the data and the key are difficult to distinguish since the key is inside the plot area.

    • Rob J Hyndman

      If there is no room for a key inside the region with­out caus­ing con­fu­sion, then I wouldn’t do it. But often (at least in my plots) there are blank regions near one or more cor­ners where a key can eas­ily fit with­out obscur­ing any­thing.  Then I think it is neater to put the key inside. It is hard to have a con­sis­tent rule here. Per­haps we need a meta-​​rule that says don’t always fol­low the rules!

  • neil­fws

    R/​ggplot2 uses a grey back­ground by default but it is sim­ple to remove. Eas­i­est way: + theme_​bw()

  • Gaz

    You instantly fail your stu­dents if they use a chart type you don’t hap­pen to like? I bet they think you’re an absolute jerk.

    • Rob J Hyndman

      My stu­dents rec­og­nize the sar­casm, even if my read­ers do not.

  • molecule61

    Regard­ing rule #7: You need to have a bet­ter rea­son not to include zero on your y axis than sim­ply avoid­ing “exces­sive” white­space. When you’re plot­ting data whose absolute value is mean­ing­ful, that white­space may be part of the value.

  • Jolicharts​.com

    Thanks for mak­ing quick tips. That will surely help peo­ple to graph a lot better.

  • Jay Jang

    Thanks for the tips. I’ve been work­ing in a finan­cial secu­ri­ties firm for about 10 months now, where I started my first full-​​time job as a fresh grad­u­ate after the 2-​​year Korean mil­i­tary ser­vice that I had after my grad­u­a­tion in Australia.

    I am in a financial product planning team at a financial product strategy department within the firm. I occasionally write up some reports, and they sometimes include a few pie-charts and also the 3D pie-charts that you said not to use in rule #20.
    Since I lack knowledge and experience in the field, may I ask you the reasons not to use pie-charts, sir?

    • Rob J Hyndman

      Any book on sta­tis­ti­cal graph­ics should dis­cuss this. Try authors such as Bill Cleve­land, Stephen Few, Naomi Robbins, …

  • Joe Lotz

    This post is an oldie but a goodie. I still refer people to it!