Sample quantiles 20 years later

Date

28 March 2016

Topics
computing
R
statistics

Almost exactly 20 years ago I wrote a paper with Yanan Fan on how sample quantiles are computed in statistical software. It was cited 43 times in the first 10 years, and 457 times in the next 10 years, making it my third paper to receive 500+ citations.

So what happened in 2006 to suddenly increase the citations? I think it was a combination of things:

The main point of our paper was that statistical software should standardize the definition of a sample quantile for consistency. We listed 9 different methods that we found in various software packages, and argued for one of them (type 8). In that sense, the paper was a complete failure: no major software uses type 8 by default, and the diversity of definitions continues 20 years later. In fact, the paper may have had the opposite effect to what was intended. We drew attention to the many approaches to computing sample quantiles, and several software products added them all as options. Our own quantile function for R allows all 9 to be computed, and has type 7 as the default (for backwards compatibility, which was the price we had to pay to get R core to agree to include our function).
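To make the difference between two of these definitions concrete, here is a minimal Python sketch of types 7 and 8. It is not the R implementation; in R you would simply call `quantile(x, probs, type = 8)`. The sketch assumes the usual interpolation form for the continuous definitions, in which the quantile sits at fractional position `n*p + m` in the sorted sample, with `m = alpha + p*(1 - alpha - beta)`, and takes `alpha = beta = 1` for type 7 and `alpha = beta = 1/3` for type 8. The function names are hypothetical, chosen only for illustration.

```python
import math


def interpolated_quantile(data, p, alpha, beta):
    """Sample quantile by linear interpolation between order statistics.

    alpha and beta are the plotting-position parameters (an assumed
    parametrization for this sketch): the quantile sits at 1-based
    fractional position h = n*p + alpha + p*(1 - alpha - beta).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")

    h = n * p + alpha + p * (1.0 - alpha - beta)
    h = min(max(h, 1.0), float(n))   # clamp to the range of the sample
    j = math.floor(h)                # index of the lower order statistic (1-based)
    g = h - j                        # interpolation weight on the upper one

    lower = x[j - 1]
    upper = x[min(j, n - 1)]         # guard against running off the end when h == n
    return (1.0 - g) * lower + g * upper


def quantile_type7(data, p):
    # Type 7 (the R default): alpha = beta = 1, so h = (n - 1)*p + 1.
    return interpolated_quantile(data, p, 1.0, 1.0)


def quantile_type8(data, p):
    # Type 8 (the one we argued for): alpha = beta = 1/3.
    return interpolated_quantile(data, p, 1.0 / 3.0, 1.0 / 3.0)


if __name__ == "__main__":
    sample = range(1, 11)
    print(quantile_type7(sample, 0.25))  # 3.25
    print(quantile_type8(sample, 0.25))  # approximately 2.9167
```

On the integers 1 to 10, the two definitions agree on the median but disagree on the lower quartile, which is exactly the kind of inconsistency across software that we were worried about.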

The story of this paper provides some interesting lessons: