CrossValidated launched!


5 November 2010


The CrossValidated Q&A site is now out of beta, and the new design and site name are live.

New design

The new design looks great, thanks to Jin Yang, our designer-in-residence. Note the normal density icon for accepted answers and the site icon depicting a 5-fold cross-validation (light green for the test set and dark green for the training set). There is a faint background graphic in the header and footer, taken from a program that tracks and plots a person’s mouse movement. It suggests randomness as well as data visualization (another topic covered on the site).

Name and URL

The URL will work, but it redirects to a StackExchange address. The StackExchange team (who host the site and provide all the architecture) wanted the site to be a subdomain of their own domain. However, at least we got the name CrossValidated.


The site is intended for use by statisticians, data miners, and anyone else doing data analysis. It covers questions about

  • statistical analysis

  • data mining and machine learning

  • data visualization

  • probability theory

  • statistical and data-driven computing (e.g., questions about R, SAS, SPSS, Stata and Minitab)

The inclusion of data mining and machine learning along with statistics and probability was a deliberate attempt to get these two communities to talk. We work on similar problems, but often with different tools and different perspectives. I hope the site comes to be widely used within both communities. In fact, I hope that we can eventually stop talking about two communities and just refer to the “data science community”.

My original idea was that this would be helpful to researchers who are struggling with data analysis issues but have no statistician to ask for help. University-based statisticians are often inundated with requests for help from researchers in other disciplines who have no quantitative training but need to apply some statistical techniques.


For those who haven’t been reading this blog, I proposed this site on 15 April 2010. The scope of the site was determined via a community process, and then we went through a phase of building a sufficiently large community. The beta site was launched on 19 July 2010 with the first question on “Eliciting priors from experts”.

The site was officially launched today (5 November 2010). So it took just over 200 days from proposal to launch – I had no idea what I was starting, but I’m glad it worked out! There are now 1048 questions and 1763 users, which is a great start. But there must be hundreds of thousands of people doing data analysis who would really benefit from a site like this. So please spread the word about