Ever since Judge Grimm opined that random sampling was a prudent way of checking the reliability of a production (Victor Stanley v. Creative Pipe, 269 F.R.D. 497), there has been strong interest in the topic of sampling within e-discovery, including from lawyers themselves. Ralph Losey, for instance, has devoted a post on his blog to the topic of sampling, and his recent blog posts narrating an example predictive coding exercise have contained much sampling-related material.

I've done some research work on more advanced topics in confidence intervals, but I thought it might be useful to write some more introductory material as well. I originally intended to write a series of blog posts giving a brief tutorial on sampling and estimation, but the brief tutorial worked out to be around 5,000 words, so I've made it into a separate document: A tutorial on interval estimation for a proportion, with particular reference to e-discovery. The tutorial aims to give an understanding of the workings behind confidence intervals, while avoiding as much math as possible. (If you want a still more high-level discussion of sampling, estimation, and intervals, then I recommend Venkat Rangan's post on Predictive Coding -- Measurement Challenges.) The tutorial is marked as Version 0.1; I'd be very grateful for any corrections, comments, or suggestions for improvement, and will work them into later versions.