Comments on: Quality assurance samples and prior beliefs
http://blog.codalism.com/index.php/quality-assurance-samples-and-prior-beliefs/
William Webber's E-Discovery Consulting Blog
Sat, 11 Aug 2012 10:50:07 +0000

By: gvc
http://blog.codalism.com/index.php/quality-assurance-samples-and-prior-beliefs/comment-page-1/#comment-202464
Sat, 11 Aug 2012 10:50:07 +0000

In an ideal world, we would always use a 1-tailed test for e-disco validation. What we really want to know is "recall is at least X, with Z% confidence" (1-tailed), not "recall is between X and Y, with Z% confidence" (2-tailed). But to do that you must decide a priori that you are planning to use a 1-tailed test.

When I do this for real, I do use 1-tailed tests, but since Ralph used 2-tailed, I did not think it was appropriate to appear to switch horses in midstream. And, to be honest, I did not want to introduce this factor in a legal blog.

As William points out, you get a slightly higher lower bound if you use a 1-tailed test, but the overall conclusion is the same.
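To illustrate the point about the lower bound, here is a minimal sketch in Python that computes an exact (Clopper-Pearson) lower bound on recall both ways. The sample figures (80 of 100 sampled relevant documents found in the production) are hypothetical and are not taken from the sample discussed in the post; the function names are likewise made up for this example.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_lower_bound(k, n, tail_prob):
    """Clopper-Pearson lower bound: the p at which P(X >= k | p) = tail_prob.

    Use tail_prob = alpha for a 1-tailed bound and alpha / 2 for the
    lower end of a 2-tailed interval.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    # P(X >= k | p) increases with p, so bisection finds the crossing point.
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_tail_ge(k, n, mid) < tail_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sample: 80 of 100 sampled relevant documents were produced.
k, n, alpha = 80, 100, 0.05
one_tailed = exact_lower_bound(k, n, alpha)      # "recall is at least X"
two_tailed = exact_lower_bound(k, n, alpha / 2)  # lower end of "between X and Y"
print(f"1-tailed lower bound: {one_tailed:.4f}")
print(f"2-tailed lower bound: {two_tailed:.4f}")
```

The 1-tailed bound comes out somewhat higher because all of the allowed error probability is spent on the one side we care about.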

By: william
http://blog.codalism.com/index.php/quality-assurance-samples-and-prior-beliefs/comment-page-1/#comment-202168
Fri, 10 Aug 2012 00:08:01 +0000

In reply to Ethan A.

Ethan,

Hi! Thanks for a very insightful comment.

You're right that the choice to use a one-tailed rather than a two-tailed interval should be made as part of the protocol itself, before one observes the results of the sample. However, for this sort of quality assurance work, a one-tailed interval is in principle the correct one to use. The reason is that we are trying to bound the probabilistically worst-case behaviour of our production. If we couldn't state confidently that our performance was at least as good as such and such a level, then it might cause some decision to be made (to continue searching, to renegotiate the terms of the production, etc.). No such decision, however, is provoked by the probabilistically best-case behaviour; we don't say, "oh dear, our production might really be much better than we want; we'd better degrade it somehow".

For the initial sample that is made at the outset of a production, however, a two-tailed confidence interval is what we want, because we do want to predict within which bounds the number of relevant documents lies, and make decisions about our forthcoming production accordingly.
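For the zero-count case that Ethan raises, the exact binomial upper bound has a simple closed form, which makes the one-tailed versus two-tailed difference easy to see. The sketch below is only an illustration: the sample size of 1000 is hypothetical, and the function name is made up for this example. With zero relevant documents observed in n draws, the upper bound p solves (1 - p)^n = tail probability.

```python
def zero_count_upper_bound(n, tail_prob):
    """Exact upper bound on a proportion when 0 of n sampled items are positive.

    Solves (1 - p)**n = tail_prob for p. Use tail_prob = alpha for a
    one-tailed bound and alpha / 2 for the upper end of a two-tailed interval.
    """
    return 1.0 - tail_prob ** (1.0 / n)


# Hypothetical sample size; zero relevant documents observed.
n, alpha = 1000, 0.05
one_tailed = zero_count_upper_bound(n, alpha)
two_tailed = zero_count_upper_bound(n, alpha / 2)
print(f"one-tailed 95% upper bound: {one_tailed:.5f}")
print(f"two-tailed 95% upper bound: {two_tailed:.5f}")
```

The one-tailed bound is close to the familiar "rule of three" approximation, 3/n, and is tighter than the upper end of the two-tailed interval at the same nominal confidence level.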

By: Ethan A
http://blog.codalism.com/index.php/quality-assurance-samples-and-prior-beliefs/comment-page-1/#comment-202099
Thu, 09 Aug 2012 14:11:24 +0000

William,
Thanks much for emphasizing and elaborating on the distinctness of the two samples for use in gleaning confidence intervals.
One question about your departure from Gordon's first (presumably cursory) approach of using a 2-tailed interval: you elected to use a 1-tailed interval to more accurately characterize the upper bound we are interested in. Can you comment on how this is a valid approach and not just an on-the-fly reaction driven by the observed measurement of zero? My limited understanding is that your approach is driven by the desire to more accurately characterize just the upper bound, but using a 1-tailed calculation wouldn't be appropriate solely because the measurement in a particular sample happened to be zero.

(Specifically, I'm referring to John Pezzullo's reported discussion with Karl Schlag at the bottom of his CI-calculations page: http://statpages.org/confint.html)
