Comments on: Quality assurance samples and prior beliefs

By: gvc

gvc — Sat, 11 Aug 2012 10:50:07 +0000

In an ideal world, we would always use a 1-tailed test for e-disco validation. What we really want to know is, "recall is at least X, with Z% confidence" (1-tailed), not "recall is between X and Y, with Z% confidence" (2-tailed). But to do that you must decide a priori that you are planning to use a 1-tailed test. When I do this for real, I do use 1-tailed tests, but since Ralph used 2-tailed, I did not think it was appropriate to appear to switch horses in midstream. And, to be honest, I did not want to introduce this factor in a legal blog. As William points out, you get a slightly higher lower bound if you use a 1-tailed test, but the overall conclusion is the same.
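To make the "slightly higher lower bound" concrete, here is a minimal sketch that computes an exact (Clopper-Pearson) lower bound on a proportion both ways. The sample figures (80 relevant documents found out of 100 sampled) are hypothetical and are not taken from the post; the exact binomial method is an assumption, since the thread does not say which interval was used.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_lower_bound(k, n, tail_prob):
    """Exact lower bound: the p at which P(X >= k) equals tail_prob.

    Found by bisection, since P(X >= k) increases with p.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < tail_prob:
            lo = mid  # p is still too small to be plausible; move up
        else:
            hi = mid
    return lo


# Hypothetical sample: 80 of 100 sampled relevant documents were retrieved.
k, n = 80, 100
alpha = 0.05  # 95% confidence

one_tailed = exact_lower_bound(k, n, alpha)      # all of alpha in the lower tail
two_tailed = exact_lower_bound(k, n, alpha / 2)  # alpha split across both tails

print(f"1-tailed lower bound: {one_tailed:.4f}")
print(f"2-tailed lower bound: {two_tailed:.4f}")
```

The 1-tailed bound is the higher of the two because it spends the whole 5% on the lower tail rather than 2.5% on each side.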

By: william

william — Fri, 10 Aug 2012 00:08:01 +0000

In reply to Ethan A. Ethan, Hi! Thanks for a very insightful comment. You're right that the choice to use a one-tailed rather than a two-tailed interval should be made as part of the protocol itself, before one observes the results of the sample. However, for this sort of quality assurance work, a one-tailed interval is in principle the correct one to use. The reason is that we are trying to bound the probabilistically worst-case behaviour of our production. If we couldn't state confidently that our performance was at least as good as such and such a level, then that might prompt some decision (to continue searching, to renegotiate the terms of the production, etc.). No such decision, however, is provoked by the probabilistically best-case behaviour; we don't say, "oh dear, our production might really be much better than we want; we'd better degrade it somehow". For the initial sample that is made at the outset of a production, however, a two-tailed confidence interval is what we want, because we do want to predict within which bounds the number of relevant documents lies, and make decisions about our forthcoming production accordingly.

By: Ethan A

Ethan A — Thu, 09 Aug 2012 14:11:24 +0000

William,
Thanks much for emphasizing and elaborating on the distinctness of the two samples for use in gleaning confidence intervals.
One question about your departure from Gordon's first (presumably cursory) approach of using a 2-tailed interval: you elected to use a 1-tailed interval to more accurately characterize the upper bound we are interested in. Can you comment on how this is a valid approach and not just an on-the-fly reaction driven by the observed measurement of zero? My limited understanding is that your approach is driven by the desire to more accurately characterize just the upper bound, but using a 1-tailed calculation wouldn't be appropriate solely because the measurement in a particular sample happened to be zero.

(Specifically, I'm referring to John Pezullo's reported discussion with Karl Schlag, at the bottom of his CI-calculations page: http://statpages.org/confint.html)
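For readers following the zero-count case: when a sample of n items contains none of the items being counted, the exact binomial upper bound has a closed form, because P(X = 0) = (1 - p)^n. The sketch below compares the 1-tailed and 2-tailed versions; the sample size of 1000 is a hypothetical figure chosen for illustration, not the one from the post.

```python
def upper_bound_when_zero(n, tail_prob):
    """Exact upper bound on a proportion when 0 of n sampled items are hits.

    Solves (1 - p)**n = tail_prob for p.
    """
    return 1 - tail_prob ** (1 / n)


n = 1000      # hypothetical sample size
alpha = 0.05  # 95% confidence

one_tailed = upper_bound_when_zero(n, alpha)      # all of alpha in the upper tail
two_tailed = upper_bound_when_zero(n, alpha / 2)  # alpha split across both tails

print(f"1-tailed upper bound: {one_tailed:.5f}")
print(f"2-tailed upper bound: {two_tailed:.5f}")
```

The 1-tailed bound is the tighter of the two, which is why the choice between them has to be fixed in the protocol rather than made after seeing a zero.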