Frequent readers of this blog will know of my burning desire to move IR research away from dry technical topics and towards questions that directly impact and excite the retrieval user. In pursuit of this goal, I have for the past year been working on a paper on estimating two-tailed confidence intervals for recall under simple and stratified random sampling of assessments. I posted a pre-print of this article, Approximate Recall Confidence Intervals, to arXiv about a week ago.

As the article's long gestation period suggests, estimating recall confidence intervals is more complex than it might at first seem. The sampling distribution of recall can be highly irregular (see the figure above); indeed, the natural point estimate of recall itself is a biased estimate. There are also tricky but common and therefore important extreme cases, such as samples from the unretrieved stratum that return few or no relevant documents, which simple approximate methods (such as the normal approximation) handle very poorly.
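To make the zero-count problem concrete, here is a minimal sketch of the textbook normal-approximation (Wald) interval for a single sampled proportion. It is only an illustration of the failure mode, not the recall interval evaluated in the article, and the function name `wald_interval` and the default `z = 1.96` are my own choices for the example.

```python
import math

def wald_interval(relevant, sampled, z=1.96):
    """Normal-approximation (Wald) interval for a sampled proportion.

    Illustrative only: this is the interval on one stratum's prevalence,
    not on recall itself.
    """
    p = relevant / sampled
    half_width = z * math.sqrt(p * (1.0 - p) / sampled)
    return p - half_width, p + half_width

# A sample of 100 unretrieved documents that turns up no relevant ones:
print(wald_interval(0, 100))  # (0.0, 0.0)
```

With no relevant documents in the sample, the estimated variance is zero and the interval collapses to a single point, which claims complete certainty that the unretrieved stratum holds nothing relevant.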

Ultimately, after trying nine methods from five (well, really six) different families, I found that the most accurate method was to infer beta-binomial posteriors on each segment or stratum, using a simple, highly non-theoretical prior, and then to generate a Monte Carlo posterior distribution on recall. This is a satisfying conclusion not just because it avoids a lot of hairy maths (in favour of a lot of simple simulations), but also because if we want to do more complex inference, we're going to want to turn to Bayesian methods, and it is nice to know that such methods work best even in a simple, frequentist environment.
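The general shape of such a simulation can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not the method as specified in the article: the uniform Beta(1, 1) prior is a placeholder (the article's prior is not given here), the number of relevant documents in a stratum is simplified to stratum size times the drawn prevalence rather than a full beta-binomial draw over the unsampled documents, and the name `recall_interval` and the tuple layout are hypothetical.

```python
import random

def recall_interval(strata, level=0.95, draws=20000, prior=(1.0, 1.0), seed=0):
    """Monte Carlo interval on recall from per-stratum Beta posteriors.

    strata: list of (size, sampled, relevant_in_sample, is_retrieved).
    prior:  (a, b) of a Beta prior; Beta(1, 1) is a placeholder assumption.
    """
    rng = random.Random(seed)
    a0, b0 = prior
    samples = []
    for _ in range(draws):
        rel_retrieved = 0.0
        rel_total = 0.0
        for size, sampled, relevant, is_retrieved in strata:
            # Posterior on the stratum's prevalence given the sample.
            p = rng.betavariate(a0 + relevant, b0 + sampled - relevant)
            # Simplification: expected relevant count, not a binomial draw.
            est = size * p
            rel_total += est
            if is_retrieved:
                rel_retrieved += est
        samples.append(rel_retrieved / rel_total)
    samples.sort()
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[min(draws - 1, int((1.0 - tail) * draws))]
    return lower, upper

# Retrieved stratum: 1000 docs, 100 sampled, 60 relevant.
# Unretrieved stratum: 9000 docs, 100 sampled, none relevant.
strata = [(1000, 100, 60, True), (9000, 100, 0, False)]
lower, upper = recall_interval(strata)
print(f"95% interval on recall: [{lower:.3f}, {upper:.3f}]")
```

Note that the interval stays non-degenerate even though the unretrieved sample found nothing relevant, because the posterior on that stratum still leaves room for a small but non-zero prevalence.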

I'm trying out a graduated review process for this article. I've solicited direct comments from several reviewers, and am now placing the article into a public review phase. Comments and corrections from readers are very welcome. I'll conclude the public review on Friday, March 9th, barring any major problems, and then submit to a journal (I'm currently intending ACM TOIS).
