Comments on: Confidence intervals on recall and eRecall
William Webber's E-Discovery Consulting Blog

By: gvc, Wed, 07 Jan 2015 23:18:11 +0000
Yes, Definition 2 is incorrect, but I believe it is the one Roitblat currently uses.

By: william, Wed, 07 Jan 2015 01:20:53 +0000
Gord,

Hi! The formula I used was Definition 1. Definition 2 is incorrect, isn't it? (Or at least "elusion" and "prevalence" are being used loosely to mean "discard yield" and "collection yield".)


By: gvc, Mon, 05 Jan 2015 21:50:11 +0000
Roitblat gives two inconsistent definitions of eRecall. I'm wondering which one you used.

Definition 1. In his earlier work (cited above), he uses prevalence to estimate TP+FN (the total number of relevant documents) and elusion to estimate FN (the total number of missed relevant documents). He then plugs these estimates into a contingency table with N (the total number of documents) and D (the total number of discarded documents). If I am not mistaken, the resulting formula is

eRecall = 1 - (elusion/prevalence * D/N)
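Definition 1 can be checked by filling in the contingency table directly. The sketch below is a minimal Python illustration, assuming elusion is the proportion of the D discarded documents that are relevant and prevalence is the proportion of all N documents that are relevant. The function name and the example figures are hypothetical, and sampling error in the two estimates is ignored.

```python
def erecall_definition_1(elusion: float, prevalence: float, discarded: int, total: int) -> float:
    """eRecall per Definition 1 (hypothetical helper, not Roitblat's code).

    elusion:    estimated proportion of relevant docs among the `discarded` docs
    prevalence: estimated proportion of relevant docs among all `total` docs
    """
    if not (0 < discarded <= total):
        raise ValueError("need 0 < discarded <= total")
    if prevalence <= 0:
        raise ValueError("prevalence must be positive")
    missed = elusion * discarded      # estimate of FN
    relevant = prevalence * total     # estimate of TP + FN
    # Algebraically the same as 1 - (elusion / prevalence) * (discarded / total)
    return 1 - missed / relevant


# Hypothetical collection: N = 100,000 docs, D = 80,000 discarded,
# 5,000 relevant in total, 1,000 of them in the discard pile (true recall 0.8).
N, D = 100_000, 80_000
elusion = 1_000 / D       # 0.0125
prevalence = 5_000 / N    # 0.05
print(f"{erecall_definition_1(elusion, prevalence, D, N):.3f}")  # 0.800
```

With exact (error-free) estimates, Definition 1 recovers the true recall; in practice both inputs come from samples, so the result inherits their uncertainty.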

Definition 2. In his most recent work (…), he defines

eRecall = 1 - elusion/prevalence
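The difference between the two definitions is easy to see numerically. In this minimal Python sketch (hypothetical helper and figures: N = 100,000 documents, D = 80,000 discarded, 5,000 relevant overall, 1,000 of them discarded), Definition 2 is applied once to proportions and once to counts, i.e. reading "elusion" and "prevalence" loosely as discard yield and collection yield.

```python
def erecall_definition_2(elusion: float, prevalence: float) -> float:
    """eRecall per Definition 2: 1 - elusion / prevalence (hypothetical helper)."""
    if prevalence <= 0:
        raise ValueError("prevalence must be positive")
    return 1 - elusion / prevalence


# Hypothetical collection: N docs in total, D discarded,
# 5,000 relevant overall, 1,000 of them in the discard pile.
N, D = 100_000, 80_000
relevant_total, relevant_discarded = 5_000, 1_000
true_recall = 1 - relevant_discarded / relevant_total   # 1 - 1,000/5,000

# Read as proportions (rates), Definition 2 drops the D/N factor.
elusion_rate = relevant_discarded / D       # 0.0125
prevalence_rate = relevant_total / N        # 0.05
print(f"{erecall_definition_2(elusion_rate, prevalence_rate):.3f}")      # 0.750

# Read as counts (discard yield, collection yield), it matches true recall.
print(f"{erecall_definition_2(relevant_discarded, relevant_total):.3f}")  # 0.800
```

With rates, elusion/prevalence equals (FN/(TP+FN)) * (N/D); since D ≤ N, Definition 2 read this way can never give a higher value than Definition 1, and the two agree only when nothing is reviewed (D = N).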