Yes, Definition 2 is incorrect, but I believe it is the one Roitblat currently uses.

Gord,

Hi! The formula I used was Definition 1. Definition 2 is incorrect, isn't it? (Or at least "elusion" and "prevalence" are being used loosely to mean "discard yield" and "collection yield".)

William

William

Roitblat gives two inconsistent definitions for eRecall. I'm wondering which you used.

Definition 1. In his earlier work (cited above), he uses prevalence to estimate TP+FN (the total number of relevant documents) and elusion to estimate FN (the total number of missed relevant documents). He then plugs these estimates into a contingency table with N (the total number of documents) and D (the total number of discarded documents). If I am not mistaken, the resulting formula is

eRecall = 1 - (elusion/prevalence * D/N)
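To spell out my reading of Definition 1, here is a minimal Python sketch. It assumes prevalence is the relevant fraction of the whole collection and elusion is the relevant fraction of the discard pile. The function name and the counts are made up for illustration; they are not taken from Roitblat's work.

```python
def erecall_def1(elusion, prevalence, n_discarded, n_total):
    """Definition 1 as I read it: 1 - (elusion/prevalence * D/N)."""
    # Estimated number of missed relevant documents (FN).
    est_fn = elusion * n_discarded
    # Estimated total number of relevant documents (TP + FN).
    est_relevant = prevalence * n_total
    # Recall is 1 minus the fraction of relevant documents that were missed.
    return 1.0 - est_fn / est_relevant


# Hypothetical collection: 1000 documents, 800 discarded,
# 100 relevant overall, 20 of them left in the discard pile.
N, D = 1000, 800
relevant, missed = 100, 20

prevalence = relevant / N   # relevant fraction of the collection
elusion = missed / D        # relevant fraction of the discard pile

print(erecall_def1(elusion, prevalence, D, N))
```

With exact rates plugged in, this comes out at about 0.8, the same as the true recall of 80/100 for these counts.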

Definition 2. In his most recent work (http://orcatec.com/2014/11/04/the-pendulum-swings-practical-measurement-in-ediscovery/), he defines

eRecall = 1 - elusion/prevalence
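A quick way to see the inconsistency is to run both formulas on the same inputs. This is only a sketch under the same reading as Definition 1 (prevalence measured over all N documents, elusion measured over the D discarded ones); the function names and counts are hypothetical.

```python
def erecall_def1(elusion, prevalence, d, n):
    # Definition 1: scales the elusion/prevalence ratio by D/N.
    return 1.0 - (elusion / prevalence) * (d / n)


def erecall_def2(elusion, prevalence):
    # Definition 2: same ratio, without the D/N factor.
    return 1.0 - elusion / prevalence


# Hypothetical collection: 10000 documents, 9000 discarded,
# 500 relevant overall, 90 of them left in the discard pile.
n, d = 10000, 9000
relevant, missed = 500, 90

prevalence = relevant / n
elusion = missed / d
true_recall = (relevant - missed) / relevant

print("true recall :", true_recall)
print("definition 1:", erecall_def1(elusion, prevalence, d, n))
print("definition 2:", erecall_def2(elusion, prevalence))
```

For these counts Definition 1 reproduces the true recall (about 0.82), while Definition 2 gives about 0.80. The two agree only when D = N, and the gap grows as the discard pile shrinks relative to the collection.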

