Comments on: Confidence intervals on recall and eRecall http://blog.codalism.com/index.php/confidence-intervals-on-recall-and-erecall/ William Webber's E-Discovery Consulting Blog

By: gvc http://blog.codalism.com/index.php/confidence-intervals-on-recall-and-erecall/comment-page-1/#comment-3591720 Wed, 07 Jan 2015 23:18:11 +0000 http://blog.codalism.com/?p=2353#comment-3591720 Yes, definition 2 is incorrect, but I believe it is the one Roitblat currently uses.

By: william http://blog.codalism.com/index.php/confidence-intervals-on-recall-and-erecall/comment-page-1/#comment-3585702 Wed, 07 Jan 2015 01:20:53 +0000 http://blog.codalism.com/?p=2353#comment-3585702 Gord,

Hi! The formula I used was Definition 1. Definition 2 is incorrect, isn't it? (Or at least "elusion" and "prevalence" are being used loosely to mean "discard yield" and "collection yield".)

William

By: gvc http://blog.codalism.com/index.php/confidence-intervals-on-recall-and-erecall/comment-page-1/#comment-3578326 Mon, 05 Jan 2015 21:50:11 +0000 http://blog.codalism.com/?p=2353#comment-3578326 Roitblat gives two inconsistent definitions for eRecall. I'm wondering which you used.

Definition 1. In his earlier work (cited above) he uses prevalence to estimate TP+FN (the total number of relevant documents) and he uses elusion to estimate FN (the total number of missed relevant documents). He then plugs these estimates into a contingency table with N (the total number of documents) and D (the total number of discarded documents). If I am not mistaken, the resulting formula is

eRecall = 1 - (elusion/prevalence * D/N)
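
As a sketch of where this formula comes from: assume prevalence is the fraction of relevant documents in the whole collection and elusion is the fraction of relevant documents in the discard set. Those two readings are assumptions about how the terms are being used here, not quotations from Roitblat.

```latex
\begin{align*}
\text{prevalence} &\approx \frac{TP + FN}{N},
\qquad
\text{elusion} \approx \frac{FN}{D} \\[4pt]
\text{Recall} &= \frac{TP}{TP + FN} = 1 - \frac{FN}{TP + FN} \\[4pt]
&\approx 1 - \frac{\text{elusion} \cdot D}{\text{prevalence} \cdot N}
\end{align*}
```

The factor D/N appears because elusion is a rate over the discard set while prevalence is a rate over the whole collection.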

Definition 2. In his most recent work (http://orcatec.com/2014/11/04/the-pendulum-swings-practical-measurement-in-ediscovery/), he defines

eRecall = 1 - elusion/prevalence
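
To see how far apart the two definitions can land, here is a minimal Python sketch. The function names and the collection figures (N, D, prevalence, elusion) are hypothetical values chosen for illustration; they do not come from Roitblat's work or from the post.

```python
def erecall_definition_1(elusion, prevalence, discarded, total):
    """Definition 1: eRecall = 1 - (elusion / prevalence) * (D / N)."""
    return 1.0 - (elusion / prevalence) * (discarded / total)


def erecall_definition_2(elusion, prevalence):
    """Definition 2: eRecall = 1 - elusion / prevalence."""
    return 1.0 - elusion / prevalence


# Hypothetical collection (illustrative numbers only).
N = 100_000          # total documents
D = 80_000           # documents in the discard set
prevalence = 0.10    # fraction of all documents that are relevant
elusion = 0.025      # fraction of discarded documents that are relevant

def1 = erecall_definition_1(elusion, prevalence, D, N)
def2 = erecall_definition_2(elusion, prevalence)

print(f"Definition 1: {def1:.3f}")
print(f"Definition 2: {def2:.3f}")
```

With these made-up numbers there are 10,000 relevant documents, of which 2,000 sit in the discard set, so the recall implied by the rates is 0.80. Definition 1 reproduces that figure, while Definition 2 gives 0.75. The two agree only when D = N, and Definition 2 is the lower of the two whenever D < N.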
