Assessor disagreement and court sanctions

I mentioned Cross and Kerksiek's suggestion of vocabulary discovery in my previous post. Their paper also contains an interesting reference to a case (Felman Products, Inc. v. Industrial Risk Insurers) in which the defendant was penalized for the carelessness of their production. The defendant inadvertently produced privileged documents, and sought to have them ruled inadmissible. Two judges, the original and an appellate, ruled against the defendant, on the grounds that the defendant had not shown sufficient care in their production.

One piece of evidence the appellate judge cited for lack of care was that at least 30% of the 1 million documents produced were either irrelevant or privileged. This argument is interesting, and rather disturbing, because a 30% rate of disagreement on relevance is quite low by the standards of inter-assessor agreement studies; the defendant could in good faith have produced only documents that were relevant by their own understanding, and still have 30% of them found irrelevant by a different assessor.

Assessment Disagreement

Assessor pair            RR    RN     NR   RN/(RR+RN)   NR/(RR+NR)
Original vs. Team A     238   250    971        51%          80%
Original vs. Team B     263   225   1175        46%          82%
Team A vs. Team B       580   629    858        52%          60%

For instance, in Roitblat, Kershaw, and Oot (JASIST 2010), a previous manual review was re-done by two manual review teams (as well as two automatic systems). The table above extracts, from Table 1 of the JASIST paper, the level of agreement found. (RR counts documents both parties judged relevant; RN counts documents the first party judged relevant but the second did not; NR is the reverse.) The last two columns show the percentage of documents found relevant ("produced") by one team which the other team found irrelevant. All six of these pairwise disagreement rates are higher than the 30% observed in the case cited above. And these are all purely manual reviews; there is no automatic production here.
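
As a cross-check on the table, here is a small Python sketch (mine, not from the paper) that recomputes the two disagreement rates for each assessor pair from the RR, RN, and NR counts:

    # Pairwise counts from the table above: (RR, RN, NR).
    pairs = {
        "Original vs. Team A": (238, 250, 971),
        "Original vs. Team B": (263, 225, 1175),
        "Team A vs. Team B": (580, 629, 858),
    }

    for name, (rr, rn, nr) in pairs.items():
        # Fraction of the first party's relevant documents that the second
        # party judged irrelevant, and vice versa.
        first_vs_second = rn / (rr + rn)
        second_vs_first = nr / (rr + nr)
        print(f"{name}: {first_vs_second:.0%} / {second_vs_first:.0%}")

Running this reproduces the 51%/80%, 46%/82%, and 52%/60% figures in the last two columns.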

Now, the standard by which 30% of the production in Felman v. IRI was found irrelevant may be something stronger than "failure independently to agree on relevance". It may be something more like "not plausibly relevant under any conditions", a condition that I have not seen characterized and analyzed in the e-discovery literature (though see Grossman and Cormack, DESI 2011, for an analysis of primary assessor disagreements with the topic authority in the TREC Legal Track). Still, I wonder whether the parties in the case laboured under an exaggerated sense of the infallibility of manual production.
