## Do document reviewers need legal training?

July 15th, 2012

In my last post, I discussed an experiment in which we had two assessors re-assess TREC Legal documents under less detailed and more detailed guidelines, and found that the more detailed guidelines did not make the assessors more reliable. Another natural question to ask of these results, though not one the experiment was directly designed to answer, is how well our assessors compared with the first-pass assessors employed for TREC, who for this particular topic (Topic 204 from the 2009 Interactive task) happened to be a review team from a vendor of professional legal review services. How well do our non-professional assessors compare to the professionals?
Read the rest of this entry »
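A standard way to quantify how well two assessors agree — my assumption here, since the excerpt does not say which measure the post uses — is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, with invented judgments:

```python
# Cohen's kappa for two assessors' binary relevance judgments.
# All judgment data below is invented for illustration.

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 judgments."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of documents judged the same way.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each assessor's marginal rate of "relevant".
    pa, pb = sum(a) / n, sum(b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical judgments: 1 = relevant, 0 = not relevant.
assessor_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
assessor_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(assessor_1, assessor_2)
```

A kappa of 1 is perfect agreement and 0 is chance-level agreement; on assessment tasks like these, published kappas between assessors are often well below 1.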

## Detailed guidelines don't help assessors

July 2nd, 2012

Social scientists are often accused of running studies that confirm the obvious, such as that people are happier on the weekends, or that having many meetings at work makes employees feel fatigued. The best response is that what seems obvious may not actually be true. That, indeed, is what we found in a recent experiment. We set out to confirm that giving relevance assessors more detailed guidelines would make them more reliable. We found it didn't.
Read the rest of this entry »

## "Approximate Recall Confidence Intervals", updated and in submission

May 18th, 2012

Much later than I intended, after painstaking editing to get the length down from 39 to 31 pages, I've prepared a revised version of "Approximate Recall Confidence Intervals", which is now in submission. Aside from tightening up the text and excluding a few inessential results, the main change from the first version has been to force interval upper edges to 1 where no relevant documents are found in the unretrieved sample, and lower edges to 0 where none are found in the retrieved sample. I've also released recci, an R package for computing recall confidence intervals, along with other R packages for generating figures and tables and for re-running the experiments reported in the paper.

## Recall confidence intervals

February 25th, 2012

Frequent readers of this blog will know of my burning desire to move IR research away from dry technical topics and towards questions that directly impact and excite the retrieval user. In pursuit of this goal, I have for the past year been working on a paper on estimating two-tailed confidence intervals for recall under simple and stratified random sampling of assessments. I posted a pre-print of this article, Approximate Recall Confidence Intervals, to arXiv about a week ago.

*Figure: sampling distribution of recall*

Read the rest of this entry »
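To give a feel for the estimation problem, here is a toy sketch — not the paper's method — of a recall point estimate from simple random samples of the retrieved and unretrieved document sets, with a percentile-bootstrap confidence interval. All population sizes and sample counts are invented for illustration:

```python
# Toy recall estimation from per-stratum random samples (invented data).
import random

def estimate_recall(rel_ret, n_ret, rel_unret, n_unret, pop_ret, pop_unret):
    """Point estimate of recall from sample counts in each stratum."""
    est_rel_ret = pop_ret * rel_ret / n_ret          # relevant docs retrieved
    est_rel_unret = pop_unret * rel_unret / n_unret  # relevant docs missed
    return est_rel_ret / (est_rel_ret + est_rel_unret)

def bootstrap_recall_ci(rel_ret, n_ret, rel_unret, n_unret,
                        pop_ret, pop_unret, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for recall."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Resample each stratum's relevant count from its observed rate.
        b_ret = sum(rng.random() < rel_ret / n_ret for _ in range(n_ret))
        b_unret = sum(rng.random() < rel_unret / n_unret for _ in range(n_unret))
        if b_ret == 0 and b_unret == 0:
            continue  # degenerate resample: recall is undefined
        estimates.append(
            estimate_recall(b_ret, n_ret, b_unret, n_unret, pop_ret, pop_unret))
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi

# Hypothetical: 180/200 sampled retrieved docs relevant, 20/400 unretrieved.
recall = estimate_recall(180, 200, 20, 400, pop_ret=10_000, pop_unret=90_000)
low, high = bootstrap_recall_ci(180, 200, 20, 400, 10_000, 90_000)
```

The degenerate-resample case hints at why the paper's intervals need special handling when a sample contains no relevant documents at all: the naive estimator simply breaks down there.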

## Attention-enhancing information retrieval

February 19th, 2012

Last week I was at SWIRL, the occasional talkshop on the future of information retrieval. To me the most important of the presentations was Diane Kelly's "Rage against the Machine Learning", in which she observed that the way information retrieval currently works has changed the way people think. In particular, she proposed that the combination of short query and snippet response has reworked people's plastic brains to focus on working memory, and to forgo the processing of information required for it to lay its tracks down in our long-term memory. In short, it makes us transactionally adept, but stops us from learning.
Read the rest of this entry »

## How accurate can manual review be?

December 18th, 2011

One of the chief pleasures for me of this year's SIGIR in Beijing was attending the SIGIR 2011 Information Retrieval for E-Discovery Workshop (SIRE 2011). The smaller and more selective the workshop, it often seems, the more focused and interesting the discussion.

## Assessor disagreement and court sanctions

September 4th, 2011

I mentioned Cross and Kerksiek's suggestion of vocabulary discovery in my previous post. Their paper also contains an interesting reference to a case (Felman Products, Inc. v. Industrial Risk Insurers) in which the defendant was penalized for the carelessness of their production. The defendant inadvertently produced privileged documents, and sought to have them ruled inadmissible. Two judges, the original and an appellate, ruled against the defendant, on the grounds that the defendant had not shown sufficient care in their production.
Read the rest of this entry »

## Corpus characterization in e-discovery

September 4th, 2011

In e-discovery (document retrieval for civil litigation), one side has the documents, the other side proposes the query. This creates an information asymmetry: the requesting side cannot view the corpus to decide what keywords to use and what queries to propose, and opportunities for query iteration are limited, expensive, and liable to be contested.
Read the rest of this entry »

July 21st, 2011

- Correct spelling and grammar more important than positivity or negativity of product reviews -- Panos Ipeirotis.
- Fitting an elephant with four parameters.
- Placebos as effective as real medicine in improving subjectively-measured asthma symptoms, but ineffective in improving objectively-measured symptoms -- Science-based medicine.
- The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis. You don't need to worry about friends catching your obesity, but you might need to worry (even more) about being subjected to interventions based upon poor statistics and faulty peer reviewing.

## Multiple significance tests in IR

July 15th, 2011

At the most recent TIGER reading group, Mark Sanderson presented Bland and Altman's introduction to multiple significance tests and the Bonferroni method. The basic point is simple: if you keep trying different experiments, testing each for significance, then eventually you will find significance by chance, even where no real effect exists. Therefore, if you are performing multiple significance tests, you need to adjust your $p$ values upwards.
Read the rest of this entry »
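The Bonferroni adjustment itself is a one-liner. A minimal sketch, with invented $p$ values: with $m$ tests, multiply each raw $p$ value by $m$ (capping at 1) before comparing it to the desired family-wise error rate $\alpha$.

```python
# Bonferroni adjustment: scale each p-value by the number of tests,
# capping at 1. The p-values below are invented for illustration.

def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bonferroni(raw)          # [0.04, 0.16, 0.12, 0.02]
# Only tests whose adjusted p-value stays below alpha = 0.05 survive.
significant = [p < 0.05 for p in adjusted]
```

Note how two of the four raw $p$ values that looked significant at $\alpha = 0.05$ no longer are after correction: that is the multiple-testing penalty at work.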