Runs, rankings, systems, submissions: a nomenclature

Finding suitable names for things is both tricky and important. Well, ok, not very important, but it is the sort of issue that keeps me awake at night. A naming problem that we spent a fair bit of time discussing when developing EvaluatIR is what to call the various entities one encounters when dealing with TREC-style experiments. This question has arisen for me again as I continue my thesis write-up. I'd be interested to hear what conventions others follow.

Let me describe the various entities involved. I'll provide placeholder names for them, which are more or less what we chose for EvaluatIR; these placeholders are prefixed with "$" to mark their placeholder status. In the following discussion, "$foo" is the placeholder for an entity, whereas "foo" is the actual word, whether used for the entity of that name or for something else.


The first set of names refers to the various manifestations of an information retrieval system:

An information retrieval system, such as Indri or Zettair, irrespective of version or modifications.

A particular version of an information retrieval system, whether an official release or an unofficial modification of the source code, but irrespective of any particular set of options chosen.

$setup: An information retrieval system with a particular set of option and parameter values for that system (including, for instance, the topic field chosen). A setup is what is actually run against a test collection.

The main point here is to separate out the codebase or family of a retrieval system from what actually gets used to make a run against some collection. The separations are not entirely cut and dried: system codebases can fork, and the same change to a system's operation could in some circumstances be effected by setting an option, in others by modifying the code. These distinctions were particularly important in EvaluatIR, because we wanted to capture system lineages. They probably matter less in research writing, where code lineage is rarely significant. The question in writing is, I guess, whether it is sufficient to use "system" to denote what above is classified as a "setup", and also whether we need to be careful to distinguish a "system/setup" from what will later be called a "runset" (that is, the output of a setup against a given collection).
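To make the three-level distinction concrete, here is a minimal Python sketch of how the entities might be modelled as nested records. The class and field names (`System`, `Version`, `Setup`, `parent`, `options`) are hypothetical illustrations, not EvaluatIR's actual schema, and the version labels and option names in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A retrieval system family (e.g. Indri), irrespective of version."""
    name: str


@dataclass(frozen=True)
class Version:
    """One release or modification of a System's code."""
    system: System
    label: str
    parent: Version | None = None  # the version this one was forked from, if any


@dataclass(frozen=True)
class Setup:
    """A Version plus a fixed choice of options: what actually gets run."""
    version: Version
    options: tuple[tuple[str, str], ...] = ()  # (option, value) pairs

    def lineage(self) -> list[str]:
        """Version labels from this setup's code back to its earliest ancestor."""
        labels = []
        v = self.version
        while v is not None:
            labels.append(v.label)
            v = v.parent
        return labels


# Hypothetical example: a locally patched build, run on title-only topics.
indri = System("Indri")
base = Version(indri, "base-release")
patched = Version(indri, "base-release+local-patch", parent=base)
setup = Setup(patched, (("topic_field", "title"), ("mu", "1000")))
print(setup.lineage())  # ['base-release+local-patch', 'base-release']
```

Options are stored as a tuple of pairs rather than a dict so that a `Setup` is immutable and hashable: two setups built from the same version with identical options compare equal. The `parent` link is one simple way to record the forks mentioned above; whether a given change belongs in `Version` or in `options` remains exactly the judgement call the text describes.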


The second set of names refers to the components of what an information retrieval system produces when run against a test collection:

$run: The ranking of documents produced by a $setup for a single topic or query (against some document corpus).

$docrank: A document returned at a given rank, as part of a $run.

$runset: The set of $runs made by a $setup against the full set of topics in a test collection.

This set of names is a lot more troublesome. It seems that "run" is often used to mean what I've labelled "$runset" above, that is, a system's (or setup's) output against the full topic set rather than for a single topic. If "run" were promoted in this way, I guess we could use "ranking" for "$run"; but this seems to risk ambiguity. At the same time, I'm far from happy with "runset"; any name of the form "foo-set" is problematic, because there are lots of ways of forming a set of foos. The term "submission" is a reasonable alternative when talking about $runsets officially submitted to TREC, but a less happy one for $runsets produced in separate experiments. What do people think? Meanwhile, instead of the admittedly awkward "docrank", one often sees "document", which is imprecise, or "result", which is extremely ambiguous.
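The run/docrank/runset layering maps directly onto the standard TREC run-file format, where each line reads `topic Q0 docno rank score tag`. Below is a minimal Python sketch that groups such lines into a $runset (a dict from topic to $run), each $run being a list of $docranks. The function and class names are hypothetical, and the sample lines (topics, document IDs, the tag `myTag`) are invented for illustration. This sketch orders each run by the rank column; some evaluation tools re-sort by score instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocRank:
    """One document returned at a given rank within a run."""
    docno: str
    rank: int
    score: float


def parse_runset(lines):
    """Group TREC-format lines into a runset: {topic: run}.

    Each line is assumed to be "topic Q0 docno rank score tag".
    Returns (tag, runset), where each run is a list of DocRank
    ordered by the rank column.
    """
    runset = defaultdict(list)
    tags = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        topic, _q0, docno, rank, score, tag = line.split()
        runset[topic].append(DocRank(docno, int(rank), float(score)))
        tags.add(tag)
    if len(tags) != 1:
        raise ValueError(f"expected exactly one run tag, found {sorted(tags)}")
    for run in runset.values():
        run.sort(key=lambda d: d.rank)  # order each per-topic run by rank
    return tags.pop(), dict(runset)


# Invented sample: two topics, one deliberately listed out of rank order.
sample = """\
401 Q0 doc-4 2 11.0 myTag
401 Q0 doc-17 1 12.5 myTag
402 Q0 doc-9 1 9.75 myTag
""".splitlines()

tag, runset = parse_runset(sample)
print(tag, sorted(runset))                # myTag ['401', '402']
print([d.docno for d in runset["401"]])   # ['doc-17', 'doc-4']
```

Note that the tag, which TREC uses to identify a "run", labels the whole file, that is, what is called a $runset here; the per-topic lists only exist once the lines are grouped by topic.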


And finally, we have some terms covering the larger entities of TREC-style collections:

$corpus: The set of documents in a test collection.

$collection: The combination of documents, queries, and assessments of which documents are relevant to which queries (exhaustive, pooled, sampled, or dynamically assessed).

$submissionset: The set of all $runsets officially submitted to a TREC experiment; or of all $runsets matching some given criteria, such as "the manual $submissionset" or "the title-only $submissionset".

$participant: A group participating in a TREC competition; or, the set of $runsets submitted by that group.

$testset: The combination of a test $collection and the $submissionset made against it.

The usages of "corpus" and "collection" are fairly standard for TREC, although some people say "collection" for "$corpus"; this is a usage I definitely prefer to avoid, because it leaves nothing suitable for "$collection". "Submissionset" suffers from the general "foo-set" problem; but otherwise you run into lengthy and inelegant circumlocutions, like "the set of all runsets (runs, submissions) officially submitted to TREC". "Participant" is a convenience of no great importance. "Testset" is a "-set", but since there's no separate meaning for "test", it seems acceptable.
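The collection-level terms can be sketched the same way. In this minimal Python example a $testset pairs a $collection (reduced here to a name) with the metadata of the $runsets submitted against it; a $submissionset is then just a filter over those runsets, and a $participant's runsets are a grouping by group name. All names, including the `manual` and `topic_field` attributes and the sample groups, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunsetInfo:
    """Metadata for one runset submitted against a collection."""
    tag: str
    participant: str
    manual: bool       # hypothetical flag: manual vs automatic queries
    topic_field: str   # hypothetical: e.g. "title" or "title+desc"


@dataclass
class Testset:
    """A collection together with the runsets submitted against it."""
    collection: str
    runsets: list[RunsetInfo] = field(default_factory=list)

    def submissionset(self, **criteria):
        """Runsets whose attributes match every criterion (all, if none given)."""
        return [r for r in self.runsets
                if all(getattr(r, k) == v for k, v in criteria.items())]

    def by_participant(self):
        """Map each participant to the tags of the runsets it submitted."""
        groups = {}
        for r in self.runsets:
            groups.setdefault(r.participant, []).append(r.tag)
        return groups


ts = Testset("hypothetical-collection", [
    RunsetInfo("grpA-t", "grpA", manual=False, topic_field="title"),
    RunsetInfo("grpA-m", "grpA", manual=True, topic_field="title+desc"),
    RunsetInfo("grpB-t", "grpB", manual=False, topic_field="title"),
])
print([r.tag for r in ts.submissionset(topic_field="title")])  # ['grpA-t', 'grpB-t']
print(ts.by_participant())  # {'grpA': ['grpA-t', 'grpA-m'], 'grpB': ['grpB-t']}
```

Calling `submissionset()` with no criteria returns every runset, which corresponds to the first sense of the term above; passing criteria gives the "title-only" or "manual" senses.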

One Response to “Runs, rankings, systems, submissions: a nomenclature”

  1. To me, a run is a set of ranked document lists for a given topic set.
    And I think this is the common interpretation:

    "the runs identified by their unique tags"

    "Total number of relevant retrieved documents for this run"

    For a single topic,
    I usually call it "ranked (document) list" or "ranked output."

