How (and why) not to rank academics

The recently launched Microsoft Academic Search, a product of Microsoft Research Asia, has made a bit of a splash as a potential competitor to Google Scholar. Although its coverage does not seem as detailed as Google Scholar quite yet, MS Academic Search has a number of additional features, such as author and conference pages, publication activity graphs, and the like. (It also has a really unwieldy, eight-syllable name; let me abbreviate to MSAS.)

Amongst its features, MSAS provides rankings of “Top Authors” within various fields of computer science. This is, needless to say, not something to be done lightly: academia is built around status and reputation, and to rank academics is to make a statement about their status, which a disclaimer buried in the explanatory notes does little to diminish. And in this case, MSAS should have thought a bit more carefully before they decided to publish these rankings.

In MSAS, authors are ranked by the number of in-domain citations they receive within a given time period. This is a dubious enough metric in itself. But even bearing the questionable methodology in mind, MSAS comes up with some slightly surprising top-ten entries in their list of top information retrieval authors of all time, and some even more surprising ones in their ranking of top authors in the last five years.

There are, I suspect, a number of problems with how MSAS is calculating citation counts, but a quick browse through the cited-paper lists of some highly-ranked authors shows that their most egregious aberration within the field of information retrieval is that they are including TREC overview papers in their citation counts.

Now, TREC overview papers attract a lot of citations, because whenever anyone uses the test collection developed at a TREC task, they cite the corresponding overview paper. But TREC is not a peer-reviewed conference. And the overview papers in particular are not research publications at all; they are, rather, a summary of organizational information, participant descriptions, and result statistics. Track organizers put a lot of effort into the tracks, and deserve recognition for this effort in other ways; but counting citations to their overview papers is like ascribing conference citations to the person who wrote the preface to the proceedings.
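To make the effect concrete, here is a minimal sketch of a citation-count ranking of the kind described above, with and without citations to non-peer-reviewed venues. It is not MSAS's actual implementation, which I have not seen: the `Citation` record, the `rank_authors` function, the choice to filter on the year of the citing paper, and all of the data are invented for illustration, and every citation is assumed to be in-domain.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citation received by an author (hypothetical record layout)."""
    cited_author: str    # author of the paper being cited
    cited_venue: str     # venue where the cited paper appeared
    peer_reviewed: bool  # whether that venue is peer reviewed
    year: int            # year of the citing paper (an assumption)


def rank_authors(citations, since=None, peer_reviewed_only=False):
    """Rank authors by citation count, most cited first.

    since: if given, ignore citations made before this year.
    peer_reviewed_only: if True, ignore citations to papers in
        non-peer-reviewed venues (e.g. TREC overview papers).
    """
    counts = Counter()
    for c in citations:
        if since is not None and c.year < since:
            continue
        if peer_reviewed_only and not c.peer_reviewed:
            continue
        counts[c.cited_author] += 1
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy data: one author is cited mostly for an overview paper,
# the other only for peer-reviewed research papers.
CITATIONS = (
    [Citation("Track Organizer", "TREC overview", False, 2008)] * 5
    + [Citation("Track Organizer", "SIGIR", True, 2008)] * 1
    + [Citation("Researcher", "SIGIR", True, 2008)] * 4
)

if __name__ == "__main__":
    print(rank_authors(CITATIONS))
    print(rank_authors(CITATIONS, peer_reviewed_only=True))
```

With this toy data the track organizer ranks first on raw counts (6 citations to 4) but drops to second (1 to 4) once citations to the overview paper are excluded. The numbers are made up; the point is only that a single filtering decision can reorder the list.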

To be fair, Google Scholar also includes TREC papers in their list of academic publications. But then Google Scholar does not attempt to rank academics. This may simply be laziness on Google’s part, but it might also be a recognition that this is tricky ground to be stepping on. Providing a slight specialization of a search interface is one thing; taking it upon yourself to summarize and rank the publication career of researchers is another. If you’re going to do the latter, you had better pay some attention to doing it correctly.

MSAS has some interesting and useful features. But, as it stands, the author ranking is not one of them. It should be taken down until they get their data and methodology right.

5 Responses to “How (and why) not to rank academics”

  1. Irrelevant says:

    I imagine academics are quite capable of assessing the merits and limitations of this (or any) rank. Microsoft’s effort has merit per se.

  2. william says:

    It took me a good half hour of poking around to figure out why MSAS was giving the rankings it was. How many people are going to make that effort? And I have the domain experience to understand why counting TREC citations is invalid. How many people have this expertise? By publishing these rankings, MS is implicitly lending them its credibility. When academics start writing “ranked 4th in Information Retrieval by MSAS” in their promotion applications, how many promotion committee members are going to take the time or have the expertise to investigate and (correctly) discount those claims?

    Microsoft Academic Search overall has merit, but the rankings don’t.

  3. required says:

    layman’s question – if a TREC paper has no value, why would people cite it? if very few people cite it, then why do we care?

    Hope to see some more explanations.

    TIA

  4. william says:

    TREC participant reports are very rarely cited, with a small number of exceptions, such as the BM25 report from (I think) TREC 3. Overview papers are widely cited so that there is a reference for a data set that is being used. This is somewhat similar to the practice of citing a paper that describes a tool you’re using in your research; for instance, the Terrier team gets a lot of cites for a SIGIR workshop paper describing the Terrier retrieval system by people who use that retrieval system. Such cites perform three chief purposes. First, they point the reader to a location where they can get more information on a dataset or tool. Second, they serve as an acknowledgment and sort of “citation-tip” to the people who developed the dataset or tool. And third, it’s simply good form to have references in academic papers. What such cites do not do, though, in the great majority of cases, is refer to the research contribution of the paper being cited.

    There is merit in counting such citations; they reward groups that make resource contributions to the research community. But resource contributions are not the same as research contributions. It is therefore misleading to describe researchers as “top authors” in a research field because of citations to description documents for resources they’ve released. Where, then, to draw the line? Well, any line is going to be somewhat arbitrary; but the line traditionally drawn in the research community is at peer-reviewed venues. And TREC is not.

  5. required says:

    Thanks for the explanation, it seems some people are “tool makers” – they produce TREC and other reports; some people are “trail blazers” – they push the state of the art with genuine research ideas, published in peer-reviewed venues; and yet there are others, just playing the game of citation.

    hopefully there are more metrics to reveal the true contribution of scholars.
