Having an article in submission sharpens one's appreciation for situations to which the article applies; and the ACM's leisurely eighteen-month-plus turnaround on journal articles gives plenty of time for that appreciation to develop. Since early 2009, it has seemed to me that every third retrieval paper either has, or should have, compared non-conjoint rankings for similarity. The most common instances of such rankings are the document lists produced by different retrieval methods, but numerous other examples appear: the frequency-ordered vocabularies of different corpora; the most-common queries from different logs or periods; the weighted connections of different individuals in a social network; and so forth. Indeed, large domain rankings having the features of top-weightedness, non-conjointness, and arbitrary truncation -- what I term *indefinite rankings* -- seem more common than the unweighted, conjoint, complete rankings assumed by traditional rank similarity measures such as Kendall's tau. And yet there are few if any suitable measures described in the literature for comparing such rankings.

In our article, *A Similarity Measure for Indefinite Rankings*, published (finally!) in the November 2010 edition of ACM TOIS (author's version here), we propose a new similarity measure for just such rankings: *rank-biased overlap* (RBO). RBO is based upon cumulative set overlap, a framework we argue is better suited to non-conjoint rankings than the notion of correlation employed by most other rank similarity measures. The set overlap at each rank is weighted by a geometric sequence, providing both top-weightedness and convergence. Convergence means that a prefix comparison sets bounds on an indefinite one -- an important characteristic in an environment in which similarity across ranking pairs of highly varying lengths (such as search results) is to be compared and summarized. The resulting measure implements a simple probabilistic model (though it takes some rather involved algebra to do so): an observer scans down the two lists with constant marginal but decreasing absolute patience; when their patience runs out, they randomly select an element from one of the lists and check whether it appears in the other. RBO gives the probability that this behaviour produces an element that occurs in both rankings.
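The cumulative-overlap framework is simple enough to sketch in a few lines. The following Python snippet (my own illustration, with my own function name; it evaluates the geometrically weighted overlap sum to a fixed depth rather than extrapolating to an indefinite ranking) shows the core idea: at each depth d, take the proportional overlap of the two d-prefixes, and weight it by a geometric sequence so that early ranks dominate.

```python
def rbo_prefix(list1, list2, p=0.9):
    """Geometrically weighted average overlap of two rankings,
    evaluated to the depth of the longer list.

    At each depth d, the agreement A_d is the size of the
    intersection of the two d-prefixes, divided by d; each A_d
    is weighted by (1 - p) * p^(d - 1), so early ranks dominate.
    """
    depth = max(len(list1), len(list2))
    seen1, seen2 = set(), set()
    overlap = 0          # X_d: size of the prefix intersection
    score = 0.0
    for d in range(1, depth + 1):
        # Extend each prefix by one element (if any remain) and
        # update the running overlap count incrementally.
        if d <= len(list1):
            x = list1[d - 1]
            if x in seen2:
                overlap += 1
            seen1.add(x)
        if d <= len(list2):
            y = list2[d - 1]
            if y in seen1:
                overlap += 1
            seen2.add(y)
        score += (1 - p) * p ** (d - 1) * (overlap / d)
    return score
```

For identical rankings of length k this sums to 1 - p^k, approaching 1 as the depth grows; for disjoint rankings it is 0. The choice of p sets how deep the notional observer is likely to scan.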

As part of our experimental validation, we ran more than 100 queries daily for several months against eight different web search engines (plus three localizations thereof), then used RBO to analyze the similarity of the results. It turned out that we ran this experiment at the last period during which it was possible on this scale; most of the engines we used have since disappeared or become derivatives of more successful rivals, with a couple of the engines falling off their perch during our study itself.

The ACM's gentlemanly disdain for unseemly haste in casting new ideas before an unready public also allows plenty of opportunity, in the interval between final revision and publication, for other work to be published on the same topic. Just such a publication, appearing at WWW 2010, too late for us to cite, is Visualizing Differences in Web Search Algorithms using the Expected Weighted Hoeffding Distance, by Mingxuan Sun, Guy Lebanon, and Kevyn Collins-Thompson. The EWHD metric is top-weighted and handles non-conjointness; it is unclear to me, however, whether the metric is monotonic in depth, and hence whether it is appropriate for comparisons between ranking pairs of different depths. Sun, Lebanon, and Collins-Thompson do, though, propose an interesting criterion for choosing between rank metrics (always a thorny topic), namely the metric's suitability for use in clustering visualization. (Ravi Kumar and Sergei Vassilvitskii also proposed a new rank similarity measure at the same conference, in Generalized Distance between Ranking, but their measure does not handle non-conjointness.)

I have put together an implementation of RBO (in C). The implementation is somewhat unpolished, although efficient. I hope to add other rank similarity metrics in future; included so far are only Kendall's tau and a metric we called Average Overlap, developed in parallel by two different research groups (under different names). Comments and suggestions are, as ever, welcome.
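Average Overlap is the simplest member of the cumulative-overlap family: the unweighted mean of the proportional prefix overlaps. A minimal Python rendering (my own sketch, not the C code in the package; the evaluation depth defaults to the shorter list's length):

```python
def average_overlap(list1, list2, depth=None):
    """Average Overlap: the mean, over depths 1..k, of the
    proportional overlap |S_:d intersect T_:d| / d of the two
    d-prefixes of the rankings."""
    if depth is None:
        depth = min(len(list1), len(list2))
    total = 0.0
    for d in range(1, depth + 1):
        # Overlap of the two d-prefixes, as sets.
        overlap = len(set(list1[:d]) & set(list2[:d]))
        total += overlap / d
    return total / depth
```

Unlike RBO, the weight on each depth is uniform, so Average Overlap at depth k says nothing about what happens past k; it is top-weighted only in the weak sense that early elements participate in more prefixes.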

Great post. I like how you posted the software.

Of course, you can post your paper online before it is even submitted to the journal. And such copies are automatically indexed by Google Scholar. I have had papers cited multiple times before they were even accepted.

I rather enjoy the long delay between the initial version and the final paper. If I had my way, no paper would ever be in its final form: they would all be works in progress.

I agree (now) with the idea of posting a paper online prior to submission, but I hadn't previously realised the length of the delay involved, and I also think my co-authors are more conservative about such matters. But I think that the online posting does need to be prior to submission, so that you can incorporate suggestions you get from readers of the online version. Otherwise you could end up in the awkward position where your paper has been accepted for publication by the reviewers, but in the meantime readers of the online version have pointed out revisions or improvements to your work. What do you think? What is your practice?

Congratulations!

From my scanty experience, the ACM does take time, but the process also allows one to greatly improve the quality. If you care about priority and early acceptance of your ideas (I fully agree with Daniel here), a preliminary version could be published. Plus, you will get early comments and corrections.

[...] A weighted similarity measure for non-conjoint rankings updates the recently published A Similarity Measure for Indefinite Rankings and provides an implementation. [...]

Great job and great article. I'm a biologist and although I didn't fully understand the RBO calculation section I can say it has a great potential to be applicable in biology.

Could you please present a sample calculation?

Say I have two indefinite ranked lists (non-conjoint, top-weighted, and incomplete):

Rank   List 1   List 2
 1     A        D
 2     B        B
 3     C        F
 4     D        A
 5     E
 6     H

Best wishes!

Thanks for the comments!

I have to admit, I'd not willingly attempt to calculate extrapolated RBO on uneven rankings by hand; I'd just use the implementation at http://www.umiacs.umd.edu/~wew/downloads/rbo-0.2.1.tar.gz. Let me know if you have trouble installing or using this; I'm happy to reimplement in either Python or R if you like.

If you did want to calculate RBO_EXT(<A, B, C, D, E, H>, <D, B, F, A>) by hand, you'd need to calculate the overlaps at ranks 1 through 6 (the length of the longer of the two lists), then plug the values into Equation 32 from the paper. Here are the overlaps, and the various intermediate values we need for Equation 32:

      d   R1   R2   X_d   X_d/d   X_s(d-s)/(sd)
      -----------------------------------------
      1   A    D    0     0       -
      2   B    B    1     1/2     -
      3   C    F    1     1/3     -
s  =  4   D    A    3     3/4     -
      5   E    -    3     3/5     3/20
l  =  6   H    -    3     3/6     6/24
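As a cross-check, the X_d column can be computed mechanically; a quick Python check using prefix-set intersections:

```python
list1 = ["A", "B", "C", "D", "E", "H"]
list2 = ["D", "B", "F", "A"]

# X_d: size of the intersection of the two d-prefixes, for d = 1..6.
# A list shorter than d simply contributes all of its elements.
x = [len(set(list1[:d]) & set(list2[:d])) for d in range(1, 7)]
print(x)  # [0, 1, 1, 3, 3, 3], matching the X_d column above
```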

Now to place them in the formula. I'll do this using a mix of LaTeX and R syntax; I hope that's clear enough. (I'm assuming p = 0.98.)

(1) \sum_{d=1}^l (X_d / d) * p^d

= sum(c(0, 1/2, 1/3, 3/4, 3/5, 3/6) * 0.98 ^ (1:6))

= 2.47

(2) \sum_{d=s+1}^l [(X_s (d - s)) / (sd)] * p^d

= sum(c(3/20, 6/24) * 0.98 ^ (5:6))

= 0.36

(3) [(X_l - X_s) / l + X_s / s] * p^l

= ((3 - 3) / 6 + (3/4)) * 0.98 ^ 6

= 0.66

RBO_EXT = ((1 - 0.98) / 0.98) * ((1) + (2)) + (3)

= ((1 - 0.98) / 0.98) * (2.47 + 0.36) + 0.66

= 0.72

which (thank God!) is the same value as the software implementation gives (though the hand working arrived there after correcting several slips).
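If you'd rather script the above than work by hand, here is a minimal Python transcription of Equation 32 (a sketch for untied, duplicate-free rankings, not the packaged C implementation):

```python
def rbo_ext(S, L, p=0.98):
    """Extrapolated RBO (Equation 32 of the paper) for two
    non-conjoint rankings without ties or duplicates."""
    if len(S) > len(L):
        S, L = L, S          # ensure S is the shorter list
    s, l = len(S), len(L)
    # X[d-1] = overlap of the two d-prefixes, for d = 1..l.
    X = [len(set(S[:d]) & set(L[:d])) for d in range(1, l + 1)]
    # (1) sum of X_d/d * p^d over d = 1..l
    term1 = sum(X[d - 1] / d * p ** d for d in range(1, l + 1))
    # (2) extrapolation of the shorter list's agreement past rank s
    term2 = sum(X[s - 1] * (d - s) / (s * d) * p ** d
                for d in range(s + 1, l + 1))
    # (3) the residual tail mass at depth l
    term3 = ((X[l - 1] - X[s - 1]) / l + X[s - 1] / s) * p ** l
    return (1 - p) / p * (term1 + term2) + term3

print(round(rbo_ext(["D", "B", "F", "A"],
                    ["A", "B", "C", "D", "E", "H"]), 2))  # 0.72
```

Carrying full precision rather than two-decimal intermediates gives 0.7221, which rounds to the same 0.72 as the hand working.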

William

Thank you very much for your help, William. This sample RBO_EXT calculation was really helpful! Thanks for the implementation, everything works great.

I'm doing pairwise comparisons between ranked lists of genes returned by different methods. Since all these methods are implemented in Python, I will try to rewrite the RBO_EXT function.

Again, many thanks!

A question came to my mind during calculations. Does RBO include cases when more than one item have the same rank?

RBO itself, as defined in the paper, does (see Section 4.6). The implementation of RBO doesn't, mainly because I couldn't decide on a simple format that would express ties.

I implemented the RBO measure in Python, based on your Equation 32:

http://www.staff.amu.edu.pl/~andrzejz/files/rbo_calc.py

Shall I replace the X_d/d part from Equation 32 with Equation 29 to handle ties?

Thank you.

Yes, this is not entirely clear from the paper.

In Equation 32, the value of $l$ (the length of the longer list) must be unambiguous, as it is used as a normalizing divisor. Calculate $A_s = X_s / s$ and $A_d = X_d / d$ as in Equation 28. There is then a single $X_s$ that doesn't have $s$ as a divisor; this is defined as implied by Equation 28, viz. if the longer list is tied over rank $s$ (the shorter one can't be), then include all the ties in calculating $X_s$.

Hope this is clear. I've now added implementing ties in my RBO implementation to my todo list.

Thanks. Is the modified equation supposed to look like this one?

RBO_{EXT} = \frac{p}{1-p}\left(\sum_{d=1}^{l}\frac{2X_d}{S_{:d}+L_{:d}}p^d+\sum_{d=s+1}^{l}\frac{2X_s(d-s)}{(S_{:d}+L_{:d})d}p^d\right)+\left(\frac{X_l-X_s}{s}+\frac{2X_s}{S_{:d}+L_{:d}}\right)p^l

Not quite: I make it:

(Latex:

RBO_{EXT} = \frac{1 - p}{p}\left(\sum_{d=1}^{l}\frac{2X_d}{|S_{:d}| + |L_{:d}|}\,p^d + \sum_{d=s+1}^{l}\frac{2X_s(d-s)}{(|S_{:s}| + |L_{:s}|)\,d}\,p^d\right) + \left(\frac{X_l - X_s}{l} + \frac{2X_s}{|S_{:s}| + |L_{:s}|}\right)p^l

)

where all the $X_k$ include any ties that go over rank $k$.
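One simple way to encode ties is to represent each ranking as a list of tie-groups (sets of items sharing a rank). The tie-aware agreement $2X_d / (|S_{:d}| + |L_{:d}|)$ can then be sketched as below; note this is my own encoding, with depth counted in tie-groups, and not the input format of the C implementation:

```python
def tied_agreement(S_groups, L_groups, d):
    """Tie-aware agreement at group-depth d, where each ranking is
    a list of tie-groups (sets of items sharing a rank), and the
    d-prefix takes every item in the first d groups:
        A_d = 2 * X_d / (|S_:d| + |L_:d|)
    """
    S_prefix = set().union(*S_groups[:d])
    L_prefix = set().union(*L_groups[:d])
    x_d = len(S_prefix & L_prefix)
    return 2 * x_d / (len(S_prefix) + len(L_prefix))

# "B" and "C" tied at rank 2 in the first ranking:
a = tied_agreement([{"A"}, {"B", "C"}], [{"A"}, {"B"}], 2)
print(a)  # 2 * 2 / (3 + 2) = 0.8
```

When no group holds more than one item, the prefixes have size d each and the expression collapses back to the untied agreement X_d / d.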

Hi,

Thanks for the C code. I was able to get it working. However, I often use R, and was wondering whether you got a chance to port the code to R.

Ritesh

Ritesh,

Hi! I did do an implementation in R at one stage, but I've not been maintaining it. The computation is inherently iterative (at least as far as I've been able to figure it out), and doesn't suit R's preferred-for-efficiency apply() paradigm, so calculations on long lists can be very slow in R. If you have a C compiler available for your system, I'd suggest compiling the C program and then calling out to it from R using the pipe() function. (And yes, having written this, I've realized that I should write an R package for RBO. I've added it to my list of TODOs.)

William