<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Who has sighted (not just cited) Belkin 1980, &#8220;ASK&#8221;?</title>
	<atom:link href="http://blog.codalism.com/?feed=rss2&#038;p=984" rel="self" type="application/rss+xml" />
	<link>http://blog.codalism.com/?p=984</link>
	<description>William Webber's Research Blog</description>
	<lastBuildDate>Fri, 27 Aug 2010 02:57:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: william</title>
		<link>http://blog.codalism.com/?p=984&#038;cpage=1#comment-2699</link>
		<dc:creator>william</dc:creator>
		<pubDate>Tue, 22 Sep 2009 22:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://blog.codalism.com/?p=984#comment-2699</guid>
		<description>Jeremy,

Hmm, good point.  I&#039;ll take a step back on this one.  The authors could, and probably should, have conducted a more rigorous evaluation than the rather anecdotal one they did (searching for &quot;university&quot; on page titles only).  SIGIR is not entirely at fault here.  Still, in terms of publications, it is an interesting miss.

William</description>
		<content:encoded><![CDATA[<p>Jeremy,</p>
<p>Hmm, good point.  I&#8217;ll take a step back on this one.  The authors could, and probably should, have conducted a more rigorous evaluation than the rather anecdotal one they did (searching for &#8220;university&#8221; on page titles only).  SIGIR is not entirely at fault here.  Still, in terms of publications, it is an interesting miss.</p>
<p>William</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://blog.codalism.com/?p=984&#038;cpage=1#comment-2693</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 22 Sep 2009 21:04:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.codalism.com/?p=984#comment-2693</guid>
		<description>William:

&lt;i&gt;The original paper did not fit into the SIGIR mold in that it lacked experimental validation. Now of course part of the problem is that the existing test collections of the time did not provide the data needed to adequately validate (or falsify) the method.&lt;/i&gt;

This is true.  However, look at what the authors write in the original Google paper:

&lt;i&gt;However, most of the research on information retrieval systems is on small well controlled homogeneous collections such as collections of scientific papers or news stories on a related topic. Indeed, the primary benchmark for information retrieval, the Text Retrieval Conference [TREC 96], uses a fairly small, well controlled collection for their benchmarks. The &quot;Very Large Corpus&quot; benchmark is only 20GB compared to the 147GB from our crawl of 24 million web pages. Things that work well on TREC often do not produce good results on the web.&lt;/i&gt;

In other words, they had already amassed their own large (147 GB), real-life (web-crawled) collection.  To run experiments, they needed to do only two things: (1) come up with 50 queries, and (2) judge 2 x 10 x 50 = 1000 pages for relevance.

At the time, 50 queries was enough to establish statistical significance at a believable level in the SIGIR community.  It wouldn&#039;t have been that hard to come up with 50 queries.  Grab 5 friends and everyone could do 10 each.  You could do that in half an hour, if not less.

Second, why 2 x 10 relevance judgments per query?  Because you need a baseline.  Run your search algorithm for each query without PageRank, and then run it with PageRank.  Look at the top 10 results from each (2 x 10).  Do that for all 50 queries.  (You might even get away with fewer than 1000 total judgments if the two systems return many of the same documents.)

In the worst case, with 5 friends, that&#039;s only 200 pages each.  That&#039;s really not a lot of work overall.  It could have been done.  Heck, I wrote my own first search engine around that time (in early 1998, for the IR course at UMass, where I was a grad student), and Bruce Croft and Jamie Callan (who was at UMass at the time) had us do exactly what I&#039;m proposing: evaluate relevance ourselves for the top 10 docs returned for each query.  I think I remember doing about 25 queries x 10 = 250 judgments.  Not hard.  Took me an afternoon.

Then, with judgments for PageRank and a non-PageRank baseline, they could have computed measures such as the rank of the first relevant hit and precision at 3, 5, and 10 (P@3, P@5, P@10).  True, they couldn&#039;t have computed recall.  But so what?  Web engines today still don&#039;t compute recall :-)  Precision at the top is what matters to most web searchers.

So they easily could have done all that.  Would the experiment have been on a &quot;standardized&quot; test collection?  No.  Does that matter to the SIGIR community?  In my experience, no.  I see novel ideas all the time for which there are no test collections, and folks still manage to come up with reasonable evaluations.  And SIGIR accepts those papers.

Just my $0.02.</description>
		<content:encoded><![CDATA[<p>William:</p>
<p><i>The original paper did not fit into the SIGIR mold in that it lacked experimental validation. Now of course part of the problem is that the existing test collections of the time did not provide the data needed to adequately validate (or falsify) the method.</i></p>
<p>This is true.  However, look at what the authors write in the original Google paper:</p>
<p><i>However, most of the research on information retrieval systems is on small well controlled homogeneous collections such as collections of scientific papers or news stories on a related topic. Indeed, the primary benchmark for information retrieval, the Text Retrieval Conference [TREC 96], uses a fairly small, well controlled collection for their benchmarks. The &#8220;Very Large Corpus&#8221; benchmark is only 20GB compared to the 147GB from our crawl of 24 million web pages. Things that work well on TREC often do not produce good results on the web.</i></p>
<p>In other words, they had already amassed their own large (147 GB), real-life (web-crawled) collection.  To run experiments, they needed to do only two things: (1) come up with 50 queries, and (2) judge 2 x 10 x 50 = 1000 pages for relevance.</p>
<p>At the time, 50 queries was enough to establish statistical significance at a believable level in the SIGIR community.  It wouldn&#8217;t have been that hard to come up with 50 queries.  Grab 5 friends and everyone could do 10 each.  You could do that in half an hour, if not less.</p>
<p>Second, why 2 x 10 relevance judgments per query?  Because you need a baseline.  Run your search algorithm for each query without PageRank, and then run it with PageRank.  Look at the top 10 results from each (2 x 10).  Do that for all 50 queries.  (You might even get away with fewer than 1000 total judgments if the two systems return many of the same documents.)</p>
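<p>A minimal Python sketch of that judgment-counting arithmetic, assuming each run is stored as a dict mapping a query id to its ranked list of document ids (the function and variable names are hypothetical).  Pooling the two top-10 lists per query and judging each distinct document once gives 1000 judgments in the worst case (no overlap) and fewer when the runs agree:</p>

```python
def judgments_needed(run_a, run_b, depth=10):
    """Count the distinct (query, document) pairs to judge when the top-`depth`
    results of two runs are pooled.  Each run maps a query id to a ranked list
    of document ids (a hypothetical layout chosen for this sketch)."""
    total = 0
    for qid in set(run_a) | set(run_b):
        # A document returned by both systems only needs to be judged once.
        pool = set(run_a.get(qid, [])[:depth]) | set(run_b.get(qid, [])[:depth])
        total += len(pool)
    return total


# Worst case: 50 queries, two runs with completely disjoint top-10 lists.
baseline = {q: [f"base-{q}-{i}" for i in range(10)] for q in range(50)}
with_pr = {q: [f"pr-{q}-{i}" for i in range(10)] for q in range(50)}
worst = judgments_needed(baseline, with_pr)
print(worst, worst // 5)  # 1000 judgments in total, 200 per assessor with 5 assessors

# Best case: identical rankings, so each query contributes only 10 judgments.
print(judgments_needed(baseline, baseline))  # 500
```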
<p>In the worst case, with 5 friends, that&#8217;s only 200 pages each.  That&#8217;s really not a lot of work overall.  It could have been done.  Heck, I wrote my own first search engine around that time (in early 1998, for the IR course at UMass, where I was a grad student), and Bruce Croft and Jamie Callan (who was at UMass at the time) had us do exactly what I&#8217;m proposing: evaluate relevance ourselves for the top 10 docs returned for each query.  I think I remember doing about 25 queries x 10 = 250 judgments.  Not hard.  Took me an afternoon.</p>
<p>Then, with judgments for PageRank and a non-PageRank baseline, they could have computed measures such as the rank of the first relevant hit and precision at 3, 5, and 10 (P@3, P@5, P@10).  True, they couldn&#8217;t have computed recall.  But so what?  Web engines today still don&#8217;t compute recall :-)  Precision at the top is what matters to most web searchers.</p>
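<p>Likewise, a minimal Python sketch of the two measures mentioned, precision at k and the rank of the first relevant hit, for one query&#8217;s ranking and its set of judged-relevant documents.  The names are hypothetical, and dividing by k even when fewer than k documents were retrieved is an assumption that follows the usual convention:</p>

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that were judged relevant.
    Divides by k even if fewer than k documents were retrieved."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def first_relevant_rank(ranking, relevant):
    """1-based rank of the first relevant document, or None if there is none."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return rank
    return None


# Hypothetical judged ranking for one query.
ranking = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d7", "d2"}
print(precision_at_k(ranking, relevant, 3))    # 0.3333333333333333
print(precision_at_k(ranking, relevant, 5))    # 0.4
print(first_relevant_rank(ranking, relevant))  # 2
```

<p>Averaging each measure over the 50 queries, separately for the PageRank run and the baseline run, gives the per-system comparison.</p>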
<p>So they easily could have done all that.  Would the experiment have been on a &#8220;standardized&#8221; test collection?  No.  Does that matter to the SIGIR community?  In my experience, no.  I see novel ideas all the time for which there are no test collections, and folks still manage to come up with reasonable evaluations.  And SIGIR accepts those papers.</p>
<p>Just my $0.02.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: william</title>
		<link>http://blog.codalism.com/?p=984&#038;cpage=1#comment-2518</link>
		<dc:creator>william</dc:creator>
		<pubDate>Mon, 21 Sep 2009 11:38:37 +0000</pubDate>
		<guid isPermaLink="false">http://blog.codalism.com/?p=984#comment-2518</guid>
		<description>Daniel,

It is true that there is some scepticism in the IR academic community about the real value of PageRank.  Thanks for the reference to your blog post; I had not been aware of some of the papers cited in it and in its comments.  I&#039;m not convinced that they demonstrate that PageRank was inferior to other solutions available when it was published, but that&#039;s a topic for another time.

However, PageRank was certainly an interesting and, at the time, novel idea, one that has inspired a lot of other work in the area (such as on HITS).  It has a prima facie appeal.  The original paper did not fit into the SIGIR mold in that it lacked experimental validation.  Now of course part of the problem is that the existing test collections of the time did not provide the data needed to adequately validate (or falsify) the method.  I&#039;ve discussed this with SIGIR luminaries, and their response has been &quot;well, you can always perform &lt;i&gt;some&lt;/i&gt; sort of experimental analysis&quot;.  But I think this is quite wrong-headed: if a proper experimental validation is not possible, an improper one should not be cobbled together to clear some publication hurdle.  And the nature of innovative research is that test collections to validate a new idea do not become publicly available until the new idea has already been established.

As it happens, our CIKM paper is about going through the motions of empirical validation in SIGIR and CIKM papers without the experiments actually meaning anything.</description>
		<content:encoded><![CDATA[<p>Daniel,</p>
<p>It is true that there is some scepticism in the IR academic community about the real value of PageRank.  Thanks for the reference to your blog post; I had not been aware of some of the papers cited in it and in its comments.  I&#8217;m not convinced that they demonstrate that PageRank was inferior to other solutions available when it was published, but that&#8217;s a topic for another time.</p>
<p>However, PageRank was certainly an interesting and, at the time, novel idea, one that has inspired a lot of other work in the area (such as on HITS).  It has a prima facie appeal.  The original paper did not fit into the SIGIR mold in that it lacked experimental validation.  Now of course part of the problem is that the existing test collections of the time did not provide the data needed to adequately validate (or falsify) the method.  I&#8217;ve discussed this with SIGIR luminaries, and their response has been &#8220;well, you can always perform <i>some</i> sort of experimental analysis&#8221;.  But I think this is quite wrong-headed: if a proper experimental validation is not possible, an improper one should not be cobbled together to clear some publication hurdle.  And the nature of innovative research is that test collections to validate a new idea do not become publicly available until the new idea has already been established.</p>
<p>As it happens, our CIKM paper is about going through the motions of empirical validation in SIGIR and CIKM papers without the experiments actually meaning anything.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://blog.codalism.com/?p=984&#038;cpage=1#comment-2212</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 18 Sep 2009 19:49:45 +0000</pubDate>
		<guid isPermaLink="false">http://blog.codalism.com/?p=984#comment-2212</guid>
		<description>Are you absolutely certain that PageRank was good science?

Please see one of my older blog posts on this topic:

Is PageRank just good marketing?
http://www.daniel-lemire.com/blog/archives/2007/11/28/is-pagerank-just-good-marketing/


Disclaimer: I teach PageRank to all students in my Information Retrieval course. I&#039;m no PageRank basher. But was it really rejected because of how innovative it was?

Disclaimer 2: I once submitted a paper to SIGIR and, of course, it was rejected. I submitted it to Information Retrieval (the journal), and it was accepted with rave reviews. To this day, I think it was decent work, though I would write the paper differently now. It was certainly an original take on the topic.</description>
		<content:encoded><![CDATA[<p>Are you absolutely certain that PageRank was good science?</p>
<p>Please see one of my older blog posts on this topic:</p>
<p>Is PageRank just good marketing?<br />
<a href="http://www.daniel-lemire.com/blog/archives/2007/11/28/is-pagerank-just-good-marketing/" rel="nofollow">http://www.daniel-lemire.com/blog/archives/2007/11/28/is-pagerank-just-good-marketing/</a></p>
<p>Disclaimer: I teach PageRank to all students in my Information Retrieval course. I&#8217;m no PageRank basher. But was it really rejected because of how innovative it was?</p>
<p>Disclaimer 2: I once submitted a paper to SIGIR and, of course, it was rejected. I submitted it to Information Retrieval (the journal), and it was accepted with rave reviews. To this day, I think it was decent work, though I would write the paper differently now. It was certainly an original take on the topic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://blog.codalism.com/?p=984&#038;cpage=1#comment-1833</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 14 Sep 2009 20:12:54 +0000</pubDate>
		<guid isPermaLink="false">http://blog.codalism.com/?p=984#comment-1833</guid>
		<description>I haven&#039;t &quot;sighted&quot; Belkin 1980, but I have &quot;sighted&quot; Belkin et al 1982:

http://irgupf.com/2009/03/09/exploration-and-explanation/comment-page-1/#comment-4480

And I suspect a lot of other people have sighted Belkin 1982 as well, given that it was republished in the 1997 book &quot;Readings in Information Retrieval&quot; (Sparck Jones and Willett, eds.).

I&#039;m sure that 1982 contains much the same theoretical grounding as 1980, so it&#039;s not as though I (or we, or the community) don&#039;t understand the ASK hypothesis and are going only on hearsay.  But you raise a good point: why is it that 1980 gets cited so often, rather than the more widely read 1982?  I&#039;ll bet it has something to do with researchers wanting to cite the earliest work, lest they get called out for it.</description>
		<content:encoded><![CDATA[<p>I haven&#8217;t &#8220;sighted&#8221; Belkin 1980, but I have &#8220;sighted&#8221; Belkin et al 1982:</p>
<p><a href="http://irgupf.com/2009/03/09/exploration-and-explanation/comment-page-1/#comment-4480" rel="nofollow">http://irgupf.com/2009/03/09/exploration-and-explanation/comment-page-1/#comment-4480</a></p>
<p>And I suspect a lot of other people have sighted Belkin 1982 as well, given that it was republished in the 1997 book &#8220;Readings in Information Retrieval&#8221; (Sparck Jones and Willett, eds.).</p>
<p>I&#8217;m sure that 1982 contains much the same theoretical grounding as 1980, so it&#8217;s not as though I (or we, or the community) don&#8217;t understand the ASK hypothesis and are going only on hearsay.  But you raise a good point: why is it that 1980 gets cited so often, rather than the more widely read 1982?  I&#8217;ll bet it has something to do with researchers wanting to cite the earliest work, lest they get called out for it.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
