Alistair passed me a copy of the latest issue of Information Retrieval, which is devoted to reports from the Reliable Information Access workshop. The workshop was run in 2003, and the reports are being published in 2009, so we are not discussing breaking news here. Still, the concept of the workshop was very interesting: invite a dozen leading information retrieval research groups to a six-week, on-site experiment employing seven different retrieval systems, to tackle (broadly speaking) the question of why information retrieval technology is not improving. There were two specific subtasks: an intensive failure analysis of why retrieval systems, individually and collectively, performed poorly on certain topics and not on others; and a multi-dimensional exploration of the effectiveness, limits, commonalities, and differences of pseudo-relevance feedback techniques, one of the few promising general-purpose retrieval techniques that go beyond keyword matching.
RIA Dataset
December 17th, 2009
Philosopher Kings
December 7th, 2009
Switching envelopes for fun and profit
December 1st, 2009
I hold two envelopes, and tell you that one holds twice as much money as the other. I then let you choose and keep one. You are about to take the envelope in my left hand, but then decide to do a quick probability calculation. Let the amount of money in the left envelope be $x$. Then the amount in the right envelope is either $2x$ or $x/2$. Now, since you picked the left envelope at random, these two amounts are equally likely for the right envelope. Therefore, your expected gain in switching to choosing the right envelope is $\frac{1}{2} \cdot 2x + \frac{1}{2} \cdot \frac{x}{2} = \frac{5}{4}x$, whereas your gain in sticking with the left envelope is $x$. So, logically, you should switch. But then, if you’d chosen the right envelope first, then the same logic would dictate that you should switch to the left one. And that can’t be right. So what’s wrong in your reasoning?
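One way to check the suspicion that something is off is to simulate the game. The sketch below is only an illustration under stated assumptions: the smaller amount is drawn uniformly between 1 and 100 (the puzzle itself does not say how the amounts are chosen), and the name `simulate` is made up for this example. It does not say where the reasoning goes wrong; it only shows that, over many plays, the apparent advantage of switching does not materialize.

```python
import random

def simulate(trials=100_000, seed=0):
    """Return the average payout for sticking with the left envelope
    and for switching to the right one."""
    rng = random.Random(seed)
    stick_total = 0.0
    switch_total = 0.0
    for _ in range(trials):
        # Assumption: the smaller amount comes from a fixed, bounded range.
        small = rng.uniform(1.0, 100.0)
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)          # left/right assignment is random
        left, right = envelopes
        stick_total += left             # payout if you keep the left envelope
        switch_total += right           # payout if you switch to the right
    return stick_total / trials, switch_total / trials

stick_avg, switch_avg = simulate()
print(f"stick: {stick_avg:.2f}  switch: {switch_avg:.2f}")
```

Under these assumptions the two averages come out close to each other, rather than one being a quarter larger than the other.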
Malaysian latex exports
November 30th, 2009
OK, stick with me on this one. Latex is derived from the rubber plant. Rubber is one of Malaysia’s chief exports. There is a widely-used typesetting tool called LaTeX. Therefore (pay attention to the clever link here), it is only natural that the Malaysian LaTeX User Group blog should be one of the most fruitful and creative out there when it comes to clever LaTeX hints and tricks, particularly relating to creating rich layouts and presentations. Highly recommended.
Which of these terms is not like the topic?
November 20th, 2009
Via Jeff Dalton and the LingPipe blog comes Reading Tea Leaves: How Humans Interpret Topic Models, an enjoyable paper on evaluating how meaningful humans find the topics that topic models like Latent Dirichlet Allocation produce. The authors come up with a couple of interesting evaluation tasks, which they execute on Mechanical Turk. The one I particularly liked was to introduce an alien word into the word list of a topic, and then see whether the user can locate the intruder, something like the old “which of these things is not like the other” song/game on Sesame Street. Their finding is that human evaluation and the automated or statistical evaluation techniques generally used do not agree on which topic model is the most effective; but what impressed me most was the inventiveness of the method they developed to test the human interpretation of topic coherence.
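As a rough illustration of the intrusion task described above, here is a minimal sketch. It is not the authors' code: the function names, the example topic, and the made-up annotator answers are all hypothetical, and choosing a good intruder (for instance, a word that is unlikely under this topic but likely under another) is left out.

```python
import random

def make_intrusion_task(topic_words, intruder, rng):
    """Mix one intruder word into a topic's top words and shuffle the list."""
    if intruder in topic_words:
        raise ValueError("intruder must not already belong to the topic")
    candidates = list(topic_words) + [intruder]
    rng.shuffle(candidates)  # hide the intruder's position
    return candidates

def score_answers(answers, intruder):
    """Fraction of annotators who picked the true intruder."""
    return sum(a == intruder for a in answers) / len(answers)

rng = random.Random(42)
# Hypothetical topic and intruder, invented for illustration only.
topic = ["dog", "cat", "horse", "pig", "cow"]
intruder = "apple"
task = make_intrusion_task(topic, intruder, rng)
print(task)
# Hypothetical annotator answers: three of four spot the intruder.
score = score_answers(["apple", "apple", "pig", "apple"], intruder)
print(score)  # 0.75
```

The idea is that a coherent topic makes the intruder easy to spot, so a high score suggests the topic hangs together for human readers.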
ACSSGIRIT
November 10th, 2009
According to Tetsuya Sakai, who runs information retrieval research throughout greater Asia via a number of front organizations, AIRS 2010 will be hosted by the National Taiwan University in Taipei, which is great, because it gives me an excuse to visit Taipei again. Tetsuya also announced that AIRS is changing its name from the Asian Information Retrieval Symposium to the Asian Information Retrieval Societies conference. The change is a valid one, since AIRS really is a conference, not a symposium (that is, it is an occasion for carrying things together, not a drinking party). However, the name change has had the interesting side effect of bringing into existence a whole new class of organization, namely an Asian Information Retrieval Society.
Apologies
November 9th, 2009
A.) For not blogging frequently during CIKM. I was exhausted at the end of day one, drunk at the end of day two, and lazy at the end of day three. I’ll write up some notes later, but I realize that they’ll lack that tweet-like freshness.
B.) That my blog went down. My swap partition decided to go on holiday on someone else’s VPS. We seem to be back now.
C.) To Professor Hara. While your eyes do look like the moon reflected in a lake at night, I now realize that this was inappropriate to say to a married man over the banquet table.
At CIKM
November 3rd, 2009
I’m at CIKM in Hong Kong this week. Today was tutorial day. I went to a tutorial with a very long title on information extraction, given by Marius Pasca from Google; as someone who doesn’t know much about the domain, I found it quite helpful and informative. The food here is excellent, even better than SIGIR in Singapore. The conference is in an exhibition centre at the airport itself. The centre is cavernous and, at least today, entirely empty apart from the few dozen CIKM tutorial attendees. Tomorrow, the marathon of the main conference starts: four parallel streams, each with three sessions a day, and each session having five papers in 110 minutes. That makes 22 minutes per paper, which is 30 seconds less than the 22.5 minutes at CIKM 2008. I’m not sure whether they’re reducing paper time by this amount every year as a matter of policy. If so, CIKM 2010 is going to be a doozy, because the shortest way to arrange 21.5-minute papers into multiple-of-five-minute sessions is to have 10-paper sessions running for three hours thirty-five minutes each.
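For anyone who wants to check the session arithmetic, here is a small sketch; the helper name `shortest_session` is made up for this example, and exact fractions are used so that 21.5 is not subject to floating-point rounding.

```python
from fractions import Fraction

def shortest_session(minutes_per_paper, slot=5):
    """Return (papers, session_minutes) for the shortest session whose
    length is a whole multiple of `slot` minutes."""
    per_paper = Fraction(minutes_per_paper)
    papers = 1
    # Add papers until the total length divides evenly into slots.
    while (papers * per_paper) % slot != 0:
        papers += 1
    return papers, papers * per_paper

print(Fraction(110, 5))  # minutes per paper this year: 22

papers, total = shortest_session("21.5")
hours, minutes = divmod(int(total), 60)
print(papers, hours, minutes)  # 10 3 35
```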
Anyway, if anyone else is at CIKM, let me know; it would be great to catch up.
“Etiolate”, of course, is what I meant
October 10th, 2009
I’m reading Leland Wilkinson’s The Grammar of Graphics at the moment. One of the unexpected delights of the book is the richness of its vocabulary, and the resultant enriching of my own vocabulary through reading it. Now I’m just waiting for the opportunity to observe to someone that their data looks heteroscedastic and their plot surprisingly ogive. Anyway, while reacquainting myself with the definition of “etiology”, I stumbled across the word that I had misremembered as “ateliorate”. It is, in fact, “etiolate”: literally, to make (a plant or person) pale by the exclusion of sunlight; figuratively, to “cause to lose vigour or substance” (Shorter OED). A day too late, alas, for use in my ADCS reviews…
Why the second approach to writing always seems best
October 5th, 2009
Although I’m making progress, I still haven’t settled on a stable writing method. One of my problems has always been that whichever approach I tried first for a given piece of writing, it would always seem unsatisfactory, and switching to another for the re-write would give better results. So, for instance, take the two approaches of careful planning on the one hand, versus diving into writing itself on the other. I might start one piece with careful planning, collecting all my material and arranging it in order before writing anything; after a while, though, I seem to be at a dead end, and put my plan aside to simply write what I’m thinking of. For the next piece, I start by just writing; but several pages later, the writing is off track and I’m no longer sure what I’m saying. I then turn away from the writing and think about the structure and plan of the argument, and again make better progress. This process of alternating between two (or rotating between multiple) different approaches can go on indefinitely.
It only occurred to me today (when, appropriately enough, preparing for a research methods talk on statistical methods) why this confusing process occurs, in which the method you try second always seems best, whatever it and the first method happen to be. Whatever you do first clarifies and organizes your thinking about what you have to write; so the method you try second naturally comes more easily than it would have if it had been first instead. So even if A is actually a better preparatory method than B, when you work in the order AB (say, planning then writing, or conversely, writing then planning), B will often seem more fluent, whatever B is.


