Extreme Bibtex

June 27th, 2009

My thesis write-up is proceeding by task, rather than by topic. First was organising my bibliography, then gluing together my papers verbatim. Most recently, I’ve been compartmentalising my figure generation, so that graphs and images can be independently created from separate data files under the guidance of make, rather than being extracted by arcane invocations over my accreted R dump file. I’ve reached no particular insights on the latter two tasks, except that using Latex makes agglomerating disparate publications under the same style relatively painless, and that standalone figure generation and data capture is a discipline I should have imposed on myself long ago.

For my bibliography, I have finally acted on my long-standing intention to make more extensive use of string definition and concatenation in Bibtex. String definitions in Bibtex are familiar enough; we make them like so:

@string{ acmtois = "ACM Transactions on Information Systems" }

to be used like so:

@article{jk02:acmtois,
author = {J{\"{a}}rvelin, Kalervo and Kek{\"a}l{\"a}inen, Jaana},
title = {Cumulated gain-based evaluation of {IR} techniques},
journal = acmtois,
year = {2002}
}

Such definitions not only save effort in typing and wetware parsing, but also maintain consistency in formulation and abbreviation throughout the file. However, by themselves, they are limited, in that while they work for journals, which have a constant title, they do not work for conference proceedings, whose title changes each year (for instance, from “Proceedings of the 15th International Symposium…” to “Proceedings of the 16th International Symposium…”).

Previously, I had been writing out each conference title verbatim in the entry for that conference. But then if you want to change the way you present the title (for instance, if you want to abbreviate “Proceedings” to “Proc.” and “International” to “Int.”), you have to go through every instance of that conference (and there might be a dozen or more) and change them individually.

The long-pondered solution that I’ve adopted for my thesis bibliography is to split conference names into pre-numeral, numeral, and post-numeral sections, then use another of Bibtex’s facilities, the “#” string concatenation operator, to join them together, like so:

@string{ proc = "Proc." }
@string{ ecir:post = "European Conference on {IR} Research" }

@proceedings{ecir07,
title = proc # " 29th " # ecir:post,

}

@proceedings{ecir08,
title = proc # " 30th " # ecir:post,

}

Having gone this far, I decided to do the same thing for author names and author lists, again for the sake of conciseness, conformity, and ease of change:

@string{ et = " and " }
@string{ moffat = "Moffat, Alistair" }
@string{ webber = "Webber, William" }
@string{ zobel = "Zobel, Justin" }

@inproceedings{mwz07:sigir,
author = moffat # et # webber # et # zobel,
title = {Strategic System Comparisons via Targeted Relevance Judgments},
pages = {375--382},
crossref = {sigir07}
}

@inproceedings{wmz08:cikm,
author = webber # et # moffat # et # zobel,
title = {Statistical Power in Retrieval Experimentation},
pages = {571--580},
crossref = {cikm08}
}

The next step would be to also define strings for “1st”, “2nd” and so forth, to allow for consistent switching to the long forms “First”, “Second” etc. And of course the scheme does not support syntactic changes, such as starting each conference with its abbreviated name. But despite these limitations, adopting this approach has left me with a Bibtex file that is much less ugly to look at and awkward to maintain.
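A minimal sketch of how the ordinal strings might look. The macro names ord:29 and ord:30 are hypothetical, chosen only for illustration; Bibtex string names cannot begin with a digit, so a name like “29th” would not be accepted. The other definitions repeat the ones given earlier so that the fragment stands on its own.

```bibtex
% Hypothetical ordinal macros; the names are illustrative only.
% To switch to the long forms, change the values here,
% e.g. "29th" to "Twenty-Ninth".
@string{ ord:29 = "29th" }
@string{ ord:30 = "30th" }

@string{ proc = "Proc." }
@string{ ecir:post = "European Conference on {IR} Research" }

@proceedings{ecir07,
  title = proc # " " # ord:29 # " " # ecir:post
}

@proceedings{ecir08,
  title = proc # " " # ord:30 # " " # ecir:post
}
```

The spaces around the ordinal now have to be concatenated explicitly, which makes each title line longer; whether that is worth it probably depends on how likely the switch to long forms really is.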

Thesis-writing the Caltech way

June 22nd, 2009

I decided, if it’s good enough for Caltech, then it’s good enough for me, so I took Danny’s advice and dumped all my papers holus-bolus into my thesis skeleton. From zero to 70,000 words, all in one day! Another 10,000 words of acknowledgments and it’s off to the binder. Now, red with gold lettering, or blue with silver lettering? Or maybe more of a floral effect…

Progress

June 17th, 2009

Two thousand words done today. Unfortunately, it was for a journal review.

Writing up

June 15th, 2009

My projects are marked, my CIKM submission is in; the decks are cleared — time to start writing up my thesis!

A number of people have told me that expectations for the written-up PhD thesis are different in, say, the US from what they are in Australia. In the US, so I’m told (edit: but possibly incorrectly; see comments), the dissertation can be little more than a collation of the publications made during the candidature. Here in Australia, in contrast, the thesis is expected to be a coherent whole. Your publications will form the basis of the thesis chapters, but the content needs to be re-written, expanded, corrected, and harmonized with the overall themes of the thesis.

In practice, of course, your understanding about your research develops over time. Your mature judgment may diverge from your earlier focus. This makes writing up your thesis a daunting task, and — I apprehend — one that can feel rather retrogressive. At the same time, though, the opportunity to re-appraise your earlier work can be a fruitful one. At least I hope so.

As for me, I have three papers from SIGIR to write up, one from CIKM, an ADCS effort, a couple of posters, and two papers in submission. With the exception of the ADCS publication, which led into a paper at SIGIR, these are on quite diverse subjects. I’m toying with the idea of changing my thesis title to “Et Cetera: Miscellaneous Topics in Information Retrieval Evaluation”.

E-learning and teaching complacency

June 10th, 2009

An interesting article in The Australian’s Higher Education Supplement on the threat posed to traditional universities by e-learning. The meat of the article is that the technology of and investment in e-learning are reaching the stage where e-learning is a very real competitor to traditional, on-site university education. Economies of scale mean that the Stanfords of this world can offer e-learning degrees more cheaply than second-tier universities can offer their own less prestigious on-site degrees. The author argues that traditional universities need to react by making on-site education a more compelling and enjoyable experience. This will require more emphasis, not less, on teaching quality.

Teaching ideas

June 9th, 2009

I’ve been picking up a number of interesting teaching ideas, largely from attending seminars given by the University’s Engineering Learning Unit (ELU). The two seminars I recently attended were both concerned with peer learning. This is a popular topic at the moment — partly perhaps as a reaction to increasing class sizes and decreasing academic time. In the first seminar, Harald Søndergaard discussed his experiences using a couple of online peer-learning tools, Praze and Peerwise. These have also been used by other academics within our Department.

Praze is a tool developed at the University of Melbourne that supports student peer review of other students’ work. The tool manages the workflow of assigning, distributing, and collecting peer review assignments, and supports double blind reviewing and feedback on other reviews. Harald observes that students are particularly keen to get frequent feedback on their progress, and that the peer review system supports this in two ways: they get comments on their work from their peers; and they are able to see what other peers are doing. The system works particularly well if assignments are divided into multiple stages, so that there is a continuous cycle of submission and review. An advantage of staged assignments is that students are explicitly permitted to take ideas from a strong implementation in an earlier stage and apply them as the basis for their own next stage; thus, students do not fall behind if they get off to a bad start.

Peerwise is developed and hosted at the University of Auckland. It supports peer-created multiple-choice questions. Students write multiple-choice questions for other students to answer. The questions themselves are reviewed by other students, and authors are able to refine and improve them in response to these reviews. And teaching staff are able to see which questions and topics are posing the greatest difficulty to students. I thought that this was a really, really neat idea, and it looked to be rather fun, although lecturers who had used it observed that there needed to be incentives in order to get students actively involved, such as an assessment component and the suggestion that some of the best questions might make it onto the final exam. (Other interesting quiz-like ideas were suggested in a later seminar, such as student-created crosswords.)

The Faculty of Economics at the University participates in a program called PASS — Peer Assisted Study Sessions — and the second seminar I attended was given by one of the supervisors of this scheme. These are essentially guided student group study sessions. Group leaders are selected from high-performing students from the subject’s previous year, and are paid for their time. Sessions are timetabled and have teaching space assigned to them, but participation is voluntary and there is no assessment component. The group leader is not meant to act as a tutor; they are not meant to teach anything, or to directly answer questions, but to encourage discussion and group-work amongst the group. The supervisor described the group leaders as being in part a “model student” for the others to emulate. The scheme received very high approval ratings from participants, but of course there is a selection bias here: only students who like the scheme will participate. There was an interesting discussion with the supervisor about gender balance (the groups are predominantly female; when a male student does turn up, he can tend to monopolise the conversation) and cultural issues (the groups tend to segregate along ethnic lines, with international students sometimes falling into discussions in their native language).

And a final interesting teaching (or at least assessment) idea from Guy Gershoni: assessment via interview. Students work on projects in groups, and hand in a project report. The main component of the assessment, however, is an interview between the group and the assessor (lecturer or tutor). At this interview, the assessor goes over the report, asking questions about and discussing it, to gauge the students’ real understanding of the issues involved. Furthermore, Guy asks the group what mark they think they deserve for the project, and finds that the students are generally harder markers of themselves than he is. This seems an interesting and relatively painless way to do assessment, but also potentially an unreliable one if used without care.

Has adhoc retrieval improved since 1994?

June 4th, 2009

Researchers in information retrieval frequently confront the presumption that search is a solved problem (more colloquially phrased as “doesn’t Google do that already?”). Expressed so directly, this is a misconception: there are plenty of goals great and small still to be achieved in IR technology, as the recent interest in alternative search approaches such as Wolfram’s Alpha and Google Squared suggests. A slightly trickier question is, is the science of IR improving? After all, just because we haven’t got to where we want to be, it doesn’t mean we’re necessarily making progress towards it.

There are various ways one might try to measure progress in information retrieval science. One of the most concrete is to inquire whether the measured retrieval effectiveness of IR systems has increased over time. There is an obvious and readily-available dataset to ask this question of — the runs submitted to the annual TREC competition. The problem, however, is that each year’s experiment is made on a different query set, with different participating systems and slightly different rules and processes. This makes it difficult to compare scores from one experiment to another; if scores went up, it may be because the task was easier, not because the systems were better. The only serious attempt to assess whether cross-collection results in TREC demonstrate an improvement in IR technology has been by running each of the first 8 TREC-participating versions of the SMART retrieval system against all of the first 8 TREC collections (see Figure 8 of the overview to TREC 8); however, while interesting, this approach only directly measures improvement in one retrieval system, not in the breadth of IR technology as measured across all TREC participants.

Our SIGIR 2009 poster, “Has Adhoc Retrieval Improved Since 1994?”, investigates progress in IR technology through the prism of the past decade and a half of TREC results on adhoc retrieval tasks. We look at the scores achieved by participants in the AdHoc track of TREC from 1994 to 1999 (TRECs 3 through 8), and the very similar Robust track from 2003 to 2005. The AdHoc track is the “classic” TREC task: document retrieval for once-off queries on newswire data, with only textual features to guide the search system. In our study, we manage variance in test collection difficulty through score standardization. Our method is to take 5 publicly available retrieval systems, in a total of 15 different configurations, and run them against all 9 test collections. The scores these systems achieve on each test collection act as a reference point to assess the difficulty and variability of that collection, and to adjust the scores of the originally participating TREC systems accordingly. This standardization process creates a more level playing field, allowing us to compare the retrieval effectiveness scores of each year’s TREC participants directly.
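As a rough sketch only (the poster itself has the actual details), a z-score style of standardization would take roughly the following form. The symbols are introduced here purely for illustration: x is the raw score of TREC system s on collection c, and μ and σ are the mean and standard deviation of the reference systems’ scores on that same collection.

```latex
\[
  z_{s,c} = \frac{x_{s,c} - \mu_c}{\sigma_c}
\]
```

Under this form, a score of zero means “as good as the average reference configuration on this collection”, whatever the collection’s raw difficulty.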

So, has ad-hoc retrieval technology improved since 1994?

[Figure: Standardized MAP of TREC participants in AdHoc and Robust tracks. Thick lines are medians; box edges are quartiles; ends of whiskers are highest and lowest values.]

Apparently not. As a group, the systems that participated in the 1994 experiment are more than competitive with any of the subsequent years. In fact, participants seem to have gone backwards for a few years after 1994, before stabilising in the last year of the AdHoc task proper. And the very best system in the entire set (and we are talking about several hundred systems altogether here) would seem to have been the best system from 1994.

The AdHoc track of TREC was discontinued in 1999 in large part because it was felt that effectiveness on the task had plateaued; newer tasks have taken its place. So our finding of no improvement, while stark, is not entirely surprising, although the fact that the plateau appears to have been reached by 1994, just two years after TREC’s inception, is somewhat thought-provoking. In a sense, we are providing objective confirmation for what has been anecdotally believed. The question it does raise, though, is why. It is hardly that ad-hoc retrieval is perfect. Is it that no further improvement is possible with the techniques currently available to us? Or is it that the task itself, as formulated in TREC ad-hoc retrieval experiments, has too much ambiguity and imprecision to allow better results?

Staff/student ratios at Australian universities

May 29th, 2009

Tim Armstrong and I were discussing the apparent decrease over the past couple of decades in the time that academics have available for students. I’ve come across a pertinent number in an internal discussion paper on the University of Melbourne’s future. The ratio of students to teaching staff at Australian universities was 12.9 in 1990; in 2006, this had increased to 20.5. I imagine that the amount of research output expected of academics has also grown substantially over that time. So yes, Australian academics are a lot busier than they used to be.

Why does the ACM hate the planet?

May 24th, 2009

I don’t like to go on banging my anti-conference drum (well, I do, actually, but authorial conventions dictate that I feign reluctance), but I’ve been reading David Mackay’s new book, Sustainable Energy — without the hot air, and in it he points out that taking one intercontinental flight a year is equivalent in carbon emissions to driving 35km every day for that year. So if the ACM were to convert SIGIR and CIKM from physical to virtual conferences, it would have the carbon effect of taking 600 cars off the road (assume 600 attendees per conference, 1/2 of whom have to make an intercontinental flight).
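Spelling out the back-of-envelope arithmetic, under the stated assumptions (two conferences, 600 attendees each, half of them flying intercontinentally, and one such flight a year counting as one car driven 35km a day for that year):

```latex
\[
  2 \times 600 \times \frac{1}{2} = 600 \ \mbox{flights} \approx 600 \ \mbox{cars}
\]
```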

William: SIGIR-free since 2009

May 22nd, 2009

Well, I’ve managed to get myself out of going to SIGIR. Thanks to Laurence for volunteering to spend 60 hours flying to the other side of the world to check his email. Now my challenge is to get a better understanding of the work presented at the conference than those attending it.