Via Jeff Dalton and the LingPipe blog comes Reading Tea Leaves: How Humans Interpret Topic Models, an enjoyable paper on evaluating how meaningful to humans the topics produced by topic models such as Latent Dirichlet Allocation really are. The authors come up with a couple of interesting evaluation tasks, which they run on Mechanical Turk. The one I particularly liked introduces an alien word into the word list of a topic and then checks whether the user can spot the intruder, something like the old “which of these things is not like the other” song/game on Sesame Street. Their finding is that human evaluation and the automated, statistical evaluation techniques generally used do not agree on which topic model is the most effective; but what appealed to me most was the inventiveness of the method they developed to test how humans interpret topic coherence.
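To make the word-intrusion idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own, not the authors' code: the function names (`build_intrusion_question`, `model_precision`), the toy topics, and the rule for choosing the intruder (a word ranked highly in some other topic and ranked low in, or absent from, the topic being tested) are all hypothetical. Scoring is simply the fraction of annotators who pick the planted word.

```python
import random


def build_intrusion_question(topics, topic_id, n_top=5, rng=None):
    """Build one word-intrusion question for a topic.

    topics: dict mapping a topic id to its words, ranked from most to
            least probable (a hypothetical input format).
    Returns (shown_words, intruder).
    """
    rng = rng or random.Random()
    ranked = topics[topic_id]
    top_words = ranked[:n_top]

    # Assumption: "improbable in this topic" is approximated by the word
    # sitting in the lower half of this topic's ranking, or being absent.
    low_here = set(ranked[len(ranked) // 2:])

    # Candidate intruders: top words of the other topics that are
    # improbable here and not already among the words being shown.
    candidates = sorted({
        word
        for other_id, words in topics.items()
        if other_id != topic_id
        for word in words[:n_top]
        if word not in top_words and (word in low_here or word not in ranked)
    })
    if not candidates:
        raise ValueError("no suitable intruder word found")

    intruder = rng.choice(candidates)
    shown = top_words + [intruder]
    rng.shuffle(shown)  # hide the intruder's position from the annotator
    return shown, intruder


def model_precision(intruder, answers):
    """Fraction of annotator answers that identified the intruder."""
    if not answers:
        raise ValueError("need at least one answer")
    return sum(answer == intruder for answer in answers) / len(answers)


# Toy example with made-up topics.
topics = {
    "animals": ["dog", "cat", "horse", "pig", "cow", "apple", "pear"],
    "fruit": ["apple", "orange", "pear", "banana", "grape", "dog", "cat"],
}
shown, intruder = build_intrusion_question(
    topics, "animals", rng=random.Random(0)
)
print(shown, "-> intruder:", intruder)
```

If the topic is coherent, most people should find the intruder easily and the score approaches 1; if the topic is a jumble, answers scatter and the score drops toward chance.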