Review utility and reviewer gender

Another interesting paper at CIKM was "Inferring Gender of Movie Reviewers: Exploiting Writing Style, Content, and Metadata" (no link because the ACM digital library is currently broken), by Jahna Otterbacher. Working with a crawl of top-rated IMDB reviews for popular movies, the paper builds a logistic regression model for predicting reviewer gender. Features used include writing style; content, including a 20-dimensional LSA concept set; and metadata such as the age of the review and the popularity of the movie. The paper's experiments find support for previous research observations on gendered writing, such as that male reviewers seek to convey information about the movie itself, whereas female reviewers describe their experience of and reaction to the movie.
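A minimal sketch of this kind of model may help fix ideas. The data, feature values, and dimensions below are invented for illustration (the paper uses a 20-dimensional LSA space; two dimensions suffice for a toy corpus), but the overall shape -- TF-IDF text reduced by LSA, concatenated with metadata, fed to logistic regression -- follows the setup described above:

```python
# Toy illustration (not the paper's actual data or features):
# logistic regression over LSA content features plus review metadata.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "A gripping plot with strong direction and a tight script.",
    "I cried through the whole film; it moved me deeply.",
    "The cinematography and pacing are technically excellent.",
    "Watching this brought back memories of my own family.",
]
genders = np.array([1, 0, 1, 0])  # toy labels: 1 = male, 0 = female

# Content features: TF-IDF reduced to a small LSA concept space.
lsa = make_pipeline(TfidfVectorizer(),
                    TruncatedSVD(n_components=2, random_state=0))
content = lsa.fit_transform(reviews)

# Metadata features (invented values): review age in days, movie popularity.
metadata = np.array([[120, 0.9], [300, 0.4], [45, 0.7], [200, 0.2]])

X = np.hstack([content, metadata])
model = LogisticRegression().fit(X, genders)
print(model.predict(X))
```

The same pipeline shape accommodates any mix of textual and numeric features, which is presumably why logistic regression is a common choice for this kind of mixed-feature study.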

Style and content features, though, are only weakly predictive of reviewer gender. Accuracy using these features on this data set is around 65%, whereas previous studies on blogs and more formal writing achieve an accuracy of 80% or more. The reason is in part the brevity of many movie reviews; higher accuracy is achieved on longer reviews. The one feature that is highly predictive of gender is, rather strikingly, the usefulness of the review as rated by IMDB users. Put simply, rated utility correlates strongly with male authorship; three-quarters of the top-20 rated reviews for each movie are by men. Utility alone achieves a predictive accuracy of 72%, which the addition of all other features is able to improve by only 1%.

The author hypothesizes that male reviewers achieve higher ratings because (she posits) the IMDB user community is predominantly male, and raters tend to find reviews by members of their own gender more useful. To examine this hypothesis, the author builds a second logistic regression model, this time to predict utility amongst female reviewers, and finds that some of the features predicting useful female reviews are the same as those that predict male authorship.

Unfortunately, as the author acknowledges, the data necessary to directly test the hypothesis about male raters preferring male reviewers is not available. The gender proportion of IMDB users or raters is not known, let alone the gender proportion of raters favouring each gender's reviewers. Also unknown is the proportion of reviewers that are male. The IMDB interface only provides gender information in a mode in which male and female reviews are interleaved, and the same number of each is always presented. This lack of data means that the author is not able to disprove (and does not even discuss) the much more straightforward hypothesis that three quarters of top-20 reviews are by men simply because three quarters of all reviews are by men. If the latter premise were true, then the gender-interleaved interface would display all female reviews alongside the top third of male reviews, which would artificially create the observed correlation between review rating and gender. Indeed, the remarkable power of review utility to predict gender, contrasted with the failure of direct measurements of maleness or femaleness of writing style and content, makes it seem probable that some such selection bias is at least partly responsible for the observed results.
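The selection-bias argument can be made concrete with a toy simulation. The population sizes and the 3:1 male-to-female ratio below are assumptions for illustration; the point is that even when utility is drawn identically for both genders, an interface that interleaves equal numbers of each gender pits all female reviews against only the top-rated male ones, so utility appears to predict gender in the displayed sample:

```python
# Toy simulation (parameters assumed, not from the paper): utility is
# independent of gender in the population, but the interleaved display
# shows all female reviews next to only the top third of male reviews.
import random

random.seed(0)
N_MALE, N_FEMALE = 300, 100  # assumed 3:1 reviewer gender ratio
male_utility = [random.random() for _ in range(N_MALE)]
female_utility = [random.random() for _ in range(N_FEMALE)]

# Interleaved display: equal numbers of each gender, so only the
# top-rated third of male reviews appears, but every female review does.
shown_males = sorted(male_utility, reverse=True)[:N_FEMALE]
shown_females = female_utility

mean_m = sum(shown_males) / len(shown_males)
mean_f = sum(shown_females) / len(shown_females)
print(f"mean displayed utility: male {mean_m:.2f}, female {mean_f:.2f}")

# A trivial classifier that predicts "male" when utility exceeds the
# displayed sample's median is well above chance, despite utility being
# independent of gender in the full population.
threshold = sorted(shown_males + shown_females)[len(shown_females)]
correct = (sum(u > threshold for u in shown_males)
           + sum(u <= threshold for u in shown_females))
accuracy = correct / (len(shown_males) + len(shown_females))
print(f"utility-only accuracy on displayed sample: {accuracy:.2f}")
```

On a crawl drawn from such a display, utility would look like a strong gender signal even though no rater in the simulation has any gender preference at all.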

Despite the problems the above analysis suggests for the paper's link between gender, review utility, and community bias -- problems which are beyond the author's control, given the data available -- this is an interesting and suggestive paper. It provides an informative example of the use of logistic regression in the analysis of textual data, and of the methods of machine learning in the humanities and social sciences. I hope that the author will be able to obtain more direct access to IMDB data on reviewer and rater gender, so that her hypotheses can be more reliably tested.
