Not long ago I was surprised to find a slightly negative correlation between my review scores and the average of the other reviewers’ scores for a collection of papers submitted to CAV. A few days ago I attended the program committee meeting for SenSys 2011, where I again reviewed around 20 papers. This time, the correlation between my score and the average of the other reviewers’ scores was very high: 0.89.
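For the curious, the statistic above is just the Pearson correlation between one reviewer's scores and the per-paper average of everyone else's scores. Here is a minimal sketch of that computation; the scores are made-up illustrative numbers, not the actual CAV or SenSys reviews:

```python
from math import sqrt

def pearson(xs, ys):
    # Standard Pearson correlation coefficient: covariance of xs and ys
    # divided by the product of their standard deviations.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: each row is (my score, [other reviewers' scores])
# for one paper, on a 1-5 scale.
reviews = [
    (5, [4, 5, 5]),
    (2, [3, 2, 2]),
    (4, [4, 3, 5]),
    (1, [2, 1, 2]),
    (3, [5, 2, 4]),
]

mine = [m for m, _ in reviews]
others = [sum(o) / len(o) for _, o in reviews]
print(round(pearson(mine, others), 2))
```

On this toy data the correlation comes out close to 1, i.e., this hypothetical reviewer largely agrees with the rest of the committee.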
What is the ideal level of correlation between reviewers? This is a little tricky. From the program chair’s point of view, perhaps perfectly correlated scores would be best: since everyone agrees, selecting papers to accept should be trivial. On the other hand, from a selfish point of view I don’t want the correlation to be too high, because that is boring — it means I’m not adding anything to the discussion. However, if correlation is too low, every discussion becomes a fight and that’s no fun either. I’m guessing that a higher level of correlation than what I saw at CAV, and a slightly lower one than I saw at SenSys, would be ideal, maybe around 0.7.