Not long ago I was surprised to find a (slightly) negative correlation between my review scores and the average of the other reviewers’ scores for the papers I reviewed for CAV. A few days ago I attended the program committee meeting for SenSys 2011, where I again reviewed around 20 papers. This time, the correlation between my score and the average of the other reviewers’ scores was very high: 0.89.
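For anyone who wants to run the same check on their own reviews, here is a minimal sketch. It assumes that each paper's score is a number, and that "correlation" means the ordinary Pearson coefficient between my score for each paper and the mean of the other reviewers' scores for that paper. The function names `pearson` and `agreement` and the five-paper score lists are hypothetical illustrations, not real review data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    # Covariance and standard deviations, up to the common 1/n factor (which cancels).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (sx * sy)


def agreement(my_scores, others_scores):
    """Correlate my score for each paper with the mean of the other reviewers' scores.

    my_scores[i] is my score for paper i; others_scores[i] is the list of the
    other reviewers' scores for paper i (hypothetical data layout).
    """
    other_means = [mean(s) for s in others_scores]
    return pearson(my_scores, other_means)


# Hypothetical scores for five papers (made up for illustration).
mine = [4, 2, 5, 1, 3]
others = [[4, 3], [2, 2], [5, 4], [1, 2], [3, 3]]
print(round(agreement(mine, others), 2))
```

Averaging the other reviewers first gives one number per paper, so each paper contributes equally no matter how many reviews it received.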

What is the ideal level of correlation between reviewers? This is a little tricky. From the program chair’s point of view, perhaps perfectly correlated scores would be best: since everyone agrees, selecting papers to accept should be trivial. On the other hand, from a selfish point of view I don’t want the correlation to be too high, because that is boring: it means I’m not adding anything to the discussion. But if the correlation is too low, every discussion becomes a fight, and that’s no fun either. My guess is that the ideal lies somewhere between what I saw at CAV and what I saw at SenSys, maybe around 0.7.

Re: “not adding anything.” If three reviewers agree, did none of them add anything?

Eric, once the required level of confidence in the overall review score is achieved, additional reviews are indeed useless.

Well, of course, a (good) review is much more than a score, correlated or not.