Recently I reviewed 19 papers that were submitted to CAV 2011. This is the first time I’ve been involved with a pure verification conference, and I greatly enjoyed reading the papers because almost every one contained something new. Each time I submitted a review I looked at the reviews already submitted for that paper, and I kept being surprised at how often I disagreed with the other reviewers. Finally I just computed the correlation between my score and the average of the other reviewers’ scores, and the result was an astonishingly low -0.07.
In contrast, when I review papers at a “systems” venue (doesn’t matter if it’s operating systems, embedded systems, or something else) it’s not at all uncommon for me to give exactly the same score as all, or almost all, of the other reviewers. My guess is that at the last five systems conferences I was involved with, the correlation between my score and the average of the other reviewers’ scores was higher than 0.8.
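For anyone who wants to run the same check on their own reviews, here is a minimal sketch in Python. It assumes the Pearson coefficient, since the post does not say which correlation measure was used. The function names `pearson` and `agreement` and the scores at the bottom are made up for illustration; they are not real review data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


def agreement(my_scores, other_scores):
    """Correlate my score on each paper with the mean of the other reviewers' scores.

    my_scores:    one score per paper
    other_scores: one list of the other reviewers' scores per paper
    """
    others_mean = [sum(scores) / len(scores) for scores in other_scores]
    return pearson(my_scores, others_mean)


# Hypothetical scores for four papers (illustrative only, not real data).
mine = [1, 2, 3, 4]
others = [[1, 2], [2, 3], [3, 3], [4, 5]]
print(agreement(mine, others))
```

A value near 1 means my scores rise and fall with everyone else's, a value near 0 means they are essentially unrelated, and a negative value means I tend to score a paper lower when the others score it higher.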
I’m not sure there’s a take-away message here other than “communities have very different evaluation standards for papers.” However, this does shed a bit of light on why it can be quite difficult to switch areas. Closely related: Doug Comer’s excellent piece on how to insult a computer scientist.