My news feed this morning contained this article about an unpleasant local situation that has caused one person I know to lose her job (not because she was involved with the malfeasance, but as fallout from this lab shutting down). On the positive side (I’m going from the article and the investigating panel’s report here — I have no inside information) it sounds like the panel came to the appropriate conclusions.
But my sense is that most research cheating is not nearly so overt. Rather, bogus results come out of a mixture of experimental ineptitude and unconscious squashing of results that don’t conform to expectations. A guide to cheating in a recent issue of Wired contains a nice little summary of how this works:
Create options: Let’s say you want to prove that listening to dubstep boosts IQ (aka the Skrillex effect). The key is to avoid predefining exactly what the study measures — then bury the failed attempts. So use two different IQ tests; if only one shows a pattern, toss the other.

Expand the pool: Test 20 dubstep subjects and 20 control subjects. If the findings reach significance, publish. If not, run 10 more subjects in each group and give the stats another whirl. Those extra data points might randomly support the hypothesis.

Get inessential: Measure an extraneous variable like gender. If there’s no pattern in the group at large, look for one in just men or women.

Run three groups: Have some people listen for zero hours, some for one, some for 10. Now test for differences between groups A and B, B and C, and A and C. If all comparisons show significance, great. If only one does, then forget about the existence of the p-value poopers.
I do not recommend the full article; I only read it because I was trapped on a long flight.
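To make the damage concrete, here is a minimal simulation sketch (mine, not Wired’s) of the “expand the pool” trick. Both groups are drawn from the same distribution, so every “significant” difference is a false positive; the group sizes and the single extra peek follow the recipe above, and the use of numpy and scipy.stats.ttest_ind is just one convenient way to run it.

```python
# Sketch of the "expand the pool" trick: test 20 subjects per group, peek at
# the p-value, and if it isn't significant yet, add 10 more per group and
# test again. The null hypothesis is true by construction, so every rejection
# is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials = 10_000
false_positives = 0

for _ in range(trials):
    a = rng.normal(size=20)   # "dubstep" group, no real effect
    b = rng.normal(size=20)   # control group
    _, p = stats.ttest_ind(a, b)
    if p >= 0.05:
        # Not significant? Run 10 more subjects per group and try the stats again.
        a = np.concatenate([a, rng.normal(size=10)])
        b = np.concatenate([b, rng.normal(size=10)])
        _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"False-positive rate with one extra peek: {false_positives / trials:.3f}")
# Comes out comfortably above the nominal 0.05, even with only one peek.
```

Each extra peek, extra subgroup, or extra pairwise comparison added to this loop pushes the false-positive rate further above the advertised 5%.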
The pattern underlying all of these ways of cheating is to run many experiments but report only a few. We only have to run 14 experiments on an effect that does not exist before we have a better than 50% chance that at least one of them comes out significant at the 95% level purely by chance. My wife, a psychologist, says that she has observed students saying things like “we’ll run a few more subjects and see if the effect becomes significant” without realizing how bad this is. In computer science, we are lucky if empirical papers use statistical significance tests at all, let alone use them properly.
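The 14-experiment figure is a one-line calculation: assuming the experiments are independent and each uses a 5% significance threshold, the chance of at least one spurious “significant” result among n of them is 1 - 0.95^n, which first crosses 50% at n = 14.

```python
# Checking the arithmetic: if each null experiment has a 5% chance of a
# spurious "significant" result, the chance of at least one such result
# among n independent experiments is 1 - 0.95**n.
for n in (13, 14):
    print(n, round(1 - 0.95**n, 3))
# 13 -> about 0.487, 14 -> about 0.512: 14 is the first n past 50%.
```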