My kids often come home from school spouting crazy “facts” they’ve learned from classmates. It seems fundamentally human to repeat stories and, in the repeating, alter them—often unintentionally. Researchers do the same thing, and just this morning I was irritated to read an entirely inaccurate citation of one of my own papers. No doubt others have had similar feelings while reading my work.
The Leprechauns of Software Engineering, Laurent Bossavit’s in-progress ebook (or e-screed, perhaps), contains wonderfully detailed examples of how some well-known facts in the software engineering field, such as “bugs get more expensive to fix as time passes,” have pretty dubious origins. The pattern is that we start out with an original source that makes certain claims, hopefully based on empirical evidence. Subsequent papers, however, tend to drop details or qualifications, using citations to support claims that, over time, diverge more and more from those in the original paper. In science, these details and qualifications matter: just because a fact is true under certain circumstances does not mean that it generalizes. Worse, the fact may not even be true in its original form due to statistical issues, flaws in experiment design, and similar problems. Complicating matters further, Bossavit seems to be finding cases where the slant introduced during citation is self-serving.
One story in Leprechauns made me laugh out loud: Bossavit was lecturing his class on a particular piece of well-known software engineering lore and realized halfway through that he wasn’t sure whether, or why, what he was saying was true, or whether he was making sense at all. Something similar has happened to me many times.
Although Leprechauns takes all of its examples from the software engineering field, I have no doubt that something similar could be written about any research area where empirical results are important. Bossavit’s overall message is that the standards for science need to be set higher. In particular, authors must read and understand a paper before citing it. Of course this should be done, but it’s not a total solution to the telephone game. As I think I’ve pointed out here before, the actual contribution of a research paper is often different from the claimed contribution. Or, to put it another way, we first need to understand what the authors intended to say (often this is not easy) and then we also need to understand what was left unsaid. A subtle reading of a paper may require a lot of background work, including reading books and papers that were not cited in the original.