In math, a series of logical steps leads to a logically correct result. In life, not so much. To find examples of reasonable steps leading to crazy consequences, there’s no need to look further than the nearest bureaucracy. Let’s take an example from an Institutional Review Board (IRB), the part of the university bureaucracy whose job is to ensure that researchers like me conduct experiments without harming participants (or bystanders).
A researcher I know of was running some studies in which the participants were given alcohol. To permit this experiment to proceed, the IRB stipulated that female participants first had to be given a pregnancy test, to avoid the possibility that a woman unaware of her pregnancy would expose the fetus to alcohol during the experiment. So far so good, but how would you like being on either side of a conversation that starts out like this?
I know you just signed up for this experiment for lab credits, and I hate to be the one to tell you this, but…
Another bit of low-grade IRB humor can be found at the University of Utah’s IRB web site, which specifically mentions which report to file if “incarceration of a participant” occurs. What kind of studies are people running? And how can I get involved?
My student Peng and I have been submitting lots of bug reports to maintainers of open-source software packages. These bugs were found using Peng’s integer undefined behavior detector. We’ve found problems in OpenSSL, BIND, Perl, Python, PHP, GMP, GCC, and many others.
As we reported these bugs, I noticed developers doing something funny: in many cases, their first reaction was something like:
Please stop bothering us with these stupid static analysis results!
They said this even though, in the initial bug report, I usually pointed out that not only were these results from actual tests, but they were from the tests provided by the developers themselves! This interaction with PHP’s main developer is a good example.
What can we learn from this? I take away a few different messages:
- From the developer’s point of view, static analysis results can suck. As these tools become smarter, they consider more and longer code paths. In a very real sense, this makes their results more difficult to reason about. It should go without saying that bug reports that are hard to reason about are not very actionable.
- The developers of popular open-source software projects are suffering from static analysis fatigue: a syndrome brought on by seeing too many bug reports from too many random tools, too few of which are actionable. If I made a living or a career out of static analysis, I’d be seriously worried about this.
- Many developers are still in the dark about C’s integer undefined behaviors. This is a constant surprise to me given that GCC and other common compilers have, when optimizing, evaluated “(x+1)>x” to “true” for a signed integer x for quite a while now.
- The best bug reports, by far, are those that are accompanied by a failure-inducing input. I think the reason we’ve been running into developer confusion is that we’re telling people that their own “make check” is a failure-inducing input. People don’t want to hear this because when they run it themselves, everything looks fine. Some of these problems will go away when our undefined behavior checker makes it into an LLVM release. Lacking the checker, people can still reproduce our results, but only by manually adding assertions.
- Developers are prideful about their projects. This is as it should be, unless it blinds them to useful bug reports.
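To make the integer undefined behavior point from the list above concrete, here is a minimal C sketch. It is not taken from our checker or from any of the projects mentioned; the function names (plus_one_is_greater, add_would_overflow, checked_add) are made up for illustration. The first function is the “(x+1)>x” case: since signed overflow is undefined, an optimizing compiler is allowed to turn it into a constant “true”. The other two show roughly what “manually adding assertions” might look like for a single signed addition.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined behavior in C, so a compiler may assume
 * x + 1 never wraps and fold this function to "return true". Calling it
 * with x == INT_MAX is exactly the kind of bug under discussion. */
bool plus_one_is_greater(int x) {
    return (x + 1) > x;
}

/* Hypothetical helper: reports whether a + b would overflow an int,
 * without ever performing the overflowing addition itself. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;   /* result would exceed INT_MAX */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* result would fall below INT_MIN */
    }
    return false;                 /* adding zero never overflows */
}

/* Hypothetical hand-instrumented addition: the assertion fires on the
 * inputs that would otherwise trigger undefined behavior silently. */
int checked_add(int a, int b) {
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

Swapping a suspicious “a + b” for checked_add(a, b) and re-running the project’s own tests is one way a developer could confirm a report by hand: if the assertion fires during “make check”, the test suite really is a failure-inducing input, even though the uninstrumented run looks fine.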
Regarding static analysis fatigue, there can be no silver bullet. People developing these tools must take false-positive elimination very seriously (and certainly some of them have). My preference for the time being is to simply not work on static analysis for bug-finding, but rather to work on the testing side. It is more satisfying and productive in many ways.