Static Analysis Fatigue

My student Peng and I have been submitting lots of bug reports to maintainers of open source software packages. These bugs were found using Peng’s integer undefined behavior detector. We’ve found problems in OpenSSL, BIND, Perl, Python, PHP, GMP, GCC, and many others.

As we reported these bugs, I noticed developers doing something funny: in many cases, their first reaction was something like:

Please stop bothering us with these stupid static analysis results!

They said this despite the fact that in the initial bug report, I usually pointed out that not only were these results from actual tests, but they were from the tests provided by the developers themselves! This interaction with PHP’s main developer is a good example.

What can we learn from this? I take away a few different messages:

From the developer’s point of view, static analysis results can suck. As these tools become smarter, they consider more and longer code paths. In a very real sense, this makes their results more difficult to reason about. It should go without saying that bug reports that are hard to reason about are not very actionable.
The developers of popular open-source software projects are suffering from static analysis fatigue: a syndrome brought on by seeing too many bug reports from too many random tools, too few of which are actionable. If I made a living or a career out of static analysis, I’d be seriously worried about this.
Many developers are still in the dark about C’s integer undefined behaviors. This is a constant surprise to me given that GCC and other common compilers have evaluated “(x+1)>x” to “true” for quite a while now.
The best bug reports, by far, are those that are accompanied by a failure-inducing input. I think the reason that we’ve been running into developer confusion is that we’re telling people that their own “make check” is a failure-inducing input, which people don’t want to hear since when they run it, everything looks fine. Some of these problems will go away when our undefined behavior checker makes it into an LLVM release. Lacking the checker, people can still reproduce our results, but only by manually adding assertions.
Developers are prideful about their projects. This is as it should be, unless it blinds them to useful bug reports.
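To make the “(x+1)>x” point and the “manually adding assertions” point concrete, here is a minimal sketch in C. The helper names (naive_can_increment, add_would_overflow, checked_add) are hypothetical, made up for illustration; they are not taken from our checker or from any of the projects mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Looks like an overflow test, but x + 1 is undefined when x == INT_MAX,
   so an optimizing compiler is allowed to fold the comparison to true. */
bool naive_can_increment(int x) {
    return (x + 1) > x;
}

/* Hypothetical helper: decides whether a + b would overflow,
   without ever performing the overflowing addition. */
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* Hypothetical checked addition: the assertion fires on exactly the
   inputs that would otherwise execute undefined behavior. */
int checked_add(int a, int b) {
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

Swapping a suspicious a + b for checked_add(a, b) and rerunning “make check” is roughly what reproducing one of these reports by hand looks like: the test suite stops at the failed assertion instead of silently passing.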

Regarding static analysis fatigue, there can be no silver bullet. People developing these tools must take false-positive elimination very seriously (and certainly some of them have). My preference for the time being is to simply not work on static analysis for bug-finding, but rather to work on the testing side. It is more satisfying and productive in many ways.

September 1, 2010

regehr

Computer Science, Software Correctness

14 responses to “Static Analysis Fatigue”

Robby says:

September 1, 2010 at 3:20 pm

Have you run that on Racket?
regehr says:

September 1, 2010 at 3:50 pm

Yeah– but it ran into trouble, I think in the jitter. I believe I sent the results to Matthew.

If you folks can build Racket using clang, then our tool will work.
Pete says:

September 1, 2010 at 4:02 pm

Interesting article. Quick terminology questions:

1) Isn’t integer overflow in C an implementation defined behaviour (rather than undefined)? On rare occasions I have had to rely on such behaviour.

2) ‘Subtraction overflow’ confused me at first — I’m assuming this is what we normally call underflow?
regehr says:

September 1, 2010 at 4:36 pm

Hi Pete- Signed overflow in C is undefined. In one of my “undefined behavior” blog posts I have a short example program where GCC considers -INT_MIN to be positive in one part of the program and negative in another!

The width of integer types — closely related to overflows — is implementation defined.

When I teach classes, I use underflow (only) to describe the situation where a floating point value becomes so small that it goes to zero. If we agree on this terminology then underflow is towards zero and overflow is away. Actually I just checked wikipedia and — amazingly — it agrees:

http://en.wikipedia.org/wiki/Arithmetic_underflow
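
As a rough illustration of the negation and subtraction cases, here is a minimal sketch of pre-checks that never execute the overflowing operation. The function names are hypothetical and this is not the code from our tool.

```c
#include <limits.h>
#include <stdbool.h>

/* -x is undefined only when x has no positive counterpart, which on
   two's complement machines means x == INT_MIN. */
bool neg_would_overflow(int x) {
    return x < -INT_MAX;
}

/* a - b overflows upward when b is negative and a is large, and
   downward when b is positive and a is very negative. */
bool sub_would_overflow(int a, int b) {
    if (b < 0) return a > INT_MAX + b;
    if (b > 0) return a < INT_MIN + b;
    return false;
}
```

Note that 0 - INT_MIN overflows past INT_MAX, which is why calling every subtraction overflow an “underflow” gets confusing.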
Matt Doar says:

September 2, 2010 at 12:42 pm

In my opinion that link to the “interaction with PHP’s main developer” (http://bugs.php.net/bug.php?id=52550) makes you both appear rather immature. You could take away a sixth message from all that: it’s easy for academics to rile developers with an overbearing interaction or two.
regehr says:

September 2, 2010 at 3:32 pm

Hi Matt- OK, I’ll bite: what would you have done differently?

I’ve submitted hundreds of bug reports to open source projects and have learned a few things. For one, it’s good to be a bit humble, but there’s no point in being overly so. For another, it’s OK to piss people off a bit, as long as this motivates them to pay attention to the problem. My main goal is to provide useful data; if there’s some attitude that goes along with this (on either side!) that’s basically just nerds entertaining themselves. The technical part is what’s important. If you read to the bottom of that discussion you’ll see that it had the right ending (I’m talking about Rasmus’s last message, not mine).
Matthew Hackling says:

September 2, 2010 at 3:35 pm

Sounds like you need to be using static analysis to find vulnerabilities and then create proofs of concept. Submit the POC with the code snippet containing the vulnerability. This will get attention; it’s responsible disclosure or a friendly kind of blackmail.
dfdfdasfz says:

September 16, 2010 at 12:35 am

Have you run your checker against Splint and Valgrind?
Me says:

September 16, 2010 at 9:21 am

I used to do similar research for a giant soul sucking corporation. You have to be very, very careful in how you interact with developers. You can’t just dump the output from your tool and say, here’s a bunch of bugs, aren’t they obvious? It’s tedious, but you have to build a relationship with the devs by fixing a few bugs yourself, then submitting a few simple bugs for them to fix, and then you can point them to your tool’s output.

Even MSR had this problem. For a long time it was very difficult to get Microsoft devs to use PREfast on their code. Once they used it they loved it, but it took an order from Gates to get there. People are difficult, but devs are the worst.
regehr says:

September 16, 2010 at 9:45 am

Hi dfdfdasfz– Comparing with Valgrind is easy: it can’t find any bug that we can find, and we can’t find any bug that it can find :).

Comparing with Splint is a bit apples-oranges since our tool is dynamic like Valgrind whereas Splint does static analysis.
regehr says:

September 16, 2010 at 9:50 am

Hi Me- I definitely agree with you. In my currently most-interesting bugfinding project we’re submitting lots of bugs to compiler developers. We have to be extremely careful to not overload them with bugs or else they find other things they’d rather do. As long as we report them slowly, they all get fixed.

If I behave differently from how you suggest, it’s because I’m a professor and I’m selling research results, not tools.
Eric Kidd says:

September 16, 2010 at 10:56 am

Argh, I just loathe fiddly little bugs like that. They’re all serious, and they all need to be fixed. But in most cases, developers don’t get live compiler diagnostics, just a big dump of test results from some proprietary analysis tool. And since C integer behavior (or whatever other property people are testing for) is ridiculously subtle, new bugs will inevitably appear before the end of the week.

Realistically, the only way to address these bugs is by using an open tool that developers can add to their build process. This is pretty easy with Valgrind, ElectricFence, etc.: Just add them to an appropriate Makefile target and run them nightly using buildbot. If one of these tools rejects your code, you say “mea culpa” and fix the bugs immediately.

Realistically, this tool will have much better luck getting traction once people can run these checks using LLVM and trap into the debugger when a failure occurs. With a stack trace and detailed explanation of the bad values, these would be pretty easy problems to debug. In fact, I’d love to be able to run my programs in production with these checks turned on.
regehr says:

September 16, 2010 at 10:59 am

Hi Eric- Of course you’re totally correct. We’ve already started getting this stuff into LLVM. It’s just kind of slow — they’re busy, we’re busy, they’re in the middle of a release cycle, etc. — you know how it is. It won’t be in LLVM 2.8 but should be in 2.9.
Justin Gardiner says:

October 20, 2010 at 6:02 am

1) If you are using static analysis “after the fact” then it is already too late.
2) Using testing to find BUGS and correct them is ludicrous. If there wasn’t so much literature pointing out the futility of this exercise to gain clean code, I would say you haven’t really done much research here and your findings are flawed.
3) The lack of interest shown by creators of open source tools really demonstrates why you should stay well away from such tools. They are slapdash, hacked-together pieces of code purporting to do the same job as commercial tools. Use them at your own peril…. YOU ALWAYS GET WHAT YOU PAY FOR