A New Compiler Fuzzing Paper

Google Scholar’s recommendation engine has turned out to be a great resource for learning about new papers. Recently this paper about compiler fuzzing turned up in my feed and I was hooked after noticing that they claim to find a lot more compiler bugs during a 12-hour fuzzing run than Csmith can find.

The work in the paper succeeds by addressing a problem that we punted on while developing Csmith: generating expressions that are statically guaranteed to be free of integer undefined behaviors. They accomplish this using static analysis combined with random search. Csmith, in contrast, wraps all math operations with safety checks in order to prevent undefined behaviors dynamically. This is a klunky solution that — as the authors of the new paper note — reduces Csmith’s expressiveness.

The character of the bugs they found follows directly from the generation strategy:

These bugs are, frankly, a little surprising. GCC and LLVM are mature tools and I’d have hoped they were beyond these simple math errors.

Besides the obvious fact that this new tool is generating expressions that Csmith cannot generate, there’s also a more subtle thing going on, which is that any new C code generator is likely to find bugs that Csmith cannot. This is because there are a hundred little choices that get made while generating test cases such as choice of identifiers, choice of function size, how many parentheses to put in and where, etc. Many of these have no effect on bug-finding power but some do, and there’s really no way to know which of them matter in advance. This is one of the most dissatisfying things about random testing.

As far as I can tell, the new compiler testing tool has not been released. Hopefully the authors will release it. If they do release it under an appropriate license, one thing I would consider doing is integrating it into Csmith. This would be easy to do by having Csmith just fail to generate code for a few functions and then generate them using this new tool. At that point we should get the benefits of both tools.

UPDATE: Here are two more C compiler fuzzers I just learned about:

ccg — can only find crash bugs, but has a very impressive record already
icFuzz — unfortunately this is an internal Intel tool

November 12, 2013

regehr

Compilers, Computer Science, Software Correctness

13 responses to “A New Compiler Fuzzing Paper”

Pascal Cuoq says:

November 12, 2013 at 4:06 pm

If the authors do not publish their generator, science is seriously broken.

I hope they will at the very least make available a million or so pre-generated programs.
regehr says:

November 12, 2013 at 4:29 pm

Pascal, the authors have released previous tools, so perhaps there’s hope. Various stuff can be found here:

http://ist.ksc.kwansei.ac.jp/~ishiura/pub/
Alex Groce says:

November 12, 2013 at 5:18 pm

What command line options do they support and can we get them to swarm them is my first question.
regehr says:

November 12, 2013 at 5:33 pm

Alex, good question! I’ve asked the PI if he’ll share the software. If so, we’ll see.
bcs says:

November 12, 2013 at 6:00 pm

I’ve hear an even more general form of your last point: the more techniques you use to find bugs, the more bugs you find.
Alex Groce says:

November 12, 2013 at 6:02 pm

Aren’t they also finding more just because they’re fuzzing things Csmith has fuzzed for a while now? Or is this over any GCC/LLVM, including pre-Csmith versions?
Eric Eide says:

November 12, 2013 at 7:43 pm

Alex, I’m sure there’s a component of that.

Of course, now that they’ve started reporting bugs, the playing field will inevitably start to level, because they’ll be fuzzing things that *they’ve* been fuzzing for a while.

The last paragraph of John’s article makes me feel like the Borg.
regehr says:

November 12, 2013 at 9:01 pm

Alex, it is absolutely the case that the compilers they tested had had several years in which to develop an immunity to Csmith. The oldest compiler in the comparison against Csmith was from 2010. So it’s not exactly a fair comparison (and they acknowledge this in the paper).

bcs: definitely.
Alex Groce says:

November 13, 2013 at 12:32 am

This “immunity” thing is one aspect that makes me think our random testing was fairly good on the JPL file systems. When we switched to model checking, with a pretty different generation setup, we didn’t find many new bugs. A few things escaped both methods, due to bad assumptions, but there wasn’t the big bump in bugs I expected, suggesting the exploration of the space was fairly thorough. Of course 10KLOC file systems probably have a huge but almost infinitely shallower space than a C compiler.
Nuno Lopes says:

November 13, 2013 at 3:08 pm

Shameless plug: Last week I gave a talk at LLVM’s dev meeting that could actually be used to erradicate all these instcombine bugs once and for all. Of course testing is still highly relevant to find other kind of bugs.
Current SMT solvers are more than capable of verifying things like instcombine, DAG combiner, Legalize DAG, and so on.

The slides: http://web.ist.utl.pt/nuno.lopes/pres/llvm-devconf13.pptx
regehr says:

November 14, 2013 at 1:28 pm

Nuno, very cool. Historically, instcombine was the the buggiest pass, from the Csmith point of view. Back before it was split into multiple files, it was a real horror :).

Alex, cool. It’s great to get cross checking between verification methods. Just as a random example why this is good, the other day I thought I was on a roll using Frama-C but really the preprocessor was stripping out all of my annotations. Bleh.
David Harris says:

November 14, 2013 at 2:38 pm

In this paper, all the variables are provided with their values at the top of the program. Would the compiler be able to fold these constant values all the way to the end? If so, then the coverage of this method would seem to be weak — it is only testing the constant-propagation module of the compiler.

It would seem stronger to have the values passed as command-line arguments to the test programs.
regehr says:

November 14, 2013 at 2:42 pm

Hi David, Fig 3 shows several volatile variables, if there are enough of these then the compiler won’t be able to do that much folding.