A New Compiler Fuzzing Paper

Google Scholar’s recommendation engine has turned out to be a great resource for learning about new papers. Recently this paper about compiler fuzzing turned up in my feed and I was hooked after noticing that they claim to find a lot more compiler bugs during a 12-hour fuzzing run than Csmith can find.

The work in the paper succeeds by addressing a problem that we punted on while developing Csmith: generating expressions that are statically guaranteed to be free of integer undefined behaviors. They accomplish this using static analysis combined with random search. Csmith, in contrast, wraps all math operations with safety checks in order to prevent undefined behaviors dynamically. This is a klunky solution that — as the authors of the new paper note — reduces Csmith’s expressiveness.

The character of the bugs they found follows directly from the generation strategy:

These bugs are, frankly, a little surprising. GCC and LLVM are mature tools and I’d have hoped they were beyond these simple math errors.

Besides the obvious fact that this new tool is generating expressions that Csmith cannot generate, there’s also a more subtle thing going on, which is that any new C code generator is likely to find bugs that Csmith cannot. This is because there are a hundred little choices that get made while generating test cases such as choice of identifiers, choice of function size, how many parentheses to put in and where, etc. Many of these have no effect on bug-finding power but some do, and there’s really no way to know which of them matter in advance. This is one of the most dissatisfying things about random testing.

As far as I can tell, the new compiler testing tool has not been released. Hopefully the authors will release it. If they do release it under an appropriate license, one thing I would consider doing is integrating it into Csmith. This would be easy to do by having Csmith just fail to generate code for a few functions and then generate them using this new tool. At that point we should get the benefits of both tools.

UPDATE: Here are two more C compiler fuzzers I just learned about:

, ,

13 responses to “A New Compiler Fuzzing Paper”

  1. If the authors do not publish their generator, science is seriously broken.

    I hope they will at the very least make available a million or so pre-generated programs.

  2. I’ve hear an even more general form of your last point: the more techniques you use to find bugs, the more bugs you find.

  3. Aren’t they also finding more just because they’re fuzzing things Csmith has fuzzed for a while now? Or is this over any GCC/LLVM, including pre-Csmith versions?

  4. Alex, I’m sure there’s a component of that.

    Of course, now that they’ve started reporting bugs, the playing field will inevitably start to level, because they’ll be fuzzing things that *they’ve* been fuzzing for a while.

    The last paragraph of John’s article makes me feel like the Borg.

  5. Alex, it is absolutely the case that the compilers they tested had had several years in which to develop an immunity to Csmith. The oldest compiler in the comparison against Csmith was from 2010. So it’s not exactly a fair comparison (and they acknowledge this in the paper).

    bcs: definitely.

  6. This “immunity” thing is one aspect that makes me think our random testing was fairly good on the JPL file systems. When we switched to model checking, with a pretty different generation setup, we didn’t find many new bugs. A few things escaped both methods, due to bad assumptions, but there wasn’t the big bump in bugs I expected, suggesting the exploration of the space was fairly thorough. Of course 10KLOC file systems probably have a huge but almost infinitely shallower space than a C compiler.

  7. Nuno, very cool. Historically, instcombine was the the buggiest pass, from the Csmith point of view. Back before it was split into multiple files, it was a real horror :).

    Alex, cool. It’s great to get cross checking between verification methods. Just as a random example why this is good, the other day I thought I was on a roll using Frama-C but really the preprocessor was stripping out all of my annotations. Bleh.

  8. In this paper, all the variables are provided with their values at the top of the program. Would the compiler be able to fold these constant values all the way to the end? If so, then the coverage of this method would seem to be weak — it is only testing the constant-propagation module of the compiler.

    It would seem stronger to have the values passed as command-line arguments to the test programs.

  9. Hi David, Fig 3 shows several volatile variables, if there are enough of these then the compiler won’t be able to do that much folding.