Csmith 2.1 Released


We’ve released version 2.1 of Csmith, our random C program generator that is useful for finding bugs in compilers and other tools that process C code. The total number of compiler bugs found and reported due to Csmith is now more than 400. All Csmith users should strongly consider upgrading.

New features in this release — all implemented by Xuejun Yang — include:

  • By default, functions and global variables are marked as “static,” permitting compilers to optimize more aggressively in some cases.
  • We now try harder to get auto-vectorizers and other loop optimizers into trouble by generating code that is more idiomatic and therefore more likely to be optimized. In particular, array indices are in-bounds by construction instead of being forced in-bounds with % operators.
  • Unions are supported. Generating interesting but conforming uses of unions was not as easy as we’d have hoped.
  • The comma operator is supported, as in x = (y, 1, z, 3).
  • Embedded assignments are supported, as in x = 1 + (y = z).
  • The pre/post increment/decrement operators are supported.
  • A --no-safe-math mode was added, which avoids calling the safe math wrappers. This is useful when trying to crash compilers, but the resulting executables should not be run, since they are very likely to have undefined behavior.
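To make the last item concrete, here is a minimal sketch of what a safe math wrapper does. The name safe_add_int32 and the choice to fall back to the first operand are assumptions made for illustration; this is not Csmith's actual wrapper code.

```c
#include <stdint.h>

/* Hypothetical wrapper, for illustration only: returns a + b when the sum
 * is representable in int32_t, and otherwise returns a unchanged, so the
 * signed addition never overflows. */
static int32_t safe_add_int32(int32_t a, int32_t b)
{
    if ((b > 0 && a > INT32_MAX - b) ||
        (b < 0 && a < INT32_MIN - b)) {
        return a;   /* would overflow: skip the addition */
    }
    return a + b;   /* cannot overflow here */
}
```

A generated program that calls a wrapper like this in place of a bare + has defined behavior for every input. With the wrappers removed, the same program may overflow a signed integer, which is why such executables are only good for testing whether the compiler itself crashes.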

These features, other than --no-safe-math, are turned on by default. They have found quite a few new compiler bugs; the most interesting is perhaps this one.
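For readers who want to picture the new constructs together, here is a small hand-written sketch. It is not actual Csmith output; the identifiers (func_1, g_arr, g_u, union U) are hypothetical and only loosely mimic the style of generated code.

```c
#include <stdint.h>

/* Hand-written illustration; every name here is hypothetical. */
union U {
    uint32_t f0;
    uint8_t  f1[4];
};

static int32_t g_arr[8];                 /* global marked static */
static union U g_u = { 0x01020304u };    /* initializes member f0 */

static int32_t func_1(void)              /* function marked static */
{
    int32_t x, y = 2, z = 5;
    int i;

    /* Indices are in-bounds by construction: the loop bound matches the
     * array size, so no % is needed to clamp the index. */
    for (i = 0; i < 8; i++)
        g_arr[i] = i * 3;

    x = (y, 1, z, 3);       /* comma operator: x becomes 3 */
    x = 1 + (y = z);        /* embedded assignment: y becomes 5, x becomes 6 */
    x += y++;               /* post-increment: x becomes 11, y becomes 6 */
    x += --z;               /* pre-decrement: z becomes 4, x becomes 15 */

    g_u.f0 = (uint32_t)x;   /* write and read the same union member, */
                            /* which is the conforming way to use it */
    return (int32_t)g_u.f0 + g_arr[7];
}
```

Called from main, func_1() returns 36 (15 from the union member plus 21 from g_arr[7]). Compilers may warn about the unused operands of the comma expression; that is expected here.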

An excellent recent development is that Pascal Cuoq, one of the main Frama-C developers, has done a lot of cross-testing of Frama-C and Csmith. I’m not sure what the final score was, in terms of bugs found in Csmith vs. bugs found in Frama-C, but the virtuous cycle has increased the quality of both tools. Frama-C is totally great and I recommend it to people who are serious about writing bulletproof C code.


5 responses to “Csmith 2.1 Released”

  1. Have you guys looked at breaking graphics drivers yet? Most graphics cards support some form of OpenGL acceleration, and these days this means supporting GLSL, which is a simple C derivative. The reason this is at all interesting is that with WebGL, (a subset of) GLSL is part of the attack surface for web browsers. Break graphics drivers sufficiently badly and you can kernel panic a computer from an HTML page.

  2. Thanks for CSmith – it already found a bug in one of our LLVM back-ends, and we intend to make much use of it.

    A humble feature request: it’d be great to be able to tell CSmith to avoid by-value copies of structs in generated code. For example, things like:

    struct S res = f();
    struct S arg = res;
    g(arg);

  3. Yossi, I have added that feature to our todo list. But can you help me understand why you want it? Some limitation of your backend or target platform?

    The real answer to finding bugs in LLVM backends is to write an LLVM IR fuzzer. It’s on my TODO list.

  4. Yossi Kreinin writes:

    > A humble feature request: it’d be great to be able to tell CSmith to avoid by-value copies of structs in generated code.

    John already answered this specifically, but I wanted to add that I’ve been thinking about how to better support “dialects” in general.

    Right now, Csmith has quite a jumble of one-off options, an approach that doesn’t scale.

    I’ll note that your request isn’t just to enable or disable a particular grammar production; it is about disabling particular choices in particular contexts.