Before a tool such as a compiler is used as a critical component in an important software project, we’d like to know that the tool is suitable for its intended use. This is particularly important for embedded systems, where the compiler is unlikely to be as thoroughly tested as a desktop compiler and where the project is very likely to be locked into a specific compiler (and even a specific version, and a specific set of optimization flags).
It turns out that it’s hard to certify a compiler in any meaningful way. In practice, the most popular approach is to purchase a commercial test suite and then advertise that it was used. For example, the Keil compiler validation and verification page mentions that the Plum Hall C validation suite has been used. This is all good, but it’s a minimal kind of certification in the sense that a highly broken compiler can still pass this (or any other) fixed set of tests. At some level, I think these validation suites are mainly used for two purposes. First, I’m sure they catch a lot of interesting edge-case bugs not caught by vanilla code. Second, using the test suite serves as an advertisement: “We care about details; we have enough money to buy the expensive tests, and we have enough engineers to fix the resulting issues.”
A random test case generator like Csmith can serve as a useful complement to fixed test suites. Whereas Plum Hall’s tests were written by hand and are primarily intended to check standards conformance, a properly designed random tester will deeply exercise internal code generation logic that may not be as strong as it should be.
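For concreteness, here is a minimal sketch of the kind of differential-testing loop Csmith is typically used in. The compiler names, flags, iteration count, and the CSMITH_HOME variable are placeholders, not a prescribed setup; the loop relies on the fact that every Csmith-generated program prints a checksum of its global state before exiting, so two correct compilers must produce binaries that print the same checksum.

```shell
#!/bin/sh
# Sketch of a Csmith differential-testing loop. Assumes csmith is on
# PATH and CSMITH_HOME points at a Csmith checkout (for csmith.h);
# gcc/clang and the -O2 flag below are placeholders.
if command -v csmith >/dev/null 2>&1; then
  for i in $(seq 1 10); do
    csmith > test.c
    gcc   -O2 -I"$CSMITH_HOME/runtime" -o test_gcc   test.c
    clang -O2 -I"$CSMITH_HOME/runtime" -o test_clang test.c
    # Every Csmith program prints a checksum of its global variables;
    # a mismatch means at least one compiler miscompiled the program.
    # (timeout guards against the occasional long-running test case.)
    a=$(timeout 10 ./test_gcc)
    b=$(timeout 10 ./test_clang)
    if [ "$a" != "$b" ]; then
      echo "checksum mismatch on test $i -- save test.c for reduction"
    fi
  done
else
  echo "csmith not found; skipping sketch"
fi
```

In practice a mismatching test case would then be fed to a reducer such as C-Reduce before being reported.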
If Csmith were used as a certification tool, a compiler vendor would get to advertise a fact like “Our compiler successfully translates 1,000,000 tests generated by Csmith 2.1.” This is a high standard; my guess is that no existing C compiler other than CompCert would meet it. Actually, CompCert would fail as well, but in a different sense: it does not handle the full subset of C that Csmith generates by default.
For several years I’ve gently pushed the idea of Csmith as a compiler certification tool and have gotten some pushback. Random testing makes people uncomfortable just because it is random. For example, it may suddenly find a bug that has been latent for a long time. Traditional test suites, of course, will never do this—they only find latent bugs if you add new test cases (or if you modify the system in a way that makes the bug easier to trigger). People want to avoid unnecessary surprises near the end of a release cycle. This objection, though legitimate, is easy to address: we can simply fix a starting PRNG seed, and the suite of 1,000,000 random programs becomes a fixed one. It retains its stress-testing power but will no longer surprise.
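Pinning the suite down is just a matter of enumerating seeds. A sketch (assuming csmith is on PATH and supports the --seed option, as recent versions do; the count and filename scheme here are arbitrary):

```shell
#!/bin/sh
# Generate a fixed, reproducible suite of N random programs by
# enumerating PRNG seeds: the same seed always yields the same program.
N=100   # bump toward 1000000 for a certification-scale suite
if command -v csmith >/dev/null 2>&1; then
  for seed in $(seq 1 "$N"); do
    csmith --seed "$seed" > "test_${seed}.c"
  done
  echo "generated $N fixed test cases"
else
  echo "csmith not found; skipping sketch"
fi
```

Since the programs are now a deterministic function of the seed list, vendors need not even archive the generated files; publishing the Csmith version and seed range fully specifies the suite.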
I’ve also received some non-technical pushback about Csmith that I don’t understand as well since people don’t tend to want to articulate it. My inference is that Csmith is a bit threatening since it represents an open-ended time sink for engineers. It’s hard to know ahead of time how deep the rabbit hole of compiler bugs goes. People would rather use their valuable engineers to make customers happy in a more direct fashion, for example supporting new targets or implementing new optimizations.
My hope has been that one high-profile compiler company will take the time to make (for example) 1,000,000 test cases work, and then advertise this fact. At that point, other companies would sense a certification gap and would be motivated to use Csmith in a high-profile way as well. So far this hasn’t happened.
The way that Csmith has found its way into compiler companies is not by top-down fiat (“Use this tool!”) but rather from the bottom up. Compiler engineers—probably the people most concerned with compiler quality—have run it, probably out of curiosity at first, have found compiler bugs, and have kept using it. Unfortunately, due to the nature of this bottom-up process I generally get only indirect, informal indications that it is happening at all.