Csmith Released

Here is Csmith, our randomized C program generator. My dream is that it will be a force for good by unleashing a world of hurt upon low-quality C compilers everywhere (it is not uncommon for Csmith to crash a previously-untested tool on the very first try). High-quality C compilers, such as the latest versions of GCC and Clang, will correctly translate many thousands of random programs.

I wanted to release Csmith under the CRAPL, but nobody else in my group thought this was as funny as I did. Actually we put quite a bit of work into creating a stable, usable release. Even so, Csmith is a complex piece of software and there are certainly some lurking problems that we’ll have to fix in subsequent releases.

April 12, 2011

regehr

Compilers, Software Correctness

10 responses to “Csmith Released”

Yuri Gribov says:

April 13, 2011 at 3:24 am

Thanks a lot for this, John. I wish someone would also throw this at the commercial compilers out there.
Sam Tobin-Hochstadt says:

April 13, 2011 at 6:28 am

Please don’t release things under the CRAPL. It means that no one can ever use your code in any other piece of free software, without getting your permission. I realize that Matt intends the CRAPL as a joke, but it has consequences.
Ben L. Titzer says:

April 13, 2011 at 12:00 pm

Nice work, John. I am interested in modifying Csmith to generate Virgil programs and start hammering my compiler.
Andreas Zwinkau says:

April 14, 2011 at 2:51 am

Thanks for the release! We already found one subtle bug in cparser and we have several more failing programs to look at.
BCS says:

April 14, 2011 at 7:48 am

What would be more interesting to some would be the set of input programs and output results you have generated. This would make it trivially easy for a compiler vendor to test their compiler without having to muck with getting X other compilers set up.
Anonymous Cowherd says:

April 14, 2011 at 3:21 pm

@BCS: There’s no such thing as “the set of input program[s] … generated” by Csmith — at least, not in any useful sense. The whole point of random testing is that it does *not* produce a test suite. You might say the tool itself *is* the test suite. Your question implies that John has some set of let’s say 10,000 autogenerated tests, and he’s using them to find bugs. That’s not how it works. Instead, the idea is that you can run Csmith a billion times, and each time it will produce a different autogenerated test. Some of these tests will find bugs, and the vast majority of them won’t.

It would be possible for John to run Csmith 10,000 times, collect the 10,000 resulting C programs into a test suite, and provide that as a “csmith-sample.tar.gz” on his site. Actually, okay, maybe that’s even a good idea! But it might be a mixed blessing, if a compiler vendor downloads only the sample, runs it at only one optimization level, and thereby concludes that (his product is bug-free/Csmith can’t find any bugs in his product). You see, even a 10,000-test test suite would cover only 0.001% of Csmith’s sample space!
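To make the contrast with a fixed test suite concrete, here is a minimal sketch of the kind of loop described above: generate a fresh program per seed, run it under several compiler configurations, and flag any configuration whose output disagrees with the majority. This is an illustration under stated assumptions, not Csmith's actual harness. The names `generate`, `configs`, `find_disagreements`, and `random_test` are hypothetical, and the generator and compilers are passed in as plain callables so that nothing is assumed about any real command-line interface.

```python
from collections import Counter
from typing import Callable, Dict, List


def find_disagreements(results: Dict[str, str]) -> List[str]:
    """Return the configurations whose output differs from the majority output."""
    if not results:
        return []
    majority, _ = Counter(results.values()).most_common(1)[0]
    return [name for name, out in results.items() if out != majority]


def random_test(
    generate: Callable[[int], str],            # hypothetical: seed -> program text
    configs: Dict[str, Callable[[str], str]],  # hypothetical: name -> (program -> output)
    runs: int,
) -> Dict[int, List[str]]:
    """Generate one program per seed and record which configurations disagree."""
    failures: Dict[int, List[str]] = {}
    for seed in range(runs):
        program = generate(seed)
        results = {name: run(program) for name, run in configs.items()}
        suspects = find_disagreements(results)
        if suspects:
            failures[seed] = suspects
    return failures


if __name__ == "__main__":
    # Stand-ins for a real generator and real compilers (assumptions for the demo).
    fake_generate = lambda seed: str(seed)
    fake_configs = {
        "O0": lambda p: p,
        "O2": lambda p: p,
        "buggy": lambda p: "wrong" if int(p) % 3 == 0 else p,
    }
    print(random_test(fake_generate, fake_configs, 6))
```

In a real setup each callable in `configs` would compile the program with one compiler at one optimization level, run the binary, and return its output. The fixed sample only ever exercises the seeds it was built from, while the loop keeps drawing new ones.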

That said (and as I said maybe it *is* a good idea after all), I would very much like to see a blog post that shows just one or two examples of Csmith’s output — not as part of a reduced bug report, but from the point of view of the tool’s designer. What specific bad behaviors does it try to elicit? How does it avoid undefined behavior at runtime w.r.t. pointer aliasing, division by zero, null pointers,…? Does it try to exercise common patterns such as “for (i=0; in)”? And then also I’m interested in the styling of the output programs. How much effort did you spend on pretty-printing? Is there heavy use of typedefs, or no use of typedefs? Does it use a lot of macros, or inline functions, or none of the above?

I will at some point get around to downloading Csmith and answering all these questions for myself… but a blog post would be cool. 🙂
Anonymous Cowherd says:

April 14, 2011 at 3:23 pm

@myself: “for (i=0; in)” should read “for (i=0; i < 10; ++i) or for (p=h; p; p=p->n)”. Stupid HTML comments.
Eric Eide says:

April 15, 2011 at 8:27 am

Anonymous Cowherd writes, “What specific bad behaviors does it try to elicit? How does it avoid undefined behavior at runtime w.r.t. pointer aliasing, division by zero, null pointers,…? Does it try to exercise common patterns such as `for (i=0; i<10; ++i)'?”

These issues are addressed in our PLDI '11 paper.

Anonymous also writes, “And then also I’m interested in the styling of the output programs. How much effort did you spend on pretty-printing?”

Csmith output is minimally pretty :-). Csmith implements basic formatting conventions such as outputting one statement per line. Lines are indented to make block structure apparent. But there is no attempt to place meaningful line breaks in long expressions, say.
regehr says:

April 15, 2011 at 1:58 pm

Hi Sam, I hear you, I was 90% joking.
Flash Sheridan says:

April 21, 2011 at 9:58 am

I agree with at least part of your non-joking 10% about CRAPL. Not-really-working not-quite-released code was a significant problem with some of the tools I cited in my S:PEx compiler testing article. (Not Lindig, whose code Just Worked™.)