Testing Commercial Compilers


A few weeks ago a reader left this comment:

Just out of curiosity John, have you approached any of the big commercial compiler companies about getting free licenses for their products? I don’t work in the compiler business but if a university research team offered to rigorously test my software, for free, I’d say yes. You can always ignore bug reports after all.

I wanted to devote a post to the answer since it’s something I’ve spent a fair amount of time thinking about. Let’s look at the tradeoffs.

Pros of Testing Commercial Compilers

The whole reason I started working on this was the fact that bugs in commercial compilers for embedded systems were irritating me and the students in my embedded systems courses. So of course I want these compilers (and “regular compilers” too) to be more correct.

Finding bugs in commercial compilers benefits my project by showing that our work is generally useful. If we only found bugs in open source compilers, people could simply argue that these tend to be much buggier than commercial tools. This is false, though that doesn’t stop people from occasionally saying it anyway; for example, see Chris Hills’ posts here. In any case, as a proof of concept it is important for Csmith to find bugs in the highest-quality compilers that are available, regardless of who produces them.

Cons of Testing Commercial Compilers

First, reporting compiler bugs is hard work. It’s not my job to improve a product that someone else is making money by selling. If someone wants me to do this, they can hire me as a consultant. If they want to do the work on their own, Csmith is open source under a corporation-friendly license.

Second, the people producing these compilers may not have any interest in having me test them. There is, in fact, a potentially serious downside: if I publicize the fact that I found a lot of bugs in safety-critical compiler A, then the people producing compiler B can use this as a selling point. I’ve noticed that sometimes people steeped in the open source world have a difficult time believing this is a serious concern, but it is.

Third, random testing works best when the bug reporting / fixing feedback cycle is rapid. Speed is important because high-probability bugs tend to mask lower-probability bugs. As a tester operating outside of the company, I’m tied to a months-long release cycle as opposed to just updating to the latest revision every morning.

Fourth, commercial tools have a closed process. I can’t see the full set of bug reports, I can’t access the source code or commit log, and perhaps I cannot even mail the developers directly.

Fifth, commercial compilers are often a pain. They probably use the despicable FlexLM. They often run only on Windows. They probably have obscure command-line options. They may target platforms for which I don’t have good simulators.

Finally, the purpose of a commercial compiler is simple: it is to make money for the company selling it. Given finite engineering resources, a company may well be more interested in supporting more platforms and improving benchmark scores than it is in fixing bugs. If they do fix bugs, they’ll begin with the ones that are affecting the most valuable customers.
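To put rough numbers on the masking effect from the third point, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name, the probabilities, and the simplifying model in which the two bugs trigger independently and the common bug hides the rare one whenever both fire in the same test.

```python
def failures_per_rare_sighting(p_common: float, p_rare: float) -> float:
    """Expected number of failing tests to triage per failing test that
    visibly shows the rare bug.

    Assumed model: each random test triggers the common bug with
    probability p_common and the rare bug with probability p_rare,
    independently. If both trigger, only the common bug is observed.
    """
    # A test fails if it triggers at least one of the two bugs.
    p_fail = p_common + p_rare - p_common * p_rare
    # The rare bug is visible only when the common bug did not fire.
    p_rare_visible = p_rare * (1.0 - p_common)
    return p_fail / p_rare_visible


# While the common bug is unfixed, nearly every failure is a duplicate of it.
before_fix = failures_per_rare_sighting(p_common=0.01, p_rare=0.00001)

# Once the common bug is fixed, every failing test shows the rare bug.
after_fix = failures_per_rare_sighting(p_common=0.0, p_rare=0.00001)
```

With these made-up rates, the first case needs on the order of a thousand failing tests triaged per sighting of the rare bug, and the second needs exactly one. That gap is why waiting months for a fix to the common bug stalls the whole process.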

Conclusion

In contrast with the points above, acting as a test volunteer for open source compilers is a pleasure. I can see the code and directly interact with developers. Bugs are often fixed rapidly. More important, when these compilers are improved it benefits everyone, not just some company’s customers. I feel a substantial sense of obligation to the open source community that produced the systems I use every day. It seems crazy but I’ve had a Linux machine on my desk continuously for a bit more than 19 years.

I’ve tested a number (>10) of commercial C compilers. Each of them has been found to crash and to generate incorrect output when given conforming code. This is not a surprise — compilers are complicated and all complex software contains bugs. But typically, after this initial proof of concept, I stop testing and go back to working on GCC and LLVM. I’ve heard through the grapevine that Csmith is in use at several compiler companies. The engineers doing this have not shared any details with me and that is fine — as long as the bugs are getting fixed, my project’s goals are being met.
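For readers wondering how a crash or incorrect output gets flagged without a reference compiler, one common approach is to compile the same program with several compilers and compare what the resulting executables print (Csmith programs print a checksum). The sketch below shows only the comparison step. The `CRASH` marker, the `classify` function, the verdict strings, and the compiler names in the example are all hypothetical, and actually invoking Csmith and the compilers is left out.

```python
from collections import Counter
from typing import Dict

CRASH = "crash"  # hypothetical marker meaning the compiler itself failed


def classify(results: Dict[str, str]) -> Dict[str, str]:
    """Map each compiler name to a verdict.

    `results` maps a compiler name to either CRASH or the checksum string
    printed by the program that compiler produced.
    """
    checksums = [r for r in results.values() if r != CRASH]

    # Find a checksum shared by a strict majority of non-crashing compilers.
    majority = None
    if checksums:
        top, count = Counter(checksums).most_common(1)[0]
        if count > len(checksums) / 2:
            majority = top

    verdicts = {}
    for name, outcome in results.items():
        if outcome == CRASH:
            verdicts[name] = "compiler crash"
        elif majority is None:
            verdicts[name] = "no majority"
        elif outcome == majority:
            verdicts[name] = "agrees"
        else:
            verdicts[name] = "suspected wrong code"
    return verdicts


example = classify({"gcc": "A1B2", "clang": "A1B2", "vendor_x": "FFFF"})
```

Majority voting is a heuristic, not a proof: a disagreement only means the odd one out deserves a closer look, and the test program still has to be reduced and checked by hand before it becomes a bug report.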


6 responses to “Testing Commercial Compilers”

  1. Thanks for writing this up John; it was very helpful.

    My original question was more about why compiler vendors aren’t coming to you though. It seems that your team is in a unique position because you can say “your compiler tends to differ from competitors in these ways” — that seems like a really useful service. If vendors spend a lot of time dealing with customer complaints of the form “our code works on competitor’s compiler X but fails/does-something-absurd with your Y”, information gleaned from your testing could save them a lot of effort: even poorly minimized Csmith cases are probably smaller and easier to understand than a large customer project, which the customer may not even be willing to share.

    Now, obviously, given the costs that you’ve described above, it would only make sense if commercial compiler vendors were willing to pay you. Moreover, this only really makes sense if you can get competing commercial compilers participating; comparing gcc/llvm/icc and some commercial embedded compiler isn’t really useful.

  2. Hi msalib, I don’t know why they’re not coming to me, but keep in mind that I have not advertised any kind of product or service :).

    My sense is that most complaints of the form “our code works on competitor’s compiler X but fails/does-something-absurd with your Y” probably have to do with non-standard behaviors — special hacks for I/O devices, inline assembly, and that kind of thing — as opposed to the kinds of bugs that Csmith discovers.

    Back when I started this project I envisioned a compiler “correctness benchmark” where I would publish, for example, the results of 1,000,000 Csmith tests on many different commercial compilers. But there are a lot of practical difficulties in doing this without explicit cooperation from the vendors, and just as many in getting them to cooperate. For example, they probably only want to help out if they’re guaranteed to look better than their competitors.

    So anyway, we’re making Csmith open source and hoping it gets into the compiler companies through the back door, via engineers who just want to make a stronger product. These are the people I understand and sympathize with anyway. I don’t know how to sell things to the companies’ front-ends.

  3. Regarding the bit where people (falsely) claim commercial compilers to be less buggy than open source ones: data to support the fact that this is indeed false would benefit open source compilers tremendously. I’m targeting embedded platforms for automotive safety apps, and it’s my impression that gcc seems more risky to customers than a bug-infested proprietary compiler – for that matter, zlib seems more risky than slower proprietary image decompression code with poorer compression rate.

    In this sense, finding and documenting bugs in proprietary compilers can greatly benefit everyone by making it easier to use open source compilers.

  4. Hi Yossi, I agree — something like this would be very valuable! However, as I was implying in comment #2, it’s not clear that doing this in an adversarial way would be useful.

    What I’d like to see happen is for some compiler company to start advertising “our tool successfully translated 1e6 programs generated by Csmith, and our competitors cannot make this claim.” At that point, the competitors would most likely do the right thing.

  5. Hi Yossi,
    I’m obviously biased, as I work for one of the commercial compiler vendors (not in development). I’m not going to feed the open source/commercial quality war; I just want to comment on your observation about customers with high-integrity requirements preferring a commercial compiler. It’s not really that strange: it’s all about accountability and who to blame when something is wrong with the tool chain.
    There are plenty of support service solutions available in the open source world, but for some reason they have not really adapted to the needs of high-integrity customers. I think the main reason is that the support companies that work with open source tools are not willing to put up with the sometimes very demanding requirements of this type of customer, especially since this might, for example, require keeping special bug-fix branches alive (and updated) for 10-20 years.

    Another big reason is the “tool validation” requirements put forward by IEC 61508 and ISO 26262. ISO 26262 in particular calls for quite heavy measures in the customer project unless the vendor can show satisfactory documentation of their development and test processes. There is simply no legal entity that can stand up to these requirements for gcc, for example. And due to the internal dynamics of the open source community I doubt that there ever will be.

    But nothing is impossible… 🙂