This post is for fun; no deep thoughts will be presented. For a research project that’s not yet ready to write up, I needed a bunch of real programs (as opposed to, for example, programs generated by Csmith) that cause compilers to crash. So I built a few hundred randomly chosen revisions of GCC and LLVM/Clang and then built about 1,700 packages from source on a 64-bit Ubuntu box using a “compiler interceptor”: a collection of aliases such as “gcc” and “g++” that lead to a script that eventually runs the intended compiler but first invokes a few of the random compiler versions. If one of the random compilers crashes, the interceptor saves enough information to reproduce the problem: basically the compiler command line and the preprocessed source file. Not counting a large number of trivial invocations by autotools, the compiler interceptor was called about 274,000 times, resulting in about 4,000 reproducible compiler crashes (about 2,000 crashes were not considered usefully reproducible by my scripts because, for example, gcc was being used as a linker or assembler).
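To make the interceptor idea concrete, here is a minimal sketch in Python. It is not the script used for this experiment. The crash heuristic, the file naming, the `.i` suffix, and the function names (`looks_like_crash`, `preprocess_args`, `intercept`) are all assumptions made for illustration, and the handling of `-o` only covers the two-argument form.

```python
import json
import subprocess
import uuid
from pathlib import Path


def looks_like_crash(returncode, stderr):
    # Heuristic (an assumption): a negative code means the child died on a
    # signal, and GCC reports failed internal checks with this message.
    return returncode < 0 or "internal compiler error" in stderr


def preprocess_args(argv, out_path):
    """Turn a compile command into one that only preprocesses to out_path."""
    result = []
    skip = False
    for arg in argv:
        if skip:                      # drop the operand of a preceding -o
            skip = False
        elif arg == "-o":
            skip = True
        elif arg not in ("-c", "-S"):
            result.append(arg)
    return result + ["-E", "-o", str(out_path)]


def intercept(argv, real_compiler, test_compilers, save_dir, run=subprocess.run):
    """Run each test compiler on argv, log crashes, then run the real compiler."""
    save_dir = Path(save_dir)
    for cc in test_compilers:
        proc = run([cc, *argv], capture_output=True, text=True)
        if looks_like_crash(proc.returncode, proc.stderr):
            save_dir.mkdir(parents=True, exist_ok=True)
            stem = save_dir / f"crash-{uuid.uuid4().hex}"
            # Save the preprocessed source next to the exact command line.
            run([cc, *preprocess_args(argv, stem.with_suffix(".i"))],
                capture_output=True, text=True)
            stem.with_suffix(".json").write_text(
                json.dumps({"compiler": cc, "argv": list(argv)}))
    # The build itself only ever sees the result of the intended compiler.
    return run([real_compiler, *argv]).returncode
```

A wrapper installed under the name “gcc” would then end with something like `sys.exit(intercept(sys.argv[1:], real, candidates, log_dir))`, where `real`, `candidates`, and `log_dir` are placeholders for paths on the build machine. The `run` parameter exists only so the logic can be exercised without any compiler installed.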
Modern build systems are not very pretty and consequently compiler command lines are fat and redundant: during my compiling party, the compiler was invoked with command lines up to about 20 KB long. To find out which arguments were leading to compiler crashes, I wrote a little delta debugger that iteratively attempted to remove arguments until a fixpoint was reached under the rule that we can only remove arguments that don’t make the crash go away. Additionally, this program attempted to downgrade the optimizer flags, for example turning -Os and -O3 into -O2, -O2 into -O1, etc. The tables below show the results ranked by number of occurrences in the ~4,000 reproducible crashes.
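The argument reduction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the original tool: it removes one argument at a time (a greedy simplification of delta debugging), the `DOWNGRADE` table only encodes the examples given in the text, and `still_crashes` is a hypothetical callback that would rerun the crashing compiler on the saved preprocessed file.

```python
# Assumed downgrade steps, taken from the examples in the text.
DOWNGRADE = {"-Os": "-O2", "-O3": "-O2", "-O2": "-O1"}


def minimize_args(args, still_crashes):
    """Shrink args to a fixpoint while still_crashes(args) stays true."""
    args = list(args)
    changed = True
    while changed:
        changed = False
        # Pass 1: try dropping each argument in turn.
        i = 0
        while i < len(args):
            candidate = args[:i] + args[i + 1:]
            if still_crashes(candidate):
                args = candidate      # removal kept the crash: accept it
                changed = True
            else:
                i += 1                # this argument is needed: keep it
        # Pass 2: try weakening each optimizer flag by one step.
        for i in range(len(args)):
            lower = DOWNGRADE.get(args[i])
            if lower is not None:
                candidate = args[:i] + [lower] + args[i + 1:]
                if still_crashes(candidate):
                    args = candidate
                    changed = True
    return args


if __name__ == "__main__":
    # Toy stand-in for a real compiler run: "crashes" only when -mavx is
    # combined with some optimization level.
    def toy(args):
        return "-mavx" in args and any(a in args for a in ("-O1", "-O2", "-O3", "-Os"))

    print(minimize_args(["-Wall", "-g", "-mavx", "-O3", "-fPIC"], toy))
```

Every accepted step either shortens the list or lowers an optimization level, so the loop terminates. With a real predicate, each call costs one compiler invocation, which is why command lines of 20 KB make a smarter chunked strategy attractive.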
When reading the results, keep a few things in mind:
- Every option that is listed is actually needed to make the compiler crash. Why would -Wall induce a crash? I didn’t look into it but there must be a bug in the code emitting warnings.
- These results don’t reflect the behavior of the current versions of GCC and LLVM/Clang, but rather the range of revisions that I tested. These are roughly GCC 4.0-current and LLVM/Clang 2.6-current. As the numbers show, clang++ was an extremely immature compiler early on; this of course doesn’t imply anything about the current quality of this compiler.
gcc
# crashes | options |
---|---|
12 | -O1 |
9 | -O3 |
8 | -Wall -O2 |
6 | -Os -g -fno-asynchronous-unwind-tables |
4 | -O2 |
1 | -std=gnu99 -O1 |
1 | -std=gnu99 -O3 |
1 | (none) |
g++
# crashes | options |
---|---|
147 | (none) |
13 | -O1 |
3 | -std=c++0x |
2 | -O3 |
clang
# crashes | options |
---|---|
644 | -mavx -O2 |
206 | (none) |
152 | -mavx |
63 | -mavx -O1 |
58 | -fexceptions |
5 | -fgnu89-inline -fexceptions |
2 | -O1 -fPIC |
2 | -fgnu89-inline -fPIC |
1 | -fgnu89-inline -g -fPIC |
1 | -fPIC |
1 | -O1 |
1 | -O3 |
1 | -Wall |
clang++
# crashes | options |
---|---|
2829 | (none) |
8 | -std=c++0x |
4 | -std=c++0x -g |
1 | -Wall -std=c++0x |
1 | -g -O1 |
1 | -g |
1 | -O1 |
One thing that comes to mind while looking at these results is that I’d have expected -O3 to show up more often.
2 responses to “Crashy Compiler Flags”
Speaking for our cparser/libfirm compiler: -O3 is the default flag to run our regression tests. Benchmarking (-O3) is more important than development or debug builds (-O1 etc), so it gets tested more often. Nowadays, we have a buildbot farm, which runs the testsuite with various arguments, so the situation has improved, but -O3 is probably still the most tested case.
The most obvious thing I notice is that C compilers don’t crash much (ok, maybe early clang) unless optimization is going on, while C++ is fairly “hard” to compile even if you don’t try to make the code fast. This isn’t exactly surprising, I guess.