[Note: I promise, this is almost my last post about undefined behavior for a while. Maybe just one more in the queue.]
The current crop of C and C++ compilers will exploit undefined behaviors to generate efficient code (lots of examples here and here), but not consistently or well. It’s time for us to take this seriously. In this piece I’ll show that aggressively exploiting undefined behavior can massively speed up real codes, while remaining 100% faithful to the language standards.
Since undefined behavior is tricky to talk about, I’ll need to lay a bit of groundwork.
Understanding a C or C++ Program
The meaning of a C or C++ program is determined by an “abstract machine” that is described in the standards. You can think of these abstract machines as the simplest possible non-optimizing interpreters for C and C++. When the abstract machine runs a program, it does so as a sequence of steps, each of which is stated or implied by the standard. We’ll call this sequence of steps an execution. In general, a program will admit many different executions. For example, if we use gzip to compress the Gettysburg Address, the C abstract machine will perform a few million steps which together constitute one possible execution of gzip. If we compress the Gettysburg Address again, we’ll get the same execution (in this post we’ll ignore the fact that these abstract machines are non-deterministic). If we compress “I Have a Dream” then we’ll get a different execution. It is useful to talk about the set of all possible executions of a program. Of course, this set often has a large or infinite number of members.
The Compiler’s Job
The compiler’s job is to emit object code that has the same observable behavior as the C/C++ abstract machine for all non-erroneous executions of the program. Observable behavior is pretty much just like it sounds: it covers the abstract machine’s interactions with hardware (via volatile variables), the operating system, etc. Internal operations such as incrementing a variable or calling malloc() are not considered to be observable. The compiler is free to generate any code it likes as long as observable behaviors are preserved. Consequently, the compiler does not need to translate code that is not touched by any execution of the program. This is dead code elimination, a useful optimization.
Now let’s work undefined behavior into the picture. Every execution step performed by the C/C++ abstract machine is either defined or undefined. If an execution of a program contains any undefined steps, then that execution is erroneous. The compiler has no obligations at all with respect to erroneous executions. Notice that “erroneous” is an all-or-nothing property; it’s not the case that an execution is non-erroneous up to the point where it executes an operation with undefined behavior. (I wrote about this issue in more detail here.)
As a thought experiment, let’s take an arbitrary C or C++ program and add a new line of code at the start of main():
To evaluate this expression, the abstract machine must perform an operation that (in the C99, C11, and C++11 dialects) has undefined behavior. Every execution of the program must evaluate this expression. Thus, the program admits no non-erroneous executions. In other words, the compiler has no obligations. An efficient course of action for the compiler would be to report success without generating any object code.
What if we put the undefined execution at the end of the program, just before main() returns? Assuming that the program contains no calls to exit() or similar, this has exactly the same effect—the compiler can report success and exit without generating any code.
Do Real Codes Have Lots of Erroneous Executions?
Let’s consider the SPEC benchmark programs. Making them fast is something that most compiler vendors would like to do. Using IOC we found that a majority of the programs in SPEC CINT 2006 execute undefined behavior during the course of a “reportable” run. This means that a correct C/C++ compiler could generate code that is almost infinitely faster than the code generated by current compilers. The SPEC harness would of course complain that the compiled programs produce the wrong results, but this is actually a bug in the SPEC harness, not a bug in the compilers. If we insist that a compiler turns these SPEC programs into executables that produce the specified results, then we are effectively mandating that this compiler implements a bizarre, undocumented dialect of C where undefined behavior can sometimes be exploited and sometimes not.
If we consider them to be written in C99, LLVM/Clang 3.1 and GCC (SVN head from July 14 2012), running on Linux on x86-64, do not appear to admit any non-trivial, non-erroneous executions. I expect the same is true of most or all previous versions of these compilers. IOC tells us that these compilers execute undefined behaviors even when compiling an empty C or C++ program with optimizations turned off. The implication is that a suitably smart optimizing compiler could create executables for Clang and GCC that are dramatically faster than the executables we use every day. Think how fast you could build a Linux kernel with one of these compilers!
My guess is that most or all executions of GCC and LLVM (and basically all other large programs) would be undefined by C89 (and not just C99) but we can’t tell for sure without using a comprehensive undefined behavior checker for C89. This tool does not exist. I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.
How Optimizing Compilers Should Work
Compilers are perfectly capable of reasoning about all possible executions of the program being compiled. They do this with static analysis, which avoids decidability problems using approximation. In other words, detecting dead code is perfectly possible as long as we accept that there will sometimes be dead code that we cannot statically detect. The resulting optimization, dead code elimination, is ubiquitous in optimizing compilers. So how can we leverage static analysis to aggressively exploit undefined behavior?
Step 1: Make Undefined Behavior Explicit in the IR
LLVM already has a limited form of explicit undefinedness: an “undef” value. This is useful but it should be augmented with an undef instruction that represents unconditional execution of undefined behavior.
[Update: Owen Anderson points out in the comments that LLVM’s unreachable instruction already serves this purpose! Cool. Also, the undef value has stronger semantics than C/C++ undefined behavior so either its meaning would need to be tweaked or else a new value would need to be added before the undef to unreachable promotion I mention below would work.]
Step 2: Turn Dataflow Facts into Undefined Behaviors
Compilers already turn undefined behaviors into dataflow facts. For example, in the famous example from the Linux kernel, GCC used a pointer dereference to infer that the pointer was not null. This is all good, though there are plenty of instances where compilers could be much more aggressive about this; Pascal has a nice example involving restrict.
On the other hand, not as much has been done on aggressively inferring undefined behavior using dataflow facts. For example:
- If a program evaluates x+y and we know that both x and y are larger than 1.1 billion (and both variables are 32-bit ints), the addition can be replaced with an undef value.
- If a program evaluates (*x)++ & (*y)++ and the pointer analysis can show that x and y are aliases, the expression can again be turned into an undef value.
- Any statement that unconditionally evaluates an expression containing an undef value can be turned into an undef instruction.
There are around 191 kinds of undefined behavior in C99, so a complete set of optimization passes looking for them will require a fair amount of work.
Step 3: Prune Undefined Executions
Now we want to delete any code that is only needed by erroneous executions. This will build upon dominator analysis, which is already done by optimizing compilers. A piece of code X dominates Y if you can only get to Y by first executing X. All code that is dominated by an undef instruction can be removed. The opposite is not true: code can’t be deleted just because it dominates an undef. However, there exists a reversed dominator analysis called post dominator where X post-dominates Y when all executions going through X must also go through Y afterwards. Using the results of this analysis, we can safely delete any code that post-dominates an undef.
Dominator-based undefined behavior optimization will be particularly effective when used interprocedurally at link time. For example, if I write a function that unconditionally uses an uninitialized variable, the intraprocedural analysis will reduce that function to a single undef instruction, but the interprocedural optimization will potentially be able to destroy much more code: everything leading up to, and away from, calls to this function, across the whole program. Imagine what could be accomplished if a common C library function such as malloc(), in a widely deployed version of libc, could be shown to unconditionally execute undefined behavior: almost all C programs on that system would totally disappear. A real Linux distribution could be fit onto a couple of floppy disks, like in the old days.
The big picture is something like this:
Let me be clear: compilers already try to avoid generating code corresponding to erroneous executions. They just don’t do this very well. The optimizations that I’m proposing differ in degree, not in kind.
Perhaps I’ll get a student to implement the more aggressive optimizations as a proof of concept. We’ll impress everyone with 2-second builds of the Linux kernel, 3-second GCC bootstraps, and 99.9% code-size reductions for Firefox.