Finding Undefined Behavior Bugs by Finding Dead Code


Here’s a draft of a very cool paper by Xi Wang and others at MIT; it is going to appear at the next SOSP.

The idea is to look for code that becomes dead when a C/C++ compiler is smart about exploiting undefined behavior. The classic example of this class of error was found in the Linux kernel several years ago. The code was basically:

struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;
... use s ...

The problem is that the dereference of s in line 2 permits a compiler to infer that s is not null (if the pointer is null then the function is undefined; the compiler can simply ignore this case). Thus, the null check in line 3 gets silently optimized away and now the kernel contains an exploitable bug if an attacker can find a way to invoke this code with a null pointer. Here’s another example that I like where the compiler silently discarded a security check that invoked undefined behavior. There are more examples in the paper and I encourage people to look at them. The point is that these problems are real, and they’re nasty because the problem can only be seen by looking at the compiler’s output. Compilers are getting smarter all the time, causing code that previously worked to break. A sufficiently advanced compiler is indistinguishable from an adversary.

So the premise of this paper is that if you write some code that executes conditionally, and that code can be eliminated by a compiler that understands how to exploit undefined behavior, then there’s a potential bug that you’d like to learn about. But how can we find this kind of code? One way would be to instrument the compiler optimization passes that eliminate code based on undefined behavior. This has some problems. First, due to complex interactions between optimization passes in a real compiler, it is not at all straightforward to determine whether a particular piece of dead code is that way due to undefined behavior, vs. being dead in a boring way. Chris Lattner’s article about undefined behavior has some good discussion of this issue. The second problem is that if we, for example, instrument LLVM to detect cases where it optimizes away code due to undefined behavior, this doesn’t tell us much about problems that we might run into when using a different compiler, or using the next version of LLVM. In particular, it would be nice to be able to find “time bombs” — undefined behavior bugs that aren’t yet exploited by a compiler, but that might be exploited in the future.

The approach taken in this paper was to develop a custom undefined behavior detector based on static analysis of a single function at a time. It takes LLVM IR and converts it into a SAT instance that is satisfiable when code can be eliminated. This has a couple of advantages. First, by considering one function at a time, the tool can scale to very large programs (note that a single function at the LLVM level may contain inlined code from other functions — this increases precision without slowing things down too much). Second, modern SAT solvers are quite strong, giving the tool the opportunity to find bugs that cannot yet be exploited by production compilers. Third, by writing a standalone tool, problems of being entangled with all of the incidental complexity that is found in a collection of real compiler passes are avoided.

At this point you are probably thinking: “I do not want some stupid tool telling me about every piece of dead code in a huge legacy application, because there are millions of those and almost none of them are bugs.” The solution has two parts. First, the most obvious dead code gets eliminated by running LLVM passes like mem2reg, SCCP, and DCE. Second, the authors have a clever solution where they run their tool twice: first without knowledge of undefined behavior, and second with it. A bug is only reported if the second pass can eliminate code that the first one cannot. Based on this technique, the paper reports a very low rate of false positives.

One thing that is interesting about the tool in this paper is that it has very conservative criteria for reporting a bug: either undefined behavior is invoked on all paths (e.g. it finds a type 3 function) or else it finds code that is made dead by exploiting undefined behavior. Although this conservative behavior could be seen as a disadvantage, in many ways it is a big advantage because the problems that it flags are probably serious, because the code that gets eliminated by the compiler is often a safety check of some sort. The authors were able to find, report, and convince developers to fix 157 bugs in open source programs, which is impressive. They also provide some good anecdotes about developers who could not be convinced to fix undefined behavior bugs, something I’ve also run into.

In summary, by adopting a solid premise (“developers want to know when code they write can be eliminated based on exploitation of undefined behavior”) the authors of this paper have found a way to home in on a serious class of bugs and also to avoid the false positive problems that might otherwise plague an intraprocedural static analysis tool.

UPDATE from August 2013: STACK is available now.


8 responses to “Finding Undefined Behavior Bugs by Finding Dead Code”

  1. “A sufficiently advanced compiler is indistinguishable from an adversary.” Ooh, instant quote material. I’m going to liberally sprinkle that one around whenever appropriate. I hope compiler writers don’t take the corollary to heart (“a compiler that’s distinguishable from an adversary is insufficiently advanced”).

    I’m waiting for the next generation of textbooks to go “Alice wants to securely send a message to Bob. But Carol, a compiler…”

  2. I was very saddened to read that Postgres was the project shamed in the paper for sticking their head in the sand and blaming the compiler rather than fixing their code that invokes UB.

  3. Hi Robby, there aren’t a lot of these but there are a few. As you might guess, they tend to be more applied papers that focus on getting good results on large systems codes.

    Jeroen, I like it. Carol the compiler will figure in my stories from now on.

  4. Ahem, Carol the *C* compiler. When these things happen in languages with decent specifications, it’s always a bug in the compiler.

  5. Fascinating paper, and great article, but I really can’t believe the first example in the article. I was taught nearly twenty years ago to never declare and initialise from another de-referenced variable outside of a control check. It was a basic tenet then, and in my code at least, still is, although I sometimes use a throw/catch mechanism for ease in long lived and variant heirarchies.
    In either case, they can’t be optimised away, and even twenty years ago were considered an acceptable use of cycles, even in a real-time environment. What’s even more annoying in the example, is that the seperation of declaration and initialisation to retain scope and value safely only requires part of the code to move down two lines!
    Having said that, the first example (buffer overflow check) in the paper is one that I have probably used myself in the past and I can imagine this being quite widespread as I seem to remember this trick being taught as it’s not something I would have came up with (It is quite clever in a flat space environment and I don’t think I would have grasped the concept from scratch); and if I have used it it was almost certainly from a tutor or book.

  6. Is there any expectation of when the ‘Stack’ tool is likely to become available, or on what licence terms (LGPL, GPL3, Academic, etc.)?

    It would be nice for it to at least be available to FOSS projects, such as Git and the other tools (both listed and not listed in the paper).