My student Peng Li modified Clang to detect integer-related undefined behaviors in C and C++ code. We’ve released the code here, to go along with the recent LLVM 2.9 release. This checker has found problems in PHP, Perl, Python, Firefox, SQLite, PostgreSQL, BIND, GMP, GCC, LLVM, and quite a few other projects I can’t think of right now.
Of course, undefined behaviors are not all created alike. The next thing that someone should do is combine this work with a tainting engine in order to find dangerous operations that depend on the results of operations with undefined behavior. For example, take this code:
a[1<<j]++;
If it is ever the case that j<0 or j≥32 (assuming a 32-bit int), then a shift-related undefined behavior occurs. The result of this operation is used as an array index, which is clearly dangerous. It may even be exploitable, depending on what a particular C/C++ implementation does with out-of-bounds shifts. Other dangerous operations include memory allocations and system call arguments.
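As a rough illustration of what guarding this operation might look like, here is a minimal sketch in C. The helper names checked_shift_one and bump_slot are hypothetical (they are not part of our checker), and the width computation assumes unsigned int has no padding bits. The shift is done on an unsigned 1 so that the result is defined for every in-range shift amount.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores 1u << j in *out only when the shift is
 * defined, i.e. when 0 <= j < width of unsigned int.
 * Assumes unsigned int has no padding bits. */
static bool checked_shift_one(int j, unsigned *out)
{
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);
    if (j < 0 || j >= width)
        return false;          /* shifting here would be undefined */
    *out = 1u << j;
    return true;
}

/* Hypothetical replacement for a[1<<j]++: increments the slot only if
 * the shift is defined and the resulting index is within the array. */
static bool bump_slot(int *a, size_t len, int j)
{
    unsigned idx;
    assert(a != NULL);
    if (!checked_shift_one(j, &idx) || idx >= len)
        return false;
    a[idx]++;
    return true;
}
```

A caller would test the return value of bump_slot and treat false as an error, rather than letting a bad shift amount silently pick an arbitrary index.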
One of my favorite examples of a project getting hosed by an integer undefined behavior is here. An apparently harmless refactoring of code in Google’s Native Client introduced an undefined behavior that subverted the fundamental sandboxing guarantee. This is pernicious because the (flawed) check is sitting right there in the source code; it is only in the compiler output that the check is a nop. If their regression tests had used our checker, this problem would almost certainly have been caught right away.
10 responses to “Finding Integer Undefined Behaviors Using Clang 2.9”
You mean “jj≥32”.
Thanks for the sources. This will be fun to try on all the very well-tested and entirely correct code that I have lying around.
Sorry, my comment was mangled somehow. I wanted to write that you probably meant “j < 0 or j ≥ 31”.
Mattias, 1<<31 is well-defined -- I meant j ≥ 32.
1 has the type signed int, so 1<<31 overflows.
True, but only in C99.
I cannot find where this is defined in C89 (§3.3.7 seems to indicate that it isn’t), but I suppose you have done a closer reading of the standards.
I consider it to be well-defined because 3.3.7 of the ANSI C standard fails to make it undefined, whereas C99 is clear on this point.
These arguments are annoying. Executable standards are long overdue.
Sorry, it was an honest question. (3.3.7 fails to make it defined, which would make it undefined. But then I found A.6.3.4, which patches it up and allows it to be implementation-defined.)
Hi Mattias, I’m annoyed at the standard (or standard authors), not you!
Thank you, and I have now found DR#081 which made it completely clear.