Integer Undefined Behavior Detection using Clang 3.3


Undefined behaviors in C/C++ are harmful to developers:

  • There are many kinds of undefined behavior
  • They can be hard to understand
  • Their effects change depending on which compiler version and which compiler options you use, and they get worse every time an optimizer gets smarter
  • Plenty of them aren’t reliably detected by any tool that I know of

Until these languages die, which isn’t going to happen anytime soon, our best defense against undefined behaviors is to write better checking tools. Recently, Clang has started to accumulate a nice collection of such tools, many of which can be enabled using the compiler flag -fsanitize=undefined. The Clang manual has more details.

Our modest contribution has been a collection of checks for integer undefined behaviors like signed overflow and shift-past-bitwidth. These checks have been part of LLVM for a while, and they are finally part of the 3.3 release, which comes in a variety of convenient pre-compiled packages.
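
To make those two categories concrete, here is a rough sketch in C of the conditions involved. This is not how Clang implements its checks; the function names are made up for illustration, and the width computation assumes that int has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would overflow a signed int.
   The test itself never overflows, because it only subtracts b from
   the limit on the side where that subtraction stays in range. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Hypothetical helper: true if shifting an int by `amount` is undefined
   because the count is negative or at least the width of int.
   Assumes int has no padding bits. */
bool shift_count_out_of_range(int amount)
{
    const int width = (int)(sizeof(int) * CHAR_BIT);
    return amount < 0 || amount >= width;
}
```

The sanitizer reports these situations at runtime, at the point where the offending operation executes, so you don't have to write guards like this by hand just to find the bugs.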

To find integer undefined behaviors in a code base that you care about, there are three steps:

  1. Install Clang/LLVM 3.3 from a binary package or from source and make sure that clang and clang++ are in your PATH. If compiling from source, you will need to build compiler-rt. Full instructions are here.

  2. Build your code base using clang or clang++ and a flag such as -fsanitize=undefined

  3. Test the compiled code as thoroughly as possible; if any sanitizer output appears then you have probably found one or more bugs. The lines you care about will contain the string “runtime error”.

Let’s go through a quick example. I did this on a Linux machine but it should be more or less the same on other platforms. Grab the latest stable version of Perl, untar it, and run its configure script:

wget http://www.cpan.org/src/5.0/perl-5.18.0.tar.gz
tar xvf perl-5.18.0.tar.gz
cd perl-5.18.0/
./Configure

When the configure script asks for a C compiler, respond with clang -fsanitize=undefined. Then build Perl, run its test suite, and look for problems:

make -j4
make test > make.out 2>&1
grep 'runtime error' make.out | sort | uniq

At this point you should see several hundred lines of undefined behavior errors. Here’s the full output.

Since modern C compilers actually exploit undefined integer behaviors in order to generate code that you didn’t expect or want, this whole exercise is probably worth doing for any code whose correctness you care about.
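
As a small illustration of that kind of exploitation, consider the two overflow tests below. This is a generic sketch, not code taken from Perl, and the function names are hypothetical.

```c
#include <limits.h>

/* Naive test: relies on x + 1 wrapping around to a smaller value.
   Signed overflow is undefined, so a compiler is allowed to assume
   x + 1 < x never holds and reduce this function to "return 0". */
int increment_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Well-defined test: compares against INT_MAX before doing any
   arithmetic, so no overflow can occur. */
int increment_overflows_safe(int x)
{
    return x == INT_MAX;
}
```

Whether a given compiler actually folds the naive version depends on its version and optimization level, but when built with -fsanitize=undefined, calling it with INT_MAX should be reported as a signed overflow at runtime.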


8 responses to “Integer Undefined Behavior Detection using Clang 3.3”

  1. Indeed, it looks as if the test output shows that it is detecting float overflows as well. Presumably these are the results of casts, rather than arithmetic. (My very shaky memory tells me that arithmetic overflows on floats just produce NaNs…)

  2. Hi Michael, I believe that your guess is correct. It’s not totally clear to me that FP checks should be turned on by the catchall undefined behavior detection flag but it’s not something I think about much either.

  3. Interesting tool!

    Regarding applying it to Perl: I’m certain it found a few genuine bugs (regexp compilation using I32’s as offsets?), but others are pretty much what one would expect. For example the error at “pp.c:2527:7” is in the opcode that implements a special case of integer arithmetic: A (Perl) user needs to explicitly enable “use integer” mode to get the machine’s raw integer arithmetic. In this mode, perl doesn’t prevent the user from doing the wrong thing. The test which provokes the undefined behaviour is specifically testing the corner cases. One could argue it’s therefore a bug in the test script.

    In any case, this seems a very useful tool. Thank you!

  4. Hi Steffen, thanks for the comments, I’d be very interested to hear how many of these end up being useful to the Perl developers.

  5. @Michael Norrish: You are right, on most systems, the default is that an arithmetic overflow produces ±∞. However, it is possible to set the system to trap on overflow. Also, in most programs, ±∞ are undesirable and checking for them is thus a worthy endeavour (as done in the Astrée system, for instance).

    Overflow on conversion to integer, if I remember correctly, is an undefined behavior.

  6. @John: Thanks for all your work, since it’s now integrated into clang… We can therefore use the PAGAI static analyzer to check for the reachability of the trap condition!