GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks

UPDATE: Ok, I did something stupid. I wrote most of this post while using a pre-4.8 snapshot that I had sitting around and then I built the actual 4.8.0 release to verify the behavior but I screwed up a path or something. The 4.8.0 release does not have the behavior described in this post. It was patched a few days before the release, see comments 9 and 10 below. I apologize for the sloppiness. Of course, SPEC is still broken and there are still plenty of undefined programs not in SPEC that GCC will break for you…

Near the top of the GCC 4.8.0 release notes we find this:

GCC now uses a more aggressive analysis to derive an upper bound for the number of iterations of loops using constraints imposed by language standards. This may cause non-conforming programs to no longer work as expected, such as SPEC CPU 2006 464.h264ref and 416.gamess. A new option, -fno-aggressive-loop-optimizations, was added to disable this aggressive analysis.

The thing that is happening here—a compiler breaking a previously working program by exploiting undefined behavior—is a topic I’ve written plenty about, but breaking the current version of SPEC seems like a big enough deal to make it worth looking into. If you want the goriest possible details, check out some discussion at the GCC bugzilla and the response from SPEC. Here I’ll just present the essentials.

The problem with h264ref is a function that contains code like this:

int d[16];

int SATD (void)
  int satd = 0, dd, k;
  for (dd=d[k=0]; k<16; dd=d[++k]) {
    satd += (dd < 0 ? -dd : dd);
  return satd;

Using GCC 4.8.0 on Linux on x86-64, we get this (I’ve cleaned it up a bit):

$ gcc -S -o - -O2 undef_gcc48.c
	jmp	.L2

Of course this is an infinite loop. Since SATD() unconditionally executes undefined behavior (it’s a type 3 function), any translation (or none at all) is perfectly acceptable behavior for a correct C compiler. The undefined behavior is accessing d[16] just before exiting the loop. In C99 it is legal to create a pointer to an element one position past the end of the array, but that pointer must not be dereferenced. Similarly, the array cell one element past the end of the array must not be accessed.

One thing we might ask is: What is the compiler thinking here? Why not just give a nice warning or error message if the program is going to unconditionally execute undefined behavior? The basic answer is that the compiler isn’t somehow maliciously saying to itself: “Aha– undefined behavior. I’ll screw up the object code.” Rather, the compiler is simply assuming that undefined behavior never occurs. How does that come into play in SATD()?

In detail, here is what’s going on. A C compiler, upon seeing d[++k], is permitted to assume that the incremented value of k is within the array bounds, since otherwise undefined behavior occurs. For the code here, GCC can infer that k is in the range 0..15. A bit later, when GCC sees k<16, it says to itself: “Aha– that expression is always true, so we have an infinite loop.” The situation here, where the compiler uses the assumption of well-definedness to infer a useful dataflow fact, is precisely analogous to the null-pointer check optimization which got some Linux kernels into trouble.

My view is that exploiting undefined behavior can sometimes be OK if a good debugging tool is available to find the undefined behavior. In this case, a memory safety error, we’ll turn to Valgrind and the address sanitizer (an awesome new feature in GCC 4.8.0, ported over from LLVM). Here goes:

$ gcc undef_gcc48.c
$ valgrind -q ./a.out
$ gcc -fsanitize=address undef_gcc48.c
$ ./a.out 
$ clang -fsanitize=address undef_gcc48.c
$ ./a.out 

No dice, bummer. Valgrind cannot change the layout of objects so I would not have expected it to notice this, but it seems like the address sanitizer should be able to catch the problem. On the other hand, where dynamic analysis fails us, static analysis succeeds:

$ gcc -O3 undef_gcc48.c -Wall 
undef_gcc48.c: In function ‘SATD’:
undef_gcc48.c:6:29: warning: array subscript is above array bounds [-Warray-bounds]
   for (dd=d[k=0]; k<16; dd=d[++k]) {
undef_gcc48.c: In function ‘main’:
undef_gcc48.c:6:29: warning: array subscript is above array bounds [-Warray-bounds]
   for (dd=d[k=0]; k<16; dd=d[++k]) {

For whatever reason, this warning shows up at -O3 but not -O2. We should, however, not pat ourselves on the back quite yet—this kind of static analysis is generally somewhat brittle and we cannot expect to always get such a nice warning.

It would be reasonable to ask the larger question of whether a compiler should be breaking code like this, but it’s late to be opening that can of worms, so I’ll leave this discussion for another day (or for the comments). I have not looked into 416.gamess but I suspect it’s more of the same.

29 replies on “GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks”

  1. It would be nice if compilers would say “Hey, I compiled your program into an infinite loop. But looking back at your original code, syntactically there was a loop condition, so maybe this isn’t quite what you wanted?”

  2. Quoting from the bug: “It certainly reads memory past
    the end of an array, and so assigns an undetermined value to the variable dd.”

    What more is there to say? I can’t believe that even professionals don’t know what undefined behavior means. At the very least they must assume that an AV is possible reading off the end of the array. They just mentally tune this out.

  3. Tobi, the value of d[16] is never used – the loop exits after it is read; some compilers would eliminate the read. It is a bug in the code though, but is the right logic in a compiler to assume seemingly buggy code is not executed and replace it with nop?

    The compiler should be more helpful IMHO. Either warn (without the need to amplify warning settings) when it discovers this (so warning only at -O3 would be OK) or compile the code with bad behaviour as is – the programmer may know that this array is stored inside a larger object so it’s never a problen in practice.

    The compiler shouldn’t be a smart-ass adversary though.

  4. “Aha– undefined behavior. I’ll screw up the object code.”

    That’s exactly what it’s doing. No thanks.

  5. I cannot reproduce the issue
    with the final version that was released
    gcc version 4.8.0 (GCC)

  6. “It would be reasonable to ask the larger question of whether a compiler should be breaking code like this”

    The compiler isn’t breaking any code – the code is already broken. It invokes undefined behaviour and you get what you asked for.

  7. Autism is a hell of a drug. Maybe if you don’t try to be such a smartass with your code to the point where I can’t figure out what it’s supposed to do, you won’t run into those kinds of problems.

    Succinct code is important, but this is just taken to a whole new level of pretentious idiocy.

  8. Please note that the situation has been significantly improved on March 14 when the bugfix for GCC PR 53265 [1] has landed on trunk (4.8 release includes that). Now GCC is able to emit a warning (“iteration 15 invokes undefined behavior”), and actually avoids “breaking” the code after emitting a warning.

    [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53265

  9. Alexander and Slothrop, thanks! I have added an update to the post.

    Slothrop, are you still having trouble reproducing this behavior? If so, please send me the output of “gcc -v; gcc -S -O2 undef_gcc48.c -o -“

  10. The PR53265 commit actually made it into GCC 4.8.0 release, so there is no difference between 4.8.0 and 4.8.1 (to be released in 2-3 month).
    If a loop has just single exit and known constant number of iterations, as in the SPEC loop, we just don’t lower the upper bound estimate because of undefined behavior constraints, and in many cases we actually even warn (not in this case, there was a pass ordering issue that resulted for 4.8 to decision in between warning more often, but having false positives, or warning less often without false positives.
    Were you above trying some older GCC version (as in, before March 14th or so)? For 4.9.0 we’ll hopefully resolve the issues with false positives and make the warning warn even about the above loop.

  11. John, why does your update talk about 4.8.1? March 14 was more than a week ago, the new warning+breakage avoidance code *is* present in 4.8.0.

  12. Hi Alexander and Jakub, I was previously using a pre-4.8.0 compile of GCC, but I did build the actual 4.8.0 release yesterday in order to verify this behavior before writing the blog post. If I screwed that up, then I apologize and will add a big note about this at the top of the post. Double checking now…

  13. I must agree with the sentiment “The compiler isn’t breaking any code – the code is already broken.”

    It’s not the compiler’s job to try to guess what buggy code “is really trying to do” or what the programmer “really intended”, and somehow fix it. If the code has a bug (causing UB) then it has a bug. It’s not the compiler’s fault, it’s the programmer’s fault, and in this case the compiler is doing nothing that the language standard wouldn’t allow it to do.

    (Sure, a warning would be extremely nice and helpful, but the compiler isn’t required to issue one, especially because issuing such warnings in all possible cases is, AFAIK, provably impossible.)

  14. That’s stupid. Accessing the array at index 16 might be undefined behaviour, but still it should be very clear to the compiler that the loop is not endless. k is incremented and there is no memory access within the loop that might change that fact, not even with aliasing assumed, just read accesses to anything but a local integer.

  15. clang -fsanitize=address happens to not catch this, but -fsanitize=undefined catches it:

    $ clang -fsanitize=undefined undef_gcc48.c
    $ ./a.out
    undef_gcc48.c:6:28: runtime error: index 16 out of bounds for type ‘int [16]’

  16. BTW, if you want a testcase where -fno-aggressive-loop-optimizations changes generated code even in GCC 4.8.0, one testcase is e.g.
    int a[4];

    __attribute__((noinline, noclone)) int
    foo (int x)
    int i, r = 0, n = x & 31;
    for (i = 0; i < n; i++)
    r += a[i];
    return r;

    main ()
    int x = 255;
    __asm volatile ("" : "+r" (x));
    return foo (x);

    With -O3 (and without -fno-aggressive-loop-optimizations), GCC determines the upper bound of the loop to be 4 rather than 32 (or more, when not considering VRP) and so decides to completely unroll the loop (with comparisons/branch out before every read from a), so it actually won't do any uninitialized read beyond end of a, while with -O3 -fno-aggressive-loop-optimizations it is not allowed to take it into account and thus vectorizes the loop and will happily read beyond end of a array.

    As for the original testcase, gcc -fsanitize=address -fno-common detects it, but when the d array is common. See http://gcc.gnu.org/PR55739
    on why common symbols aren't instrumented by AddressSanitizer right now.

  17. Oh, BTW, in 464.h264ref d is an automatic array, not global one, so AddressSanitizer complains about it just fine even without -fno-common (so it would do if e.g. the array was initialized to something non-zero).

  18. Does UB allow altering the compile time of a program? I.e. if a function can be proven to be type 3, issue a warning (if enabled) and generate no output symbol (regardless of what warnings are enabled) in the object file.

  19. I suspect that exactly these kinds of optimizations trigger another problem that apparently went unnoticed:

    When I cross-compile linux 3.8.4 with gcc 4.8.0, my ARM machine boots the kernel and stalls completely after the message “booting the kernel”. The same kernel compiled with gcc 4.7.2 works just fine.

    I even tried to use “-fno-aggressive-loop-optimizations”, but to no avail. I do not have the time or means to investigate any further, but this sucks.

  20. Hi bcs, this could be done, I remember a version of LLVM (perhaps not any released version though) that was not even generating a return instruction for some type 3 functions. But really, if the compiler can actually notice this kind of thing happening, a nice error message would be nice.

    Hi Uwe, I agree, that sucks!

  21. Uwe, try enabling earlyprintk in your kernel config and command line. This might give you a better idea of where things are blowing up. What ARM hardware are you using? (Sorry for drifting off-topic.)

  22. My ARM machine is an Iomega iConnect (http://www.congenio.de/infos/iconnect.html).

    BTT: For the example given, I would expect the compiler to issue a warning for a non-terminating loop and not silently generate the code for it. Like, when you say:

    char c;

    if (c > 127) {


    you also get a warning that the condition will always yield false.

  23. @Mans: thanks for the tip, but it turns out that the kernel dies early during tty_unregister_device(), namely in klist_next(). It tries to deallocate space for the unused serial devices.

    It looks like the “fun case analysis” to me, where a pointer check is skipped because of optimizations, see here:


    However, the linux 3.8.5 code cannot even be compiled with “-O0” for a check, it breaks in mm/memory.c, so I am at an end there. I guess that this will not be the only place where gcc 4.8.0 “will break the code for you”. 😉

    Linux kernel maintainers will have a hard time getting that to work…

Comments are closed.