UB Canaries

If you report an undefined behavior bug, a common reaction from software developers is “So what? Our code works just fine.” As a random example, here is a discussion I had with Rasmus Lerdorf about five years ago about some UBs in the PHP interpreter. One might point out that it wasn’t a very mature exchange but I wasn’t even 40 yet at the time. (Earlier I had an example here from OpenSSL but this one is more suitable.)

An idea I’ve kicked around for a long time, but only got around to implementing this week, is a collection of canaries for undefined behavior: little test programs that automate the process of determining whether a given compiler configuration is willing to exploit particular UBs. Here’s the UB-canary repo. The idea behind canaries is that since they’re easy to run, they provide a quick and easy way to figure out which UBs the users of a particular compiler should be especially worried about. The definition of “exploit” requires a bit of care: exploitation only makes sense in terms of an expectation held by a C/C++ programmer. For example, I might expect that when a negative number is shifted to the left, the result is the same as if the number had been cast to unsigned before being left-shifted. Of course we need to be completely clear with ourselves that any such expectation has nothing to do with the language standard and everything to do with what a particular compiler happens to do, either because the providers of that compiler are unwilling to exploit that UB or just because they have not gotten around to exploiting it yet. When no real guarantee from the compiler provider exists, we like to say that as-yet unexploited UBs are time bombs: they’re waiting to go off next month or next year when the compiler gets a bit more aggressive.

Below are some results from various versions of GCC and LLVM on Ubuntu Linux. Each compiler was tested at -O0, -O1, -O2, -Os, and -O3; the number in each part of the table indicates the number of optimization options at which the compiler was willing to exploit the UB. Most compilers were targeting x86 but for some of the older versions of GCC (4.0, 4.1, 4.2) I could no longer get that to work due to some sort of libc problem, so they were targeting x86-64. I doubt that affects the results much.

Here are the results for GCC:

4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9
addr-null 5 5 5 5 5 5 0 0 0 0
array-oob 0 0 0 0 0 0 0 0 0 0
dangling-pointer 4 4 4 4 4 4 4 4 4 4
int-min-mod-minus-1 5 5 5 5 5 5 5 5 5 5
memcpy-overlap 5 5 5 5 5 5 5 5 5 5
modify-string-literal 5 5 5 5 5 5 5 5 5 5
pointer-casts 4 4 4 4 4 0 0 0 0 0
shift-by-bitwidth 2 2 2 3 3 3 3 3 4 4
signed-integer-overflow 2 5 3 3 3 3 3 3 3 3
signed-left-shift 0 0 0 0 0 0 0 0 0 0
strict-aliasing 3 3 3 0 2 0 3 3 3 3
uninitialized-variable 0 0 0 1 1 1 1 1 1 1

And for LLVM:

2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6
addr-null 0 0 0 0 0 0 0 0
array-oob 0 0 0 0 0 0 0 0
dangling-pointer 3 3 3 3 3 3 3 3
int-min-mod-minus-1 5 5 5 5 5 5 5 5
memcpy-overlap 5 5 5 5 5 5 5 5
modify-string-literal 5 5 5 5 5 5 5 5
pointer-casts 5 5 5 5 5 5 5 5
shift-by-bitwidth 3 3 3 3 3 3 3 3
signed-integer-overflow 4 4 4 4 4 4 4 4
signed-left-shift 0 0 0 0 0 0 0 0
strict-aliasing 3 3 3 3 3 3 3 3
uninitialized-variable 4 4 4 4 4 4 4 4

Click on the links to see the corresponding assumptions. It is interesting that GCC shows a lot more variance in UB exploitation than LLVM; I suspect that this is simply because the GCC versions shown here span a longer period of time than the LLVM versions (~10 years vs. ~4 years). It is also interesting that UB exploitation is not getting (as one might have feared) uniformly worse. Although overlapping memcpy() calls were always exploited by the compilers tested here, that is not the case on my Mac. I believe there have been compilers that exploited the signed-left-shift UBs but either my canaries aren’t good enough to find them or else those behaviors did not happen to be seen in the releases I choose here. Modifying a string literal never works on a modern Linux machine but it continues to work on embedded platforms that lack memory protection (I just saw some code doing this the other day). The INT_MIN % -1 case is interesting since it has a sensible answer and also my reading of the C99 standard is that this construct is not undefined; it is explicitly undefined in the newer standards.

Please let me know if:

  • You notice a bug in any UB canary or in the (admittedly crappy) driver script.
  • You know of a compiler that exploits an UB but where I don’t have a canary detecting the exploitation. A pull request would be ideal.

Canary code comes from a variety of sources and I have tried to give credit in the individual files. UB canaries are related to several ongoing UB-protection efforts but I want to in particular mention the Cerberus C Survey that is being run by Peter Sewell’s research group at Cambridge. Their goal is to create a mechanized semantics for C as it is really used, as opposed to the somewhat fictitious language described in the standards. If you are a C/C++ developer who cares about UB, please take a few minutes to fill out the survey.

17 responses to “UB Canaries”

  1. 1. It is shameful that that bug report had to explain so much why UB is undesirable in OpenSSL.
    2. I think all those undefined behaviors from your canaries should be submitted as feature requests to both compilers. They should be detected as UB, warned about and replaced by unreachable().

  2. I noticed that your array-oob gives the impression that it’s safe to do so, since it doesn’t appear to fault any optimizers. However, it’s fairly easy to construct an example that causes a segmentation fault if you try to access 1 past the last element–just situate the array such that its last entry lies on a page boundary. A test that I did with a modified version of your code (sans pfx array) found that I could get this to occur with N=368.

    Interestingly enough, in the course of trying to find out the sweet spot, I did discover that gcc appears to have some weird variable ordering rules if you declare int a[N], pfx[N]; I suspect this may be due to tentative definitions, but I don’t recall the rules since I may have forgotten to recompile a few of the tests.

    IMHO, signed left shifts should be well-defined on the basis of their equivalent unsigned shift interpretation–shifts are generally useful as bitwise operations, and if you truly want to express equivalent arithmetic operators with resulting overflow semantics, any compiler worth its salt should be able to reduce multiplies and divides to shifts for you.

  3. For array-oob testcase you should test GCC with option -faggressive-loop-optimizations (since 4.8). The manual says it’s not enabled by default on any -O level.

  4. Hi Turo, I don’t want to turn on random flags since this experiment is about what the compilers do at the optimization levels that people most often use in practice.

    Joshua, do you want to submit that oob program?

    Regarding signed left shifts, I agree they should be well-defined in the same situations that unsigned left shifts are. And indeed the later standards do back off on the signed shift UBs a bit:

  5. As a dyed-in-the-wool Luddite, my take on this is that the whole problem is that *compilers change* in the first place. Me, I’d buy a compiler *once* and use it *forever,* and that’d be that. I’d get used to it, and I’d know its quirks, and all would be well. It’s *upgrading all the time* that gets you involved in this sort of stuff.

  6. You have canaries for signed arithmetic overlfow. You need similar canaries for pointer arithmetic. Many people think that they can detect pointer wrapping with something like “if (p + offset < p) …", but compilers will often optimize that away. In GCC, if offset is signed, the compiler will optimize the conditional to false, even with -fwrapv.

  7. Chris, embedded system developers do this to some extent (depending on what kind of system it is).

    Alas, for mainstream program development it’s often impossible to stay with an old compiler for very long.

  8. Clarification: I meant to say that given the code
    “if (p + offset = 0, it will optimize the conditional to false. Sorry for omitting the other condition.

  9. Regarding “random flags”, I agree the simple -O levels are the most valuable, but for any library that often gets built outside the control of the author, I think those random flags are a risk worth inspecting. (Particularly if the cost can be reduced to a few minutes of compute time and adding a line to a config file.)

  10. memcpy-overlap depends partly on the C library of the platform you run it on.

  11. Richard, yep. On my Mac one of the compilers (I can’t remember if it’s gcc or clang) only exploits overlapping memcpy() at -Os, not at any other option, I suppose because when trying to save code size it selects a different memcpy…