The Strict Aliasing Situation is Pretty Bad

I’ll start with a quick review of the strict aliasing rules in C and C++ and then present some less well-known material.

Strict Aliasing

Compiler optimizations are often shot down by the potential for pointers to be aliases. For example, although we might naively expect a compiler to optimize this function to return zero, that cannot happen because x and y might refer to the same location:

int foo(int *x, int *y) {
  *x = 0;
  *y = 1;
  return *x;
}

Generated code typically looks like this:

foo:    movl    $0, (%rdi)
        movl    $1, (%rsi)
        movl    (%rdi), %eax
        ret

Failure to optimize is particularly frustrating when we know for sure that x and y are not aliases. The “restrict” keyword in C was introduced to help solve this sort of problem but we’re not going to talk about that today. Rather, we’re going to talk about an orthogonal solution, the aliasing rules in C and C++ that permit the compiler to assume that an object will not be aliased by a pointer to a different type. Often this is called “strict aliasing” although that term does not appear in the standards. Consider, for example, this variant of the program above:

int foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;
}

Since a pointer-to-int and a pointer-to-long may be assumed to not alias each other, the function can be compiled to return zero:

foo2:   movl    $0, (%rdi)
        xorl    %eax, %eax
        movq    $1, (%rsi)
        ret

As we see here, the aliasing rules give C and C++ compilers leverage that they can use to generate better code. On the other hand, since C is a low-level language and C++ can be used as a low-level language, both permit casting between pointer types, which can end up creating aliases that violate the compiler’s assumptions. For example, we might naively write code like this to access the representation of a floating point value:

unsigned long bogus_conversion(double d) {
  unsigned long *lp = (unsigned long *)&d;
  return *lp;
}

This function is undefined under the aliasing rules and while it happens to be compiled into the same code that would be emitted without the strict aliasing rules, it is easy to write incorrect code that looks like it is getting broken by the optimizer:

#include <stdio.h>

long foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;
}

int main(void) {
  long l;
  printf("%ld\n", foo((int *)&l, &l));
}

$ gcc-5 strict.c ; ./a.out
1
$ gcc-5 -O2 strict.c ; ./a.out
0
$ clang strict.c ; ./a.out
1
$ clang -O2 strict.c ; ./a.out
0

An exception to the strict aliasing rules is made for pointers to character types, so it is always OK to inspect an object’s representation via an array of chars. This is necessary to make memcpy-like functions work properly.

So far, this is very well known. Now let’s look at a few consequences of strict aliasing that are perhaps not as widely known.

Physical Subtyping is Broken

An old paper that I like uses the term “physical subtyping” to refer to the struct-based implementation of inheritance in C. Searching for “object oriented C” returns quite a few links. Additionally, many large C systems (the Linux kernel for example) implement OO-like idioms. Any time this kind of code casts between pointer types and dereferences the resulting pointers, it violates the aliasing rules. Many aliasing rule violations can be found in this book about object oriented C. Some build systems, such as Linux’s, invoke GCC with its -fno-strict-aliasing flag to avoid problems.

Update based on some comments from Josh Haberman and Sam Tobin-Hochstadt: It looks like the specific case where the struct representing the derived type includes its parent as its first member should not trigger UB. The language in this part of the standard is very hard to parse out.

This program from the Cerberus project illustrates the problem with changing the type of a pointer to struct:

#include <stdio.h>

typedef struct { int i1; } s1;
typedef struct { int i2; } s2;

void f(s1 *s1p, s2 *s2p) {
  s1p->i1 = 2;
  s2p->i2 = 3;
  printf("%i\n", s1p->i1);
}

int main() {
  s1 s = {.i1 = 1};
  f(&s, (s2 *)&s);
}

$ gcc-5 effective.c ; ./a.out
3
$ gcc-5 -O2 effective.c ; ./a.out
2
$ clang-3.8 effective.c ; ./a.out
3
$ clang-3.8 -O2 effective.c ; ./a.out
3

Chunking Optimizations Are Broken

Code that processes bytes one at a time tends to be slow. While an optimizing compiler can sometimes make a naive character-processing loop much faster, in practice we often need to help the compiler out by explicitly processing word-sized chunks of bytes at a time. Since the data reinterpretation is generally done by casting to a non-character-typed pointer, the resulting accesses are undefined. Search the web for “fast memcpy” or “fast memset”: many of the hits will return erroneous code. Example 1, example 2, example 3. Although I have no evidence that it is being miscompiled, OpenSSL’s AES implementation uses chunking and is undefined.

One way to get chunking optimizations without UB is to use GCC’s may_alias attribute, as seen here in Musl. This isn’t supported even by Clang, as far as I know.

Offset Overlap is Bad

Here is a devilish little program by Richard Biener and Robbert Krebbers that I found via the Cerberus report:

#include <stdio.h>
#include <stdlib.h>

struct X {
  int i;
  int j;
};

int foo(struct X *p, struct X *q) {
  q->j = 1;
  p->i = 0;
  return q->j;
}

int main() {
  unsigned char *p = malloc(3 * sizeof(int));
  printf("%i\n", foo((struct X *)(p + sizeof(int)),
                     (struct X *)p));
}

It is ill-formed according to LLVM and GCC:

$ clang-3.8 krebbers.c ; ./a.out
0
$ clang-3.8 -O2 krebbers.c ; ./a.out
1
$ gcc-5 krebbers.c ; ./a.out
0
$ gcc-5 -O2 krebbers.c ; ./a.out
1

int8_t and uint8_t Are Not Necessarily Character Types

This bug (and some linked discussions) indicate that compiler developers don’t necessarily consider int8_t and uint8_t to be character types for aliasing purposes. Wholesale replacement of character types with standard integer types — as advocated here, for example — would almost certainly lead to interesting strict aliasing violations when the resulting code was run through a compiler that doesn’t think int8_t and uint8_t are character types. Happily, no compiler has done this yet (that I know of).

Summary

A lot of C code is broken under strict aliasing. Separate compilation is probably what protects us from broader compiler exploitation of the brokenness, but it is a very poor kind of protection. Static and dynamic checking tools are needed. If I were writing correctness-oriented C that relied on these casts I wouldn’t even consider building it without -fno-strict-aliasing.

Pascal Cuoq provided feedback on a draft of this piece.

32 thoughts on “The Strict Aliasing Situation is Pretty Bad”

  1. Nice writeup!

    One thing that works well in practice in cases where you really don’t want to process single bytes at a time is to use a union for going from one type to another.
    Granted, writing one member of a union and reading from a different one also isn’t allowed per the standard, but every compiler I’ve come across allows it. It’s useful in cases where you need to work with a type-punned pointer.

  2. The section on physical subtyping doesn’t quite make clear if the standard approach to subtyping in C, where structs have their parent as their first field, is also broken. The code snippet there has unrelated struct types, but now I’m worried about related ones. Of course, disallowing that would break every serious C program I’ve ever seen, in fundamental ways, but that hasn’t stopped anyone yet.

  3. It is possible to violate the strict aliasing rule even without casting pointers, by taking pointers to different members of a union.

  4. > … compiler developers don’t necessarily consider int8_t and uint8_t to be character types for aliasing purposes

    It makes sense. What about compilers for systems where CHAR_BIT is not 8?

    Regarding the rest of the post, thank you; this is eye-opening. I’m particularly concerned with where compilers are going with subtyping and optimizations. It’s almost as if the compiler folks and language designers want C to have immutable types. (I see a new type qualifier in the future.)

  5. I have yet to find a way
    to create an “opaque shell type” suitable for static allocation
    (hence, not incomplete)

    which would then be used as argument to functions using the real private type
    which size and alignment are <= to opaque shell type.

    It works, but it violates strict aliasing.
    And the problem is, there is simply no other way…

  6. tried to recompile C code that I Wrote in 1996 at university. (a monte carlo simulation of multi agents).
    in 2013 it was still compiling. And now it is not.

    I have written a drivers for linux that worked in 2000, python C extension 3 years ago for fun. And I always wrote some C code once in a while, because C was fun.

    C code used to do exactly what I was thinking it was doing. I also tried to rewrite code 2 months ago, it was a major pain. As a former physicist I limit myself to $CC [-O2] code.c
    I have written enough code that worked to have been pretty confident in my code.

    It is still the same syntax, but it is not the same language anymore.
    A lot of new sybilin limitations and flags that make no sense appeared in the compilerS.

    If it has the same syntax, but not the same behaviour. I would call modern C appellation of what llvm and gcc implements a lie. I am not confident anymore, not of my code, but of the output of the compilers.

    It is ++C– for me. Like the notation implies the new stuffs have some hidden side effects and incorrect assumptions that don’t make it idempotent to my plain old C.

    And stackoverflow proposed me no sets of flags to set recent C compilers to the old behaviour state.

    I think the geniuses behind the new compiler theory have lost track in their delirium of the need of users. They seems to kind of force a non backward compatible C behaviour.

    C users want a boring C compiler and don’t want any kind of bullshit crap à la Java to prevent poor coders to make mistakes and C compilers to map the code in anything other than expected in order to change poorly performing C by construction in extremely fast C.

    llvm and gcc are painfully changing C in a new beast I hate.

  7. I’m not sure about C, but C++ specifically supports converting a pointer to a standard-layout struct to a pointer to its initial member and vice versa in order to support structural subtyping. There are also related special cases around the common initial sequence of standard-layout structs inside a union.

  8. Could someone plz provide an explanation why the “long l” piece of code above produces different results based on the optimization level?..

  9. If x and y don’t alias, the stores to x and y can be executed in any order (this is useful because it gives the compiler more freedom to reorder or schedule code):

    int foo(int *x, long *y) {
    *x = 0;
    *y = 1;
    return *x;
    }

  10. Dave5: The function foo() receives two pointers to the same memory location.

    Without optimization is does as the code says: write 0, write 1, read back (1) and return 1.

    With optimization, it assumes that x and y point to different memory locations because they have different types. Because it just wrote 0 to *x, it skips reading back from *x and just returns 0.

  11. You are 100% correct that separate compilation is saving us from a lot of bugs. I have personally seen code break due to aliasing issues when cross-module inlining was enabled.

    My thought is that we should throw out the aliasing rules and take a subset; add a keyword that is the inverse of restrict, and say only that arguments to the same function with different types implicitly have the restrict keyword. I predict that this will yield a significant fraction of the fortran-like performance optimizations that the aliasing rules currently allow, while fixing a lot of breakage that already happens.

  12. > Separate compilation is probably what protects us from broader compiler exploitation of the brokenness, but it is a very poor kind of protection.

    I’ve been compiling the userland on my laptop using LTO and the highest level of optimization trying to find cases where it exposes bugs, and I’ve only encountered one visible in autofs so far. It’s far from exhaustive, but so far there doesn’t seem to be UB in a lot of software, or at very least UB the compiler doesn’t exploit.

  13. > Granted, writing one member of a union and reading from a different one also isn’t allowed per the standard, but every compiler I’ve come across allows it.

    Why do you think the standard forbids it? The C committee apparently disagrees; their response to DR 283 added a footnote clarifying that the standard allowed this behavior. IIRC, the older C90 standard had labeled the behavior “implementation-defined,” because it was supposed to be defined by the data representation.

  14. I still remember the surprise I felt when I realized that (at least in C++) ‘signed char’, ‘unsigned char’ and ‘char’ are all different type, can have templates specialized for each, overloaded functions etc… And of course the only type that is considered to may-alias any other type is ‘char’.

  15. Chunking is still possible to implement without aliasing violations, provided the compiler is good at optimizing regular memcpy() calls. (So this example is sort of moot, but it illustrates the point.)

    void *chunking_memcpy(void *dest, const void *src, size_t n) {
    char *cdest = dest;
    const char *csrc = src;
    const char *cend = csrc + n;
    unsigned long chunk;

    while (csrc + sizeof(chunk) <= cend) {
    memcpy(&chunk, csrc, sizeof(chunk));
    memcpy(cdest, &chunk, sizeof(chunk));
    cdest += sizeof(chunk);
    csrc += sizeof(chunk);
    }
    while (csrc < cend) {
    *cdest++ = *csrc++;
    }
    return dest;
    }

    With gcc 5.3, the inner loop becomes

    .L3:
    movq -8(%rcx), %r9
    addq $8, %rcx
    addq $8, %rsi
    movq %r9, -8(%rsi)
    cmpq %rcx, %r8
    jnb .L3

    (Then it does some crazy stuff to unroll the tail fixup loop, but still.)

  16. @Ryan Prichard:
    I may be wrong, but the C standard says:
    “6.5.2.3 Structure and union members
    95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.”

    To me, the last sentence (“This might be a trap representation.”) means that this might very well blow up in my face. Even in the newest version of the standard it still is unspecified behaviour, right?

    The C++11 standard says the following:
    “9.5 Unions [class.union]
    In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.”

    Digging a bit deeper into the standard doesn’t really clear things up for me.

    See this thread for more info:
    http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined-behavior

  17. Tavian, you’re right, I should have mentioned memcpy() in the post, it’s a very clear way to implement chunking. The only problem is that it may generate non-optimized builds that perform quite poorly.

    Stefan and Ryan: I wanted to steer clear of unions in this post, but basically they are a reliable way to do type-punning in modern C and C++. The standards really are a mess, but the compiler implementors have done the right thing here.

  18. Jason, that’s great to hear that things aren’t breaking! Even so, I’m somewhat worried that we’ll be seeing a slow series of time bombs going off as people tweak and tune LTO to increase its reach.

  19. Obviously nobody should highly optimize code unless they know it UB sane. My laptop’s a space monkey.

    I agree, there are some tricky cases where strict aliasing can bite. The attribute approach definitely seems like reasonable one. Static analyzers and instrumentation are also definitely necessary, particularly if you optimize across translation units.

  20. @regehr Yep. And because of course nobody can keep all the undefined behaviours straight at once, the loop’s test should be “cend – csrc >= sizeof(chunk)” instead of “csrc + sizeof(chunk) <= cend" to avoid making a pointer point out of bounds.

  21. As far as I can tell, Clang does handle `may_alias`: https://github.com/llvm-mirror/clang/blob/ba9f6803ce54c7d960732400b3fd8f364788c33b/include/clang/Basic/Attr.td#L860-L864

    >I’ve been compiling the userland on my laptop using LTO and the highest level
    > of optimization trying to find cases where it exposes bugs, and I’ve only encountered
    > one visible in autofs so far. It’s far from exhaustive, but so far there doesn’t seem to be
    > UB in a lot of software, or at very least UB the compiler doesn’t exploit.

    I would expect this to change if you apply LTO across libc as well.

    > Chunking is still possible to implement without aliasing violations, provided the
    > compiler is good at optimizing regular memcpy() calls.

    Nice trick! Though this doesn’t help you if the chunking function you’re implementing *is* `memcpy`.

  22. >> Chunking is still possible to implement without aliasing violations, provided the
    >> compiler is good at optimizing regular memcpy() calls.

    > Nice trick! Though this doesn’t help you if the chunking >function you’re implementing *is* `memcpy`.

    This crystallizes the situation, completely. C is originally a self-hosting system programming language, but has been turned into something that can no longer self-host. That says pretty clearly to me, You Compiler Folks Are Doing It Wrong.

    Bytes are bytes, and I write statements in the order they are intended to execute. Whenever you break that ‘as if’ constraint, your compiler is broken. Correctness must always come first. None of this shit would have flown back when I was a gcc2 maintainer. (Having written a link time optimizer for M68K, I know pretty well what works and what doesn’t.)

    Seems to me that gcc’s downhill progress correlates with the growth of x86. The natural outcome from compiler writers desperate to wring performance out of an intractably non-designed instruction set. Instruction reordering wasn’t even an issue on SPARC, MIPS, i860, etc. It’s only due to x86’s byzantine architecture that most of this was ever needed.

  23. Sorry but that way things are going in C, if you think correctness is more important than speed, you better not use C or C++ if you can.

  24. > Devon H. O’Dell says:
    > March 15, 2016 at 7:47 am
    > It makes sense. What about compilers for systems where CHAR_BIT is not 8?

    AFAIK `int8_t` cannot exist on such a system, since `char` is the smallest type has at least 8 bits.

  25. > I’m confused by the memcpy() notes there… Is this not safe anymore:
    >
    > int64_t foo;
    > memcpy(&foo, “12345678”, 8);

    No, I believe this is safe. I think what the LLVM notes are claiming is something similar to what ARM CC has done for some time:

    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/4972.html

    Memcpy can move between things of different alignments, but it is allowed to make assumptions about the alignment of the source and destination based on their types. For example, in your code the compiler is allowed to assume `&foo` is 8-byte aligned. Having said that, I’m having a hard time parsing the odd char array example the LLVM docs use.

  26. Jason ” add a keyword that is the inverse of restrict”
    As I understand it, the C++ folks have gone the other way ???? and have added a “reference” type, which is sort of ??? equivilant to a FORTRAN or Pascal or BASIC var type. That allowed them to have FORTRAN / Pascal /BASIC levels of optimisation, at a time when they hadn’t yet broken aliasing???

    Forgive my ignorance.

    It seems to me that people like language extensions better than they like language restrictions. Perhaps we should be lobbying for an extension to C that would allow optimisable pointers, like Var or Reference, so that pointers could be aliased the way programmers intend, and compiler writers could achieve optimisations.

  27. >> I’m confused by the memcpy() notes there… Is this not safe anymore:
    >>
    >> int64_t foo;
    >> memcpy(&foo, “12345678”, 8);

    >No, I believe this is safe.

    I don’t see how that is safe. Is there anything in the language standard that says sizeof(int64_t) is at least eight? Aren’t implementations allowed to use 16-bits, or more, for char?

  28. Oh, the good old aliasing 🙂 I keep maintaining a memory manager for my SAT solver, and I had to dig into aliasing for the reasons explained. It’s fun. For highly-optimized math libraries (thinking of Gaussian Elimination here) the judicious use of the “restrict” keyword can make huge differences, I’ve heard. Martin Albrecht could probably do a nice writeup on that. See https://bitbucket.org/malb/m4ri/ — arith library for dense matrices over F2

Comments are closed.