The Strict Aliasing Situation is Pretty Bad

I’ll start with a quick review of the strict aliasing rules in C and C++ and then present some less well-known material.

Strict Aliasing

Compiler optimizations are often shot down by the potential for pointers to be aliases. For example, although we might naively expect a compiler to optimize this function to return zero, that cannot happen because x and y might refer to the same location:

int foo(int *x, int *y) {
  *x = 0;
  *y = 1;
  return *x;
}

Generated code typically looks like this:

foo:    movl    $0, (%rdi)
        movl    $1, (%rsi)
        movl    (%rdi), %eax
        ret

Failure to optimize is particularly frustrating when we know for sure that x and y are not aliases. The “restrict” keyword in C was introduced to help solve this sort of problem but we’re not going to talk about that today. Rather, we’re going to talk about an orthogonal solution, the aliasing rules in C and C++ that permit the compiler to assume that an object will not be aliased by a pointer to a different type. Often this is called “strict aliasing” although that term does not appear in the standards. Consider, for example, this variant of the program above:

int foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;
}

Since a pointer-to-int and a pointer-to-long may be assumed to not alias each other, the function can be compiled to return zero:

foo2:   movl    $0, (%rdi)
        xorl    %eax, %eax
        movq    $1, (%rsi)
        ret

As we see here, the aliasing rules give C and C++ compilers leverage that they can use to generate better code. On the other hand, since C is a low-level language and C++ can be used as a low-level language, both permit casting between pointer types, which can end up creating aliases that violate the compiler’s assumptions. For example, we might naively write code like this to access the representation of a floating point value:

unsigned long bogus_conversion(double d) {
  unsigned long *lp = (unsigned long *)&d;
  return *lp;
}

This function is undefined under the aliasing rules and while it happens to be compiled into the same code that would be emitted without the strict aliasing rules, it is easy to write incorrect code that looks like it is getting broken by the optimizer:

#include <stdio.h>

long foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;
}

int main(void) {
  long l;
  printf("%ld\n", foo((int *)&l, &l));
}

$ gcc-5 strict.c ; ./a.out
1
$ gcc-5 -O2 strict.c ; ./a.out
0
$ clang strict.c ; ./a.out
1
$ clang -O2 strict.c ; ./a.out
0

An exception to the strict aliasing rules is made for pointers to character types, so it is always OK to inspect an object’s representation via an array of chars. This is necessary to make memcpy-like functions work properly.

So far, this is very well known. Now let’s look at a few consequences of strict aliasing that are perhaps not as widely known.

Physical Subtyping is Broken

An old paper that I like uses the term “physical subtyping” to refer to the struct-based implementation of inheritance in C. Searching for “object oriented C” returns quite a few links. Additionally, many large C systems (the Linux kernel for example) implement OO-like idioms. Any time this kind of code casts between pointer types and dereferences the resulting pointers, it violates the aliasing rules. Many aliasing rule violations can be found in this book about object oriented C. Some build systems, such as Linux’s, invoke GCC with its -fno-strict-aliasing flag to avoid problems.

Update based on some comments from Josh Haberman and Sam Tobin-Hochstadt: It looks like the specific case where the struct representing the derived type includes its parent as its first member should not trigger UB. The language in this part of the standard is very hard to parse out.

This program from the Cerberus project illustrates the problem with changing the type of a pointer to struct:

#include <stdio.h>

typedef struct { int i1; } s1;
typedef struct { int i2; } s2;

void f(s1 *s1p, s2 *s2p) {
  s1p->i1 = 2;
  s2p->i2 = 3;
  printf("%i\n", s1p->i1);
}

int main() {
  s1 s = {.i1 = 1};
  f(&s, (s2 *)&s);
}

$ gcc-5 effective.c ; ./a.out
3
$ gcc-5 -O2 effective.c ; ./a.out
2
$ clang-3.8 effective.c ; ./a.out
3
$ clang-3.8 -O2 effective.c ; ./a.out
3

Chunking Optimizations Are Broken

Code that processes bytes one at a time tends to be slow. While an optimizing compiler can sometimes make a naive character-processing loop much faster, in practice we often need to help the compiler out by explicitly processing word-sized chunks of bytes at a time. Since the data reinterpretation is generally done by casting to a non-character-typed pointer, the resulting accesses are undefined. Search the web for “fast memcpy” or “fast memset”: many of the hits will return erroneous code. Example 1, example 2, example 3. Although I have no evidence that it is being miscompiled, OpenSSL’s AES implementation uses chunking and is undefined.

One way to get chunking optimizations without UB is to use GCC’s may_alias attribute, as seen here in Musl. This isn’t supported even by Clang, as far as I know.

Offset Overlap is Bad

Here is a devilish little program by Richard Biener and Robbert Krebbers that I found via the Cerberus report:

#include <stdio.h>
#include <stdlib.h>

struct X {
  int i;
  int j;
};

int foo(struct X *p, struct X *q) {
  q->j = 1;
  p->i = 0;
  return q->j;
}

int main() {
  unsigned char *p = malloc(3 * sizeof(int));
  printf("%i\n", foo((struct X *)(p + sizeof(int)),
                     (struct X *)p));
}

It is ill-formed according to LLVM and GCC:

$ clang-3.8 krebbers.c ; ./a.out
0
$ clang-3.8 -O2 krebbers.c ; ./a.out
1
$ gcc-5 krebbers.c ; ./a.out
0
$ gcc-5 -O2 krebbers.c ; ./a.out
1

int8_t and uint8_t Are Not Necessarily Character Types

This bug (and some linked discussions) indicate that compiler developers don’t necessarily consider int8_t and uint8_t to be character types for aliasing purposes. Wholesale replacement of character types with standard integer types — as advocated here, for example — would almost certainly lead to interesting strict aliasing violations when the resulting code was run through a compiler that doesn’t think int8_t and uint8_t are character types. Happily, no compiler has done this yet (that I know of).

Summary

A lot of C code is broken under strict aliasing. Separate compilation is probably what protects us from broader compiler exploitation of the brokenness, but it is a very poor kind of protection. Static and dynamic checking tools are needed. If I were writing correctness-oriented C that relied on these casts I wouldn’t even consider building it without -fno-strict-aliasing.

Pascal Cuoq provided feedback on a draft of this piece.

March 15, 2016

regehr

Compilers, Computer Science, Software Correctness

32 responses to “The Strict Aliasing Situation is Pretty Bad”

Stefan Reinalter says:

March 15, 2016 at 6:12 am

Nice writeup!

One thing that works well in practice in cases where you really don’t want to process single bytes at a time is to use a union for going from one type to another.
Granted, writing one member of a union and reading from a different one also isn’t allowed per the standard, but every compiler I’ve come across allows it. It’s useful in cases where you need to work with a type-punned pointer.
Sam Tobin-Hochstadt says:

March 15, 2016 at 6:21 am

The section on physical subtyping doesn’t quite make clear if the standard approach to subtyping in C, where structs have their parent as their first field, is also broken. The code snippet there has unrelated struct types, but now I’m worried about related ones. Of course, disallowing that would break every serious C program I’ve ever seen, in fundamental ways, but that hasn’t stopped anyone yet.
Tor Klingberg says:

March 15, 2016 at 7:07 am

It is possible to violate the strict aliasing rule even without casting pointers, by taking pointers to different members of a union.
regehr says:

March 15, 2016 at 7:31 am

Sam, see here:
http://stackoverflow.com/questions/19010971/compatible-types-vs-strict-aliasing-rules
Devon H. O’Dell says:

March 15, 2016 at 7:47 am

> … compiler developers donâ€™t necessarily consider int8_t and uint8_t to be character types for aliasing purposes

It makes sense. What about compilers for systems where CHAR_BIT is not 8?

Regarding the rest of the post, thank you; this is eye-opening. I’m particularly concerned with where compilers are going with subtyping and optimizations. It’s almost as if the compiler folks and language designers want C to have immutable types. (I see a new type qualifier in the future.)
embeddedApp says:

March 15, 2016 at 7:52 am

I have yet to find a way
to create an “opaque shell type” suitable for static allocation
(hence, not incomplete)

which would then be used as argument to functions using the real private type
which size and alignment are <= to opaque shell type.

It works, but it violates strict aliasing.
And the problem is, there is simply no other way…
julien tayon says:

March 15, 2016 at 8:38 am

tried to recompile C code that I Wrote in 1996 at university. (a monte carlo simulation of multi agents).
in 2013 it was still compiling. And now it is not.

I have written a drivers for linux that worked in 2000, python C extension 3 years ago for fun. And I always wrote some C code once in a while, because C was fun.

C code used to do exactly what I was thinking it was doing. I also tried to rewrite code 2 months ago, it was a major pain. As a former physicist I limit myself to $CC [-O2] code.c
I have written enough code that worked to have been pretty confident in my code.

It is still the same syntax, but it is not the same language anymore.
A lot of new sybilin limitations and flags that make no sense appeared in the compilerS.

If it has the same syntax, but not the same behaviour. I would call modern C appellation of what llvm and gcc implements a lie. I am not confident anymore, not of my code, but of the output of the compilers.

It is ++C– for me. Like the notation implies the new stuffs have some hidden side effects and incorrect assumptions that don’t make it idempotent to my plain old C.

And stackoverflow proposed me no sets of flags to set recent C compilers to the old behaviour state.

I think the geniuses behind the new compiler theory have lost track in their delirium of the need of users. They seems to kind of force a non backward compatible C behaviour.

C users want a boring C compiler and don’t want any kind of bullshit crap Ã la Java to prevent poor coders to make mistakes and C compilers to map the code in anything other than expected in order to change poorly performing C by construction in extremely fast C.

llvm and gcc are painfully changing C in a new beast I hate.
Shahms King says:

March 15, 2016 at 10:19 am

I’m not sure about C, but C++ specifically supports converting a pointer to a standard-layout struct to a pointer to its initial member and vice versa in order to support structural subtyping. There are also related special cases around the common initial sequence of standard-layout structs inside a union.
Dave5 says:

March 15, 2016 at 10:21 am

Could someone plz provide an explanation why the “long l” piece of code above produces different results based on the optimization level?..
Tim says:

March 15, 2016 at 10:33 am

If x and y don’t alias, the stores to x and y can be executed in any order (this is useful because it gives the compiler more freedom to reorder or schedule code):

int foo(int *x, long *y) {
*x = 0;
*y = 1;
return *x;
}
Tor Klingberg says:

March 15, 2016 at 10:34 am

Dave5: The function foo() receives two pointers to the same memory location.

Without optimization is does as the code says: write 0, write 1, read back (1) and return 1.

With optimization, it assumes that x and y point to different memory locations because they have different types. Because it just wrote 0 to *x, it skips reading back from *x and just returns 0.
Jason says:

March 15, 2016 at 1:05 pm

You are 100% correct that separate compilation is saving us from a lot of bugs. I have personally seen code break due to aliasing issues when cross-module inlining was enabled.

My thought is that we should throw out the aliasing rules and take a subset; add a keyword that is the inverse of restrict, and say only that arguments to the same function with different types implicitly have the restrict keyword. I predict that this will yield a significant fraction of the fortran-like performance optimizations that the aliasing rules currently allow, while fixing a lot of breakage that already happens.
Jason says:

March 15, 2016 at 1:55 pm

> Separate compilation is probably what protects us from broader compiler exploitation of the brokenness, but it is a very poor kind of protection.

I’ve been compiling the userland on my laptop using LTO and the highest level of optimization trying to find cases where it exposes bugs, and I’ve only encountered one visible in autofs so far. It’s far from exhaustive, but so far there doesn’t seem to be UB in a lot of software, or at very least UB the compiler doesn’t exploit.
Ryan Prichard says:

March 15, 2016 at 2:14 pm

> Granted, writing one member of a union and reading from a different one also isnâ€™t allowed per the standard, but every compiler Iâ€™ve come across allows it.

Why do you think the standard forbids it? The C committee apparently disagrees; their response to DR 283 added a footnote clarifying that the standard allowed this behavior. IIRC, the older C90 standard had labeled the behavior “implementation-defined,” because it was supposed to be defined by the data representation.
Morten Ofstad says:

March 15, 2016 at 2:39 pm

I still remember the surprise I felt when I realized that (at least in C++) ‘signed char’, ‘unsigned char’ and ‘char’ are all different type, can have templates specialized for each, overloaded functions etc… And of course the only type that is considered to may-alias any other type is ‘char’.
Dave5 says:

March 15, 2016 at 3:01 pm

Thank you all for the answer. 🙂
Tavian Barnes says:

March 15, 2016 at 3:01 pm

Chunking is still possible to implement without aliasing violations, provided the compiler is good at optimizing regular memcpy() calls. (So this example is sort of moot, but it illustrates the point.)

void *chunking_memcpy(void *dest, const void *src, size_t n) {
char *cdest = dest;
const char *csrc = src;
const char *cend = csrc + n;
unsigned long chunk;

while (csrc + sizeof(chunk) <= cend) {
memcpy(&chunk, csrc, sizeof(chunk));
memcpy(cdest, &chunk, sizeof(chunk));
cdest += sizeof(chunk);
csrc += sizeof(chunk);
}
while (csrc < cend) {
*cdest++ = *csrc++;
}
return dest;
}

With gcc 5.3, the inner loop becomes

.L3:
movq -8(%rcx), %r9
addq $8, %rcx
addq $8, %rsi
movq %r9, -8(%rsi)
cmpq %rcx, %r8
jnb .L3

(Then it does some crazy stuff to unroll the tail fixup loop, but still.)
Stefan Reinalter says:

March 15, 2016 at 3:12 pm

@Ryan Prichard:
I may be wrong, but the C standard says:
“6.5.2.3 Structure and union members
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called â€˜â€˜type punningâ€™â€™). This might be a trap representation.”

To me, the last sentence (“This might be a trap representation.”) means that this might very well blow up in my face. Even in the newest version of the standard it still is unspecified behaviour, right?

The C++11 standard says the following:
“9.5 Unions [class.union]
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.”

Digging a bit deeper into the standard doesn’t really clear things up for me.

See this thread for more info:
http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined-behavior
regehr says:

March 15, 2016 at 3:59 pm

Tavian, you’re right, I should have mentioned memcpy() in the post, it’s a very clear way to implement chunking. The only problem is that it may generate non-optimized builds that perform quite poorly.

Stefan and Ryan: I wanted to steer clear of unions in this post, but basically they are a reliable way to do type-punning in modern C and C++. The standards really are a mess, but the compiler implementors have done the right thing here.
regehr says:

March 15, 2016 at 4:01 pm

Jason, that’s great to hear that things aren’t breaking! Even so, I’m somewhat worried that we’ll be seeing a slow series of time bombs going off as people tweak and tune LTO to increase its reach.
Jason says:

March 15, 2016 at 4:28 pm

Obviously nobody should highly optimize code unless they know it UB sane. My laptop’s a space monkey.

I agree, there are some tricky cases where strict aliasing can bite. The attribute approach definitely seems like reasonable one. Static analyzers and instrumentation are also definitely necessary, particularly if you optimize across translation units.
Tavian Barnes says:

March 15, 2016 at 6:39 pm

@regehr Yep. And because of course nobody can keep all the undefined behaviours straight at once, the loop’s test should be “cend – csrc >= sizeof(chunk)” instead of “csrc + sizeof(chunk) <= cend" to avoid making a pointer point out of bounds.
regehr says:

March 15, 2016 at 6:43 pm

Tavian I was just working on a different post that talks about avoiding pointers that are out of bounds!
Matthew Fernandez says:

March 15, 2016 at 10:13 pm

As far as I can tell, Clang does handle `may_alias`: https://github.com/llvm-mirror/clang/blob/ba9f6803ce54c7d960732400b3fd8f364788c33b/include/clang/Basic/Attr.td#L860-L864

>Iâ€™ve been compiling the userland on my laptop using LTO and the highest level
> of optimization trying to find cases where it exposes bugs, and Iâ€™ve only encountered
> one visible in autofs so far. Itâ€™s far from exhaustive, but so far there doesnâ€™t seem to be
> UB in a lot of software, or at very least UB the compiler doesnâ€™t exploit.

I would expect this to change if you apply LTO across libc as well.

> Chunking is still possible to implement without aliasing violations, provided the
> compiler is good at optimizing regular memcpy() calls.

Nice trick! Though this doesn’t help you if the chunking function you’re implementing *is* `memcpy`.
Howard Chu says:

March 16, 2016 at 1:31 am

>> Chunking is still possible to implement without aliasing violations, provided the
>> compiler is good at optimizing regular memcpy() calls.

> Nice trick! Though this doesnâ€™t help you if the chunking >function youâ€™re implementing *is* `memcpy`.

This crystallizes the situation, completely. C is originally a self-hosting system programming language, but has been turned into something that can no longer self-host. That says pretty clearly to me, You Compiler Folks Are Doing It Wrong.

Bytes are bytes, and I write statements in the order they are intended to execute. Whenever you break that ‘as if’ constraint, your compiler is broken. Correctness must always come first. None of this shit would have flown back when I was a gcc2 maintainer. (Having written a link time optimizer for M68K, I know pretty well what works and what doesn’t.)

Seems to me that gcc’s downhill progress correlates with the growth of x86. The natural outcome from compiler writers desperate to wring performance out of an intractably non-designed instruction set. Instruction reordering wasn’t even an issue on SPARC, MIPS, i860, etc. It’s only due to x86’s byzantine architecture that most of this was ever needed.
Luiz Monad says:

March 16, 2016 at 8:11 am

Sorry but that way things are going in C, if you think correctness is more important than speed, you better not use C or C++ if you can.
Ordo says:

March 18, 2016 at 11:37 am

> Devon H. O’Dell says:
> March 15, 2016 at 7:47 am
> It makes sense. What about compilers for systems where CHAR_BIT is not 8?

AFAIK `int8_t` cannot exist on such a system, since `char` is the smallest type has at least 8 bits.
Sean says:

March 19, 2016 at 8:14 pm

The clang 3.8 release notes say they are even more aggressive now with alignment optimisations:

http://llvm.org/releases/3.8.0/tools/clang/docs/ReleaseNotes.html#alignment

I’m confused by the memcpy() notes there… Is this not safe anymore:

int64_t foo;
memcpy(&foo, “12345678”, 8);

I thought memcpy was the safe way to move things between different alignments.
Matthew Fernandez says:

March 20, 2016 at 2:28 pm

> Iâ€™m confused by the memcpy() notes thereâ€¦ Is this not safe anymore:
>
> int64_t foo;
> memcpy(&foo, “12345678”, 8);

No, I believe this is safe. I think what the LLVM notes are claiming is something similar to what ARM CC has done for some time:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/4972.html

Memcpy can move between things of different alignments, but it is allowed to make assumptions about the alignment of the source and destination based on their types. For example, in your code the compiler is allowed to assume `&foo` is 8-byte aligned. Having said that, I’m having a hard time parsing the odd char array example the LLVM docs use.
David G says:

March 20, 2016 at 7:52 pm

Jason ” add a keyword that is the inverse of restrict”
As I understand it, the C++ folks have gone the other way ???? and have added a “reference” type, which is sort of ??? equivilant to a FORTRAN or Pascal or BASIC var type. That allowed them to have FORTRAN / Pascal /BASIC levels of optimisation, at a time when they hadn’t yet broken aliasing???

Forgive my ignorance.

It seems to me that people like language extensions better than they like language restrictions. Perhaps we should be lobbying for an extension to C that would allow optimisable pointers, like Var or Reference, so that pointers could be aliased the way programmers intend, and compiler writers could achieve optimisations.
Bill Wade says:

March 22, 2016 at 8:46 am

>> Iâ€™m confused by the memcpy() notes thereâ€¦ Is this not safe anymore:
>>
>> int64_t foo;
>> memcpy(&foo, “12345678”, 8);

>No, I believe this is safe.

I don’t see how that is safe. Is there anything in the language standard that says sizeof(int64_t) is at least eight? Aren’t implementations allowed to use 16-bits, or more, for char?
Mate Soos says:

March 25, 2016 at 7:27 pm

Oh, the good old aliasing 🙂 I keep maintaining a memory manager for my SAT solver, and I had to dig into aliasing for the reasons explained. It’s fun. For highly-optimized math libraries (thinking of Gaussian Elimination here) the judicious use of the “restrict” keyword can make huge differences, I’ve heard. Martin Albrecht could probably do a nice writeup on that. See https://bitbucket.org/malb/m4ri/ — arith library for dense matrices over F2