Checking Up on Dataflow Analyses

An important tool for reasoning about programs without running them is dataflow analysis, which can be used to compute facts such as:

  • an integer-valued variable is non-negative at a particular program point
  • a conditional branch always goes one way
  • an indirect write cannot modify a particular array

Dataflow facts drive optimizations such as constant propagation and dead code elimination. Outside of compilers, dataflow analyses are important in tools that prove safety properties, such as absence of undefined behaviors in C or C++ programs. Dataflow analysis is backed by a rich academic literature and tool writers have decades of experience implementing it. This set of lecture notes is my favorite introduction to dataflow analysis.

Despite all of the nice theorems, dataflow analyses are hard to get right, in part because people are just not very good at reasoning about properties that hold over sets of program executions. As a simple exercise in this kind of thinking, try this: you have a variable A whose value is known to be in the range [0..1000] and a variable B that is known to be in the range [10000..10050]. The program computes the bitwise xor of A and B and assigns the result into C. Provide tight bounds on the values that C can take. If you find this to be easy, that’s great — but consider that you then have to make this work for all intervals and after that there are about 9,999 similarly fiddly things left to implement and any mistakes you make are likely to result in miscompilations.

So now let’s finally get to the point of this post. The point is that since dataflow is hard, if you implement a dataflow analysis, you should also implement a dynamic checker that looks for cases where the analysis has come to the wrong conclusion. To keep going with the running example, the program being analyzed contains this line of code:

// A in [0..1000] and B in [10000..10050]
C = A ^ B;

If our interval analysis says that C has to be in the range [9225..12255], we would rewrite the code to add an assertion:

// A in [0..1000] and B in [10000..10050]
C = A ^ B;
assert(C >= 9225 && C <= 12255);

Now we can run the instrumented program and, if we’re very lucky, it will execute this code with values such as A = 830 and B = 10041, making C = 9223, triggering an assertion violation telling us that our analysis is unsound and giving us something to debug. The other way to debug this sort of problem is to backtrack from a miscompilation — an experience enjoyed by basically nobody.

Every dataflow analysis that I’ve implemented or supervised the implementation of has gotten a dynamic checker. This is a great testing technique that has numerous advantages. It finds bugs in practice. It can be automated, either using a program generator like Csmith or else using a regular test suite. The annotated programs serve as a useful way to look at the precision of an analysis: if not very many assertions show up, or if the bounds are very loose, then we probably need to do some tuning for precision. Finally, if we don’t find any bugs, then the dataflow analysis is probably (please excuse me, formal methods people) correct enough for practical purposes.

Of course, not everything that can be computed by a dataflow analysis can be expressed in code. For example, in C and C++ there’s no lightweight way to assert that a pointer refers to valid storage or that a variable is uninitialized. E-ACSL is a neat piece of related work in this general area.

Just recently I’ve been looking at the efficiency of integer overflow checking using LLVM and since then I wrote some code that uses integer range dataflow facts to remove overflow checks that can be proved to not fire. Here’s the under-review patch. The ranges are computed by an LLVM pass called LazyValueInfo (LVI), which better be correct if we’re going to rely on it, so I wrote a little LLVM pass that does the necessary checking. It processes each function in between the optimization and code generation phases by:

(If you build this pass, you’ll need a slightly hacked LLVM, see the README.) Although it might seem tempting to run the optimizers again on the instrumented code in order to clean it up a bit, this would be a very bad idea: the very dataflow analyses that we’re trying to check up on would be used to drive optimizations, which could easily end up incorrectly deleting our checks.

So far, some testing using Csmith hasn’t turned up any bugs in LVI, which is great. Less great is the fact that it drops precision all over the place: a lot of tuning up is needed before it is the basis for a really strong redundant overflow check remover.

The technique I’m describing is not as widely known or as widely used as it should be, considering how easy and effective it is. The best answer is that C = [9216..10239].

UPDATE: The latest version of my LLVM dataflow checker also validates the results of computeKnownBits(), which tries to figure out which bits of a value must be either zero or one.

Here’s a bit of further reading about how these techniques (and others) were applied to the Frama-C static analyzer.

Efficient Integer Overflow Checking in LLVM

(Here’s some optional background reading material.)

We want fast integer overflow checking. Why? First, if the undefined behavior sanitizers go faster then testing goes faster. Second, when overhead drops below a certain point people will become willing to use UBSan to harden production code against integer overflows. This is already being done in parts of Android. It isn’t the kind of thing to do lightly: some of LLVM’s sanitizers, such as ASan, will increase an application’s attack surface. Even UBSan in trapping mode — which does not increase attack surface that I know of — could easily enable remote DoS attacks. But this is beside the point.

Let’s start with this function:

int foo(int x, int y) {
  return x + y; 
}

When compiled with trapping integer overflow checks (as opposed to checks that provide diagnostics and optionally continue executing) Clang 3.8 at -O2 gives:

foo:
        addl    %esi, %edi
        jo      .LBB0_1
        movl    %edi, %eax
        retq
.LBB0_1:
        ud2

This code is efficient; here Chandler Carruth explains why. On the other hand, signed integer overflow checking slows down SPEC CINT 2006 by 11.8% overall, with slowdown ranging from negligible (GCC, Perl, OMNeT++) to about 20% (Sjeng, H264Ref) to about 40% (HMMER).

Why do fast overflow checks add overhead? The increase in object code size due to overflow checking is less than 1% so there’s not going to be much trouble in the icache. Looking at HMMER, for example, we see that it spends >95% of its execution time in a function called P7Viterbi(). This function can be partially vectorized, but the version with integer overflow checks doesn’t get vectorized at all. In other words, most of the slowdown comes from integer overflow checks interfering with loop optimizations. In contrast, GCC and Perl probably don’t get much benefit from advanced loop optimizations in the first place, hence the lack of slowdown there.

Here I’ll take a second to mention that I had to hack SPEC slightly so that signed integer overflows wouldn’t derail my experiments. The changes appear to be performance-neutral. Only GCC, Perl, and H254Ref have signed overflows. Here’s the patch for SPEC CPU 2006 V1.2. All performance numbers in this post were taken on an i7-5820K (6-core Haswell-E at 3.3 GHz), running Ubuntu 14.04 in 64-bit mode, with frequency scaling disabled.

Now for the fun part: making code faster by removing overflow checks that provably don’t fire. At -O0 the SPEC INT 2006 benchmarks have 67,678 integer overflow checks, whereas at -O3 there are 38,527. So that’s nice: LLVM 3.8 can already get rid of 43% of naively inserted checks.

Let’s look at some details. First, the good news: all of the overflow checks in these functions are optimized away by LLVM:

int foo2(int x, int y) {
  return (short)x + (short)y;
}

int foo3(int x, int y) {
  return (long)x + (long)y;
}

int foo4(int x, int y) {
  return (x >> 1) + (y >> 1);
}

int32_t foo5(int32_t x, int32_t y) {
  const int32_t mask = ~(3U << 30);
  return (x & mask) + (y & mask);
}

int32_t foo6(int32_t x, int32_t y) {
  const int32_t mask = 3U << 30;
  return (x | mask) + (y | mask);
}

int32_t foo7(int32_t x, int32_t y) {
  const int32_t mask = 1U << 31;
  return (x | mask) + (y & ~mask);
}

See for yourself here. The common theme across these functions is that each overflow check can be seen to be unnecessary by looking at the high-order bits of its inputs. The code for these optimizations is here.

Now on the other hand, LLVM is unable to see that these functions don’t require overflow checks:

int foo8(int x) {
  return x < INT_MAX ? x + 1 : INT_MAX;
}

int foo9(int x, int y) {
  if (x < 10 && x > -10 && y < 10 && y > -10)
    return x + y;
  return 0;
}

Here’s another one where the check doesn’t get eliminated:

void foo10(char *a) {
  for (int i = 0; i < 10; i++)
    a[i] = 0;
}

There’s good news: Sanjoy Das has a couple of patches that, together, solve this problem. Their overall effect on SPEC is to reduce the overhead of signed integer overflow checking to 8.7%.

Finally, this function should get one overflow check rather than three:

unsigned foo11(unsigned *a, int i) {
  return a[i + 3] + a[i + 5] + a[i + 2];
}

This example (or something like it) is from Nadav Rotem on twitter; his redundant overflow check removal pass for Swift eliminates this sort of thing at the SIL level. I’ve done a bit of work on bringing these ideas to LLVM, and will hopefully have more to write about that later on.

In summary, signed integer overflow checks in LLVM are fast but they get in the way of the optimizers, which haven’t yet been systematically taught to see through them or eliminate them. There’s plenty of low-hanging fruit, and hopefully we can get the overhead down to the point where people can turn on overflow checking in production codes without thinking too hard about the tradeoffs.

Addenda:

I haven’t forgotten about Souper! We’ve taught it to eliminate integer overflow checks. I’ll write about that later too.

See Dan Luu’s article on this topic.