Advanced Compilers Weeks 3-5

This continues a previous post.

We went through the lattice theory and introduction to dataflow analysis parts of SPA. I consider this extremely good and important material, but I’m afraid that the students looked pretty bored. It may be the case that this material is best approached by first looking at practical aspects and only later going into the theory.

One part of SPA that I’m not super happy with is the material about combining lattices (section 4.3). This is a useful and practical topic but the use cases aren’t really discussed. In class we went through some examples, such as this function that cannot be optimized by either constant propagation or dead code elimination alone, but can be optimized by their reduced product: conditional constant propagation, which (as you can see) is implemented by both LLVM and GCC. Also, this example cannot be optimized by either sign analysis or parity analysis, but can be optimized using their reduced product.
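
Since the function itself is behind a link, here is a sketch of the flavor of example involved (my reconstruction, not necessarily the code from class):

```cpp
// Neither analysis alone wins here. Plain constant propagation sees the
// assignment "x = i" reaching the loop header along the back edge and gives
// up on x; unreachable-code elimination can't delete the branch without
// knowing x is constant. Conditional constant propagation tracks constants
// and reachability together: it never marks "x = i" as reachable, so x stays
// 0 and the function folds to "return 0".
int f(int n) {
  int x = 0;
  for (int i = 0; i < n; ++i) {
    if (x != 0) // never taken, but only provable if we know x == 0
      x = i;    // never reachable, but only provable if the branch folds
  }
  return x;     // CCP proves this is always 0
}
```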

We didn’t go into them, but I pointed the class to the foundational papers for dataflow analysis and abstract interpretation.

I gave an assignment to implement subtract and bitwise-and transfer functions for the interval abstract domain for signed 5-bit integers. The bitwidth is small so I can rapidly do exhaustive testing of students’ code. Their subtract had to be correct and maximally precise — about half of the class accomplished this. Their bitwise-and had to be correct and more precise than always returning top, and about half of the class accomplished this as well (a maximally precise bitwise-and operator for intervals is not at all easy — try it!). Since not everyone got the code right, I had them fix bugs (if any) and resubmit their code for this week. I hope everyone will get it right this time! Also, I will give prizes to students whose bitwise-and operator is on the Pareto frontier (out of all submitted solutions) for throughput vs precision and code size vs precision. Here are the results with the Pareto frontier in blue and the minimum and maximum precision in red (narrower intervals are better).

Impressively, student k implemented an optimally precise bitwise-and transfer function! Student c’s transfer function returned an answer other than top only for intervals of width 1. Mine (labeled JOHN) looked at the number of leading zeroes in both operands.
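
For anyone who wants to try the assignment at home, here is a rough sketch of the setup (assuming wrapping two’s complement semantics for the 5-bit integers; the actual assignment’s interface and semantics may differ). The subtract below is sound but deliberately not maximally precise, the bitwise-and beats “always top” only for non-negative intervals, and the exhaustive soundness check is the kind of thing the tiny bitwidth makes practical:

```cpp
#include <cassert>
#include <cstdio>

const int MIN5 = -16, MAX5 = 15;

struct Interval { int lo, hi; };   // invariant: MIN5 <= lo <= hi <= MAX5
const Interval TOP = {MIN5, MAX5};

int wrap5(int x) {                 // reduce to the signed 5-bit range
  x &= 31;
  return x >= 16 ? x - 32 : x;
}

Interval interval_sub(Interval a, Interval b) {
  int lo = a.lo - b.hi, hi = a.hi - b.lo;
  if (lo < MIN5 || hi > MAX5)      // result might wrap: give up
    return TOP;
  return {lo, hi};
}

Interval interval_and(Interval a, Interval b) {
  if (a.lo >= 0 && b.lo >= 0)      // 0 <= (x & y) <= min(x, y)
    return {0, a.hi < b.hi ? a.hi : b.hi};
  return TOP;                      // negative operands: give up
}

bool contains(Interval i, int x) { return i.lo <= x && x <= i.hi; }

int main() {
  // Exhaustive soundness check for subtract: every concrete difference must
  // land inside the abstract result, for every pair of intervals. The same
  // kind of loop works for bitwise-and.
  for (int alo = MIN5; alo <= MAX5; ++alo)
    for (int ahi = alo; ahi <= MAX5; ++ahi)
      for (int blo = MIN5; blo <= MAX5; ++blo)
        for (int bhi = blo; bhi <= MAX5; ++bhi) {
          Interval r = interval_sub({alo, ahi}, {blo, bhi});
          for (int x = alo; x <= ahi; ++x)
            for (int y = blo; y <= bhi; ++y)
              assert(contains(r, wrap5(x - y)));
        }
  puts("subtract is sound (precision is a separate question)");
}
```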

We looked at the LLVM implementation of the bitwise domain (“known bits”, they call it), which lives in ValueTracking.cpp. This analysis doesn’t have a real fixpoint computation; rather, it simply walks up the dataflow graph in a recursive fashion, which is a bit confusing since it is a forward dataflow analysis that looks at nodes in the backward direction. The traversal stops at depth 6 and isn’t cached, so the code is really very easy to understand.
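
To make the shape of that traversal concrete, here is a toy sketch of the pattern over a made-up little IR (emphatically not LLVM’s actual code or API):

```cpp
// Toy known-bits analysis over a hypothetical expression IR: to analyze a
// value, recurse into its operands, combine their known bits, and give up
// past a fixed depth instead of computing a fixpoint or caching results.
#include <cstdint>
#include <cstdio>

struct KnownBits { uint32_t zeros = 0, ones = 0; };  // bits known to be 0 / 1

struct Value {
  enum Kind { Const, And, Or, Unknown } op;
  uint32_t constant;
  const Value *lhs, *rhs;
};

const unsigned MaxDepth = 6;   // same cutoff the text mentions

KnownBits computeKnownBits(const Value *v, unsigned depth = 0) {
  KnownBits k;                        // default: nothing known
  if (!v || depth >= MaxDepth)        // depth cutoff instead of a fixpoint
    return k;
  switch (v->op) {
  case Value::Const:
    k.ones = v->constant;
    k.zeros = ~v->constant;
    break;
  case Value::And: {                  // bit is 1 only if 1 in both operands,
    KnownBits l = computeKnownBits(v->lhs, depth + 1);  // 0 if 0 in either
    KnownBits r = computeKnownBits(v->rhs, depth + 1);
    k.ones = l.ones & r.ones;
    k.zeros = l.zeros | r.zeros;
    break;
  }
  case Value::Or: {                   // bit is 0 only if 0 in both operands,
    KnownBits l = computeKnownBits(v->lhs, depth + 1);  // 1 if 1 in either
    KnownBits r = computeKnownBits(v->rhs, depth + 1);
    k.ones = l.ones | r.ones;
    k.zeros = l.zeros & r.zeros;
    break;
  }
  case Value::Unknown:                // e.g. a function argument
    break;
  }
  return k;
}

int main() {
  Value c15 = {Value::Const, 15, nullptr, nullptr};
  Value arg = {Value::Unknown, 0, nullptr, nullptr};
  Value m   = {Value::And, 0, &arg, &c15};   // arg & 15
  KnownBits k = computeKnownBits(&m);
  // The high 28 bits are known to be zero: prints zeros=0xfffffff0, ones=0x0.
  printf("zeros=0x%x ones=0x%x\n", (unsigned)k.zeros, (unsigned)k.ones);
}
```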

We started to look at how LLVM works; I went partway through some lecture notes by David Chisnall. We didn’t focus on the LLVM implementation yet, but rather looked at the design, with particular attention to SSA, which is worth spending some time on since it forms the foundation for most modern compilers. I had the students read the first couple of chapters of this drafty SSA book.

Something I’d appreciate feedback on is what (besides SSA) have been the major developments in ahead-of-time compiler technology over the last 25 years or so. Loop optimizations and vectorization have seen major advances of course, as have verified compilers. In this class I want to steer clear of PL-level innovations.

Finally, former Utah undergrad and current Googler Chad Brubaker visited the class and gave a guest lecture on UBSan in production Android: very cool stuff! Hopefully this motivated the class to care about using static analysis to remove integer overflow checks, since they will be doing assignments on that very topic in the future.

Advanced Compilers Weeks 1 and 2

This post will be of somewhat narrow interest; it’s a quick attempt to take my lecture notes for the first weeks of an advanced compilers course and turn them into something a bit more readable. I’m not using slides for this class.

Motivation

The great thing about an advanced course (on any topic) is that we have a lot of freedom in choosing the direction that the class takes. My class this fall is mainly about static program analysis: predicting the behavior of programs without running them. This is a broadly useful technology: it is the foundation for type checking, program verification, compiler optimization, and static bugfinding.

We can start off with a couple of observations about the role of compilers. First, hardware is getting weirder rather than getting clocked faster: almost all processors are multicores and it looks like there is increasing asymmetry in resources across cores. Processors come with vector units, crypto accelerators, bit twiddling instructions, and lots of features to make virtualization and concurrency work. We have DSPs, GPUs, big.LITTLE, and Xeon Phi. This is only scratching the surface. Second, we’re getting tired of low-level languages and their associated security disasters; we want to write new code, to whatever extent possible, in safer, higher-level languages. Compilers are caught right in the middle of these opposing trends: one of their main jobs is to help bridge the large and growing gap between increasingly high-level languages and increasingly wacky platforms. It’s effectively a perpetual employment act for solid compiler hackers.

The sufficiently smart compiler never seems to arrive. I told the class a story that I never tire of re-telling. My understanding is that while the death of the Cell processor was complicated (yields were bad, GPUs were on the rise, etc.) the lack of good tooling certainly didn’t help. Perhaps later on we’ll read this paper.

Semantics

One of the big ideas that enables static program analysis is that programs mean something, mathematically speaking. Of course this was understood very early by the people who created computer science, but in the early history of compilers people would get tripped up by the fact that they didn’t necessarily have a good idea what the programs being compiled actually meant. A new optimization would break programs and it wasn’t possible to assign blame cleanly: was the program within its rights to expect a certain behavior or not? This kind of question can only be answered by assigning meaning to programs. Alas, it is still common for a program to mean “whatever the (single) language implementation does with the program.” I’ve heard stories from Matlab users that the providers of the Matlab implementation have introduced subtle changes to the semantics over time, probably both intentionally and unintentionally. The alternative to defining the semantics using an implementation is to define the semantics of a language some other way, either in a standards document or in math. Then, both programs and implementations can be judged to be either in conformance, or not, with the standard. Obviously this is no panacea, as long experience with C and C++ has shown — but it’s better than nothing.

There are a lot of ways to write down the semantics of a programming language but an even more important issue is creating an appropriate semantics. For example, a language designed for implementing constant-time cryptography might include execution time in the semantics. A language for embedded systems might include memory allocation (or at least guarantees about the lack of implicit allocations) in the semantics. Even the simple parts of a language, such as arithmetic, contain many subtle corners. Here’s an example. We can also look at the behavior of shift operators when the shift exponent is at least as large as the width of the shifted value. Java and x86 reduce the shift amount modulo 32. ARM reduces the shift amount modulo 256 and then saturates (shift by 257 is equivalent to shift by 1 but shift by 100 is equivalent to clearing the register). C and C++ have (of course!) undefined semantics for shift by 100 or 257. Constraining the semantics is nice but too many constraints make efficient code generation difficult. The WebAsm people were discussing these issues not too long ago. I’ve always wanted shift left by -3 to be a shift right by 3, but nobody else has ever thought this was a good idea, as far as I know.
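
Here’s a tiny illustration of the oversized-shift hazard in C++ (my own example; the generated code of course depends on the compiler and target):

```cpp
// Oversized shifts: well-defined in Java (count taken mod 32 for int),
// masked in hardware on x86, but undefined behavior in C and C++, so the
// compiler may assume they never happen.
#include <cstdint>
#include <cstdio>

uint32_t shl(uint32_t x, unsigned n) {
  return x << n;   // UB in C/C++ when n >= 32
}

int main() {
  printf("%u\n", shl(1, 3));   // fine: prints 8
  // shl(1, 32) is undefined: an x86 target might return 1 (the hardware masks
  // the count to 5 bits), another target might return 0, and the optimizer is
  // free to assume the call can't happen at all. Java pins this down: for an
  // int, 1 << 32 == 1.
  return 0;
}
```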

The recent DAO debacle provided an absolutely wonderful demonstration of why it might be risky to define the semantics of a language using a reference implementation. They put a lot of money on the line there; the hubris was impressive. One hopes that lessons were learned.

The overall point of this discussion is (1) that we can’t do static program analysis unless we know what the programming language means and (2) that designing meanings for programs is an interesting and difficult topic in itself.

Missed Optimizations

I asked the students to use the Compiler Explorer to demonstrate a case in which each of GCC and LLVM miss an optimization, and to provide the assembly code that the compiler should have generated. We went over a handful of submissions, discussing the issues: Was the proposed optimization correct? Would it be a good idea to implement it now? What kind of static analysis would be needed to make the optimization go?

As I had hoped, the code written by the students exposed many interesting issues. One example that came up was similar to this one (roughly reconstructed below), where LLVM cleverly realizes that the loop is squaring its argument but then (apparently) fails to remove the subsequent conditional move. But since the loop doesn’t execute at all when the argument is negative, some sort of conditional really is needed. We also saw some good examples where potential aliasing was blocking optimizations. Playing with optimizations in Compiler Explorer is really a pleasure.
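
Since the example itself is behind a link, here is code in the same spirit (my reconstruction, not the student’s submission):

```cpp
// Repeated addition that computes n*n when n > 0. LLVM can recognize the
// closed form, but because the loop body never runs when n <= 0, a correct
// translation still needs a select/conditional move to produce 0 in that case.
int square_by_loop(int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += n;          // adds n, n times
  return sum;          // n*n when n > 0, 0 otherwise
}
```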

Intro to Static Analysis

Although there are a lot of slide decks that do a good job explaining static analysis, there’s only one book-length treatment of the subject that I like, which I’ll call SPA. SPA is clearly written, it avoids unnecessary notation, and it keeps the material grounded in practical use cases. It’s great!

I started out using everyone’s favorite tutorial abstract domains: parity (are values even or odd?) and signs (are values negative, zero, or positive?). I introduced what I consider to be the first key idea behind static analysis, which is that abstract values (odd, positive, etc.) are simply stand-ins for sets of concrete values. This is such a simple idea and yet it can get lost if the material is presented wrong. We discussed some transfer functions such as addition for the even/odd domain and multiplication for the signedness domain (as seen on p. 28 of SPA). Here the key idea is that we can always verify a transfer function by concretizing the abstract arguments, applying the concrete operation pairwise to the sets of concrete values (assuming a binary operator), lifting the result set back into the abstract domain, and then checking that the transfer function’s answer covers what we got. This now sets the stage for introducing the abstraction and concretization functions and then we’re ready for the Galois connection (which I showed the components of but did not explicitly name). David Schmidt’s slides on this material are awesome.
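
Here’s a small self-contained sketch of that verification recipe (my own toy code, not from SPA), using the parity domain and 8-bit addition. It actually checks the stronger property that the transfer function is not only sound but also maximally precise:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

enum Parity { BOTTOM, EVEN, ODD, TOP };

// The transfer function we want to check: abstract addition over parities.
Parity add_transfer(Parity a, Parity b) {
  if (a == BOTTOM || b == BOTTOM) return BOTTOM;
  if (a == TOP || b == TOP) return TOP;
  return a == b ? EVEN : ODD;   // even+even = odd+odd = even, mixed = odd
}

// Concretization: the set of 8-bit values an abstract value stands for.
std::vector<uint8_t> gamma(Parity p) {
  std::vector<uint8_t> s;
  for (int v = 0; v < 256; ++v)
    if ((p == EVEN && v % 2 == 0) || (p == ODD && v % 2 == 1) || p == TOP)
      s.push_back((uint8_t)v);
  return s;                     // BOTTOM concretizes to the empty set
}

// Abstraction of a single concrete value.
Parity alpha(uint8_t v) { return v % 2 == 0 ? EVEN : ODD; }

// Least upper bound in the parity lattice.
Parity join(Parity a, Parity b) {
  if (a == BOTTOM) return b;
  if (b == BOTTOM) return a;
  return a == b ? a : TOP;
}

int main() {
  Parity ps[] = {BOTTOM, EVEN, ODD, TOP};
  for (Parity a : ps)
    for (Parity b : ps) {
      // "Best" answer: abstract the set of all pairwise concrete sums.
      Parity best = BOTTOM;
      for (uint8_t x : gamma(a))
        for (uint8_t y : gamma(b))
          best = join(best, alpha((uint8_t)(x + y)));
      // Equality means the transfer function is sound and maximally precise;
      // a soundness-only check would require it merely to be at least as
      // large as best.
      assert(add_transfer(a, b) == best);
    }
  puts("parity addition transfer function checks out");
}
```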

The thing that we’re working up to here is digging into some of the numerous static analyses that are part of LLVM. I’m trying to introduce the theory, which is very beautiful, while also warming the students up to the idea that it all sort of goes out the window when you’re confronted with the piles of C++ that actually make these analyses happen in practice.

Types

Everyone read Chapter 3 of SPA as well as the first section of Type Systems, another piece of writing that I like very much because it keeps the topics connected to the reasons why they are useful. I didn’t want to get into type systems too deeply (and in fact types are something of a non-specialty of mine) but did want the students to come away with the idea that type checking is an important use case of static program analysis.

The point of static typechecking is that “well typed programs can’t go wrong” but as Cardelli points out in some detail, we need to be pretty careful when saying what “go wrong” means. He includes some nice discussion of the standard static/dynamic and safe/unsafe language categorizations.

Solutions to Integer Overflow

Humans are typically not very good at reasoning about integers with limited range, whereas computers fundamentally work with limited-range numbers. This impedance mismatch has been the source of a lot of bugs over the last 50 years. The solution comes in multiple parts.

In most programming languages, the default integer type should be a bignum: an arbitrary-precision integer that allocates more space when needed. Efficient bignum libraries exist and most integers never end up needing more than one machine word anyway, except in domains like crypto. As far as I’m concerned, for ~95% of programming tasks integer overflow is a solved problem: it should never happen. The solution isn’t yet implemented widely enough, but happily there are plenty of languages such as Python that give bignums by default.

When performance and/or predictability is a major consideration, bignums won’t work and we’re stuck with fixed-width integers that wrap, trap, saturate, or trigger undefined behavior upon overflow. Saturation is a niche solution that we won’t discuss further. Undefined behavior is bad but at least it enables a few loop optimizations and also permits trapping implementations. Although wrapping is an extremely poor default, there are a few good things to say about it: wrapping is efficient, people have come to expect it, and it is a good match for a handful of application domains.

Swift is a modern programming language that traps instead of providing bignums; this is also a generally sensible behavior. Why not bignums? The About Swift web page says that Swift gives “the developer the control needed in a true systems programming language,” so perhaps the designers were worried about unpredictable allocations. I’d love to see a study of the performance of best-of-breed trapping and bignum implementations on important modern applications.

The Rust developers have adopted a hybrid solution where integer overflows trap in debug builds and wrap in optimized builds. This is pragmatic, especially since integer overflows do not compromise Rust’s memory safety guarantees. On the other hand, perhaps as MIR matures, Rust will gravitate towards checking in optimized builds.

For safety-critical programs, the solution to integer overflow is to prove that it cannot happen using some combination of manual reasoning, testing, and formal verification. SPARK Ada and the TrustInSoft analyzer are suitable for proving that integer overflows won’t occur. More work is needed to make this sort of verification scalable and less expert-intensive.

Systems programming tasks, such as building operating systems, language runtimes, and web browsers, are caught in the middle. Wrapping sucks, bignums and trapping are slow or at least perceived as slow (and you do not want to trap or allocate while handling a hardware interrupt anyway), and the code bases are too large for formal verification and thorough testing. One answer is to work hard on making trapping fast. For example, Swift has a high-level optimization pass specifically for removing integer overflow checks, and then the LLVM optimization passes do more of this, and then the LLVM backends can lower checked math operations to efficient condition-code checks, and then modern Intel processors fuse the resulting branch-on-overflow instructions away.
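
For readers who haven’t seen what a checked operation looks like in source form, here’s a small example using GCC/Clang’s __builtin_add_overflow (my example, not from the class); with optimization, the check typically lowers to an add followed by a branch-on-overflow rather than anything expensive:

```cpp
#include <cstdint>
#include <cstdlib>

// Checked 32-bit signed addition: trap (abort) on overflow instead of
// wrapping. On x86 this usually compiles to an add plus a jo instruction.
int32_t checked_add(int32_t a, int32_t b) {
  int32_t result;
  if (__builtin_add_overflow(a, b, &result))   // true iff the add overflowed
    abort();
  return result;
}
```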

In summary, bignums should be the default whenever this is feasible, and trapping on overflow should be the backup default behavior. Continued work on the compilers and processors will ensure that the overhead of trapping overflow checks is down in the noise. Java-style wrapping integers should never be the default; this is arguably even worse than C and C++’s UB-on-overflow, which at least permits an implementation to trap. In domains where wrapping, trapping, and allocation are all unacceptable, we need to be able to prove that overflow does not occur.

I’ll end with a few random observations:

  • Dan Luu wrote a piece on the overhead of overflow checking.
  • Arbitrary (fixed) width bitvectors are a handy datatype and I wish more languages supported them. These can overflow but it’s not as big of a deal since we choose the number of bits.
  • Explicitly ranged integers as seen in Ada are also very nice; there’s no reason that traps should only occur at the 32-bit or 64-bit boundaries.
  • The formal verification community ignored integer overflow for far too long; there’s a long history of assuming that program integers behave like mathematical integers. Things are finally better now, though.

UPDATE: I didn’t want this piece to be about C and C++ but I should have clarified that it is only signed overflow in these languages that is undefined behavior; unsigned overflow is defined to be two’s complement wraparound. While it is possible to trap on unsigned overflow — UBSan has a flag that turns on these traps — this behavior does not conform to the standards. Even so, trapping unsigned wraparounds can — in some circumstances — be useful for finding software defects. The question is whether the wraparound was intentional or not.