Taming Undefined Behavior in LLVM

Earlier I wrote that Undefined Behavior != Unsafe Programming, a piece intended to convince you that there’s nothing inherently wrong with undefined behavior as long as it isn’t in developer-facing parts of the system.

Today I want to talk about a new paper about undefined behavior in LLVM that’s going to be presented in June at PLDI 2017. I’m an author of this paper, but not the main one. This work isn’t about debating the merits of undefined behavior, its goal is to describe and try to fix some unintended consequences of the design of undefined behavior at the level of LLVM IR.

Undefined behavior in C and C++ is sort of like a bomb: either it explodes or it doesn’t. We never try to reason about undefined programs because a program becomes meaningless once it executes UB. LLVM IR contains this same kind of UB, which we’ll call “immediate UB.” It is triggered by bad operations such as an out-of-bounds store (which is likely to corrupt RAM) or a division by zero (which may cause the processor to trap).

Our problems start because LLVM also contains two kinds of “deferred UB” which don’t explode, but rather have a contained effect on the program. We need to reason about the meaning of these “slightly undefined” programs which can be challenging. There have been long threads on the LLVM developers’ mailing list going back and forth about this.

The first kind of deferred UB in LLVM is the undef value that acts like an uninitialized register: an undef evaluates to an arbitrary value of its type. Undef is useful because sometimes we want to say that a value doesn’t matter, for example because we know a location is going to be over-written later. If we didn’t have something like undef, we’d be forced to initialize locations like this to specific values, which costs space and time. So undef is basically a note to the compiler that it can choose whatever value it likes. During code generation, undef usually gets turned into “whatever was already in the register.”

Unfortunately, the semantics of undef don’t justify all of the optimizations that we’d like to perform on LLVM code. For example, consider this LLVM function:

define i1 @f(i32) {
  %2 = add nsw i32 %0, 1
  %3 = icmp sgt i32 %2, %0
  ret i1 %3
}

This is equivalent to “return x+1 > x;” in C and we’d like to be able to optimize it to “return true;”. In both languages the undefinedness of signed overflow needs to be recognized to make the optimization go. Let’s try to do that using undef. In this case the semantics of “add nsw” are to return undef if signed overflow occurs and to return the mathematical answer otherwise. So this example has two cases:

  1. The input is not INT_MAX, in which case the addition returns input + 1.
  2. The input is INT_MAX, in which case the addition returns undef.

In case 1 the comparison returns true. Can we make the comparison true for case 2, giving us the overall result that we want? Recall that undef resolves as an arbitrary value of its type. The compiler is allowed to choose this value. Alas, there’s no value of type i32 that is larger than INT_MAX, when we use a signed comparison. Thus, this optimization is not justified by the semantics of undef.

One choice we could make is to give up on performing this optimization (and others like it) at the LLVM level. The choice made by the LLVM developers, however, was to introduce a second, stronger, form of deferred UB called poison. Most instructions, taking a poison value on either input, evaluate to poison. If poison propagates to a program’s output, the result is immediate UB. Returning to the “x + 1 > x” example above, making “add nsw INT_MAX, 1” evaluate to poison allows the desired optimization: the resulting poison value makes the icmp also return poison. To justify the desired optimization we can observe that returning 1 is a refinement of returning poison. Another way to say the same thing is that we’re always allowed to make code more defined than it was, though of course we’re never allowed to make it less defined.

The most important optimizations enabled by deferred undefined behavior are those involving speculative execution such as hoisting loop-invariant code out of a loop. Since it is often difficult to prove that a loop executes at least once, loop-invariant code motion threatens to take a defined program where UB sits inside a loop that executes zero times and turn into into an undefined program. Deferred UB lets us go ahead and speculatively execute the code without triggering immediate UB. There’s no problem as long as the poisonous results don’t propagate somewhere that matters.

So far so good! Just to be clear: we can make the semantics of an IR anything we like. There will be no problem as long as:

  1. The front-ends correctly refine C, C++, etc. into IR.
  2. Every IR-level optimization implements a refinement.
  3. The backends correctly refine IR into machine code.

The problem is that #2 is hard. Over the years some very subtle mistakes have crept into the LLVM optimizer where different developers have made different assumptions about deferred UB, and these assumptions can work together to introduce bugs. Very few of these bugs can result in end-to-end miscompilation (where a well-formed source-level program is compiled to machine code that does the wrong thing) but even this can happen. We spent a lot of time trying to explain this clearly in the paper and I’m unlikely to do better here! But the details are all there in Section 3 of the paper. The point is that so far these bugs have resisted fixing: nobody has come up with a way to make everything consistent without giving up optimizations that the LLVM community is unwilling to give up.

The next part of the paper (Sections 4, 5, 6) introduces and evaluates our proposed fix, which is to remove undef, leaving only poison. To get undef-like semantics we introduce a new freeze instruction to LLVM. Freezing a normal value is a nop and freezing a poison value evaluates to an arbitrary value of the type. Every use of a given freeze instruction will produce the same value, but different freezes may give different values. The key is to put freezes in the right places. My colleagues have implemented a fork of LLVM 4.0 that uses freeze; we found that it more or less doesn’t affect compile times or the quality of the generated code.

We are in the process of trying to convince the LLVM community to adopt our proposed solution. The change is somewhat fundamental and so this is going to take some time. There are lots of details that need to be ironed out, and I think people are (rightfully) worried about subtle bugs being introduced during the transition. One secret weapon we have is Alive where Nuno has implemented the new semantics in the newsema branch and we can use this to test a large number of optimizations.

Finally, we noticed that there has been an interesting bit of convergent evolution in compiler IRs: basically all heavily optimizing AOT compilers (including GCC, MSVC, and Intel CC) have their own versions of deferred UB. The details differ from those described here, but the effect is the same: deferred UB gives the compiler freedom to perform useful transformations that would otherwise be illegal. The semantics of deferred UB in these compilers has not, as far as we know, been rigorously defined and so it is possible that they have issues analogous to those described here.

Fun at the UNIX Terminal Part 1

This post is aimed at kids, like the 6th graders who I was recently teaching about programming in Python. It is more about having fun than about learning, but I hope that if you enjoy playing around at the UNIX terminal, you’ll eventually learn to use this kind of system for real. Keep in mind this immortal scene from Jurassic Park.

To run the commands in this post, you’ll need a UNIX machine: either Linux or Mac OS X will work. You’ll also need the ability to install software. There are two options:

  • Install precompiled binaries using a package manager, I’ll give command lines for Homebrew on OS X and for Apt on Ubuntu Linux. You’ll need administrator access to run Apt or to install Homebrew, but you do not need administrator access to install packages after Homebrew has been installed. Other versions of Linux have their own package managers and they are all pretty easy to use.
  • Build a program from source and install it in your home directory. This does not require administrator access but it’s more work and I’m not going to go into the details, though I hope to do this in a later post.

ROT13 using tr

The tr utility should be installed by default on an OS X or Ubuntu machine. It translates the characters in a string into different characters according to rules that you provide. To learn more about tr (or any other command in this post) type this command (without typing the dollar sign):

$ man tr

This will show you the UNIX system’s built-in documentation for the command.

In this and subsequent examples, I’ll show text that you should type on a line starting with a dollar sign, which is the default UNIX prompt. Text printed by the system will be on lines not starting with a dollar sign.

We’re going to use tr to encrypt some text as ROT13, which simply moves each letter forward in the alphabet by 13 places, wrapping around from Z to A if necessary. Since there are 26 letters, encrypting twice using ROT13 gives back the original text. ROT13 is fun but you would not want to use it for actual secret information since it is trivial to decrypt. It is commonly used to make it hard for people to accidentally read spoilers when discussing things like movie plot twists.

Type this:

$ echo 'Hello this is a test' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Uryyb guvf vf n grfg

Now to decrypt:

$ echo 'Uryyb guvf vf n grfg' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Hello this is a test

Just two more things before moving on to the next command.

First, the UNIX pipe operator (the “|” character in the commands above, which looks a little bit like a piece of pipe) is plumbing for UNIX commands: it “pipes” the output of one command to the input of a second command. We’ll be using it quite a bit.

Second, how exactly did we tell tr to implement ROT13? Well, the first argument, ‘A-Za-z’, gives it a set of characters to work with. Here A-Z stands for A through Z and a-z stands for a through z (computers treat the capital and lowercase versions of letters as being separate characters). So we are telling tr that it is going to translate any letter of the alphabet and leave any other character (spaces, punctuation, numbers, etc.) alone. The second argument to tr, ‘N-ZA-Mn-za-m’, specifies a mapping for the characters in the first argument. So the first character in the first argument (A) will be translated to the first character of the second argument (N), and so on. We could just as easily use tr to put some text in all uppercase or all lowercase, you might try this as an exercise.

fortune

Tragically, this command isn’t installed by default on a Mac or on an Ubuntu Linux machine. On a Mac you can install it like this:

brew install fortune

If this doesn’t work then you need to get Homebrew setup, try this page.

On Ubuntu try this:

sudo apt-get install fortune

The “sudo” command will ask you to enter your password before running the next command, apt-get, with elevated privileges, in order to install the fortune program in a system directory that you are normally not allowed to modify. This will only work if your machine has been specifically configured to allow you to run programs with elevated privileges.

In any case, if you can’t get fortune installed, don’t worry about it, just proceed to the next command.

Fortune randomly chooses from a large collection of mildly humorous quotes:

$ fortune 
I have never let my schooling interfere with my education.
		-- Mark Twain

say

This command is installed by default on a Mac; on Ubuntu you’ll need to type “sudo apt install gnustep-gui-runtime”.

Type this:

$ say "you just might be a genius"

Make sure you have sound turned up.

The Linux say command, for whatever reason, requires its input to be a command line argument, so we cannot use a pipe to send fortune’s output to say. So this command will not work on Linux (though it does work on OS X):

$ fortune | say

However, there’s another trick we can use: we can turn the output of one command into the command-line arguments for another command by putting the first command in parentheses and prefixing this with a dollar sign. So this will cause your computer to read a fortune aloud:

$ say $(fortune)

Another way to accomplish the same thing is to put fortune’s output into a file and then ask say to read the file aloud:

$ fortune > my_fortune.txt
$ say -f my_fortune.txt

Here the greater-than symbol “redirects” the output of fortune into a file. Redirection works like piping but the output goes to a file instead of into another program. It is super useful.

If you run “say” on both a Linux box and a Mac you will notice that the Mac’s speech synthesis routines are better.

cowsay

The extremely important cowsay command uses ASCII art to show you a cow saying something. Use it like this:

$ fortune | cowsay
 __________________________________ 
/ What is mind? No matter. What is \
| matter? Never mind.              |
|                                  |
\ -- Thomas Hewitt Key, 1799-1875  /
 ---------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Both Homebrew and Apt have a package called “cowsay” that you can install using the same kind of command line you’ve already been using.

Cowsay has some exciting options, such as “-d” which makes the cow appear to be dead:

$ fortune | cowsay -d
 _________________________________________ 
/ Laws are like sausages. It's better not \
| to see them being made.                 |
|                                         |
\ -- Otto von Bismarck                    /
 ----------------------------------------- 
        \   ^__^
         \  (xx)\_______
            (__)\       )\/\
             U  ||----w |
                ||     ||

Use “man cowsay” to learn more.

ponysay

Don’t install ponysay unless you feel that cowsay is too restrictive. Also, it isn’t available as a precompiled package. You can build it from source code by first installing the “git” package using apt-get or brew and then running the following commands:

$ git clone https://github.com/erkin/ponysay.git
$ cd ponysay
$ ./setup.py --freedom=partial --private install

This procedure puts ponysay into an odd location, but whatever. Here (assuming Linux, on a Mac you’ll need to pipe a different command’s output to ponysay) a cute pony tells us the prime factorization of a number:

figlet

Figlet (actually called FIGlet but that’s not what you type to run the command) prints text using large letters comprised of regular terminal characters. For example:

$ whoami | figlet
                     _          
 _ __ ___  __ _  ___| |__  _ __ 
| '__/ _ \/ _` |/ _ \ '_ \| '__|
| | |  __/ (_| |  __/ | | | |   
|_|  \___|\__, |\___|_| |_|_|   
          |___/                 

Figlet has lots of options for controlling the appearance of its output. For example, you can change the font:

$ echo 'hello Eddie' | figlet -f script
 _          _   _          ___                    
| |        | | | |        / (_)   |     |  o      
| |     _  | | | |  __    \__   __|   __|      _  
|/ \   |/  |/  |/  /  \_  /    /  |  /  |  |  |/  
|   |_/|__/|__/|__/\__/   \___/\_/|_/\_/|_/|_/|__/

Another command, toilet, is similar to figlet but has even more options. Install both of these programs using the same kinds of commands we’ve already been using to talk to package managers.

lolcat

The UNIX “cat” program prints a file, or whatever is piped into it, to the terminal. Lolcat is similar but it prints the text in awesome colors:

bb

The bb program doesn’t seem to be available from Homebrew, but on Ubuntu you can install it using “sudo apt-get install bb”. It is a seriously impressive ASCII art demo.

rig

You know how lots of web sites want you to sign up using your name and address, but your parents hopefully have trained you not to reveal your identity online? Well, the rig utility can help, it creates a random identity:

$ rig
Juana Waters
647 Hamlet St
Austin, TX  78710
(512) xxx-xxxx

The zip codes and telephone area codes are even correct. For some reason rig will never generate an identity that lives in Utah.

bc

The bc program is a calculator but unlike almost every other calculator you use, it can handle numbers of unlimited size (or, more precisely, numbers limited only by the amount of RAM in your computer) without losing precision. Try this:

$ echo '2 ^ 100' | bc
1267650600228229401496703205376

Unfortunately bc does not have a built-in factorial function but you can write one easily enough using bc’s built-in programming language. Start bc in interactive mode (this will happen by default if you don’t pipe any text into bc) by just typing “bc”. Then enter this code:

define fact(n) {
  if (n < 2) return 1;
  return n * fact(n - 1);
}

Now you can compute very large factorials:

fact(1000)
40238726007709377354370243392300398571937486421071463254379991042993\
85123986290205920442084869694048004799886101971960586316668729948085\
58901323829669944590997424504087073759918823627727188732519779505950\
99527612087497546249704360141827809464649629105639388743788648733711\
91810458257836478499770124766328898359557354325131853239584630755574\
09114262417474349347553428646576611667797396668820291207379143853719\
58824980812686783837455973174613608537953452422158659320192809087829\
73084313928444032812315586110369768013573042161687476096758713483120\
25478589320767169132448426236131412508780208000261683151027341827977\
70478463586817016436502415369139828126481021309276124489635992870511\
49649754199093422215668325720808213331861168115536158365469840467089\
75602900950537616475847728421889679646244945160765353408198901385442\
48798495995331910172335555660213945039973628075013783761530712776192\
68490343526252000158885351473316117021039681759215109077880193931781\
14194545257223865541461062892187960223838971476088506276862967146674\
69756291123408243920816015378088989396451826324367161676217916890977\
99119037540312746222899880051954444142820121873617459926429565817466\
28302955570299024324153181617210465832036786906117260158783520751516\
28422554026517048330422614397428693306169089796848259012545832716822\
64580665267699586526822728070757813918581788896522081643483448259932\
66043367660176999612831860788386150279465955131156552036093988180612\
13855860030143569452722420634463179746059468257310379008402443243846\
56572450144028218852524709351906209290231364932734975655139587205596\
54228749774011413346962715422845862377387538230483865688976461927383\
81490014076731044664025989949022222176590433990188601856652648506179\
97023561938970178600408118897299183110211712298459016419210688843871\
21855646124960798722908519296819372388642614839657382291123125024186\
64935314397013742853192664987533721894069428143411852015801412334482\
80150513996942901534830776445690990731524332782882698646027898643211\
39083506217095002597389863554277196742822248757586765752344220207573\
63056949882508796892816275384886339690995982628095612145099487170124\
45164612603790293091208890869420285106401821543994571568059418727489\
98094254742173582401063677404595741785160829230135358081840096996372\
52423056085590370062427124341690900415369010593398383577793941097002\
77534720000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000

While we're at it, you should figure out why the factorial of any large number contains a lot of trailing zeroes.

Conclusion

We've only scratched the surface, I'll share more entertaining UNIX commands in a followup post. Some of these programs I hadn't even heard of until recently, a bunch of people on Twitter gave me awesome suggestions that I've used to write this up. If you want to see a serious (but hilariously outdated) video about what you can do using the UNIX command line, check out this video in which one of the original creators of UNIX shows how to build a spell checker by piping together some simple commands.

Do Expressive Programming Languages Always Have Undefined Behavior?

In the Hacker News comments on one of my previous posts about undefined behavior, someone said this:

AFAIK Gödel’s incompleteness theorems imply that _any_ language will have at least some undefined behaviour.

Let’s take a quick look at this statement, keeping in mind that incompleteness and undecidability can be remarkably tricky topics. Some years ago I read and enjoyed Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse (and no doubt need to reread it).

First off, it is clear that there exist programming languages that are free of UB, such as one where the semantics of every program is to print “7”. Whatever it is that UB means (we have not formally defined it), it seems clear that the language that always prints “7” does not have it.

There are also useful languages that are obviously UB-free, such as an expression language that evaluates elementary functions over IEEE floats. These languages are particularly easy to reason about because they are not Turing-complete: all computations terminate and we simply have to ensure that they terminate with a defined result.

In contrast, the HN commenter may have intended to invoke Rice’s Theorem: “Any nontrivial property about the language recognized by a Turing machine is undecidable.” A consequence is that when f() is some arbitrary computation we cannot in general hope to decide whether a program like this invokes undefined behavior:

main() {
  f();
  return 1 / 0; // UB!
}

But this is a red herring. Rice’s Theorem only applies to non-trivial properties: “properties that apply neither to no programs nor all programs in the language.” To sidestep it, we only need to define a programming language where UB-freedom is a trivial property. This is done by ensuring that every operation that a program can perform is a total function: it is defined in all circumstances. Thus, programs in this language will either terminate in some defined state or else fail to terminate. This kind of extremely tight specification is not typically done for realistic programming languages because it is a lot of work, particularly if the language has open-ended interactions with other levels of the system, such as inline assembly. But the problem is only a practical one; there is no problem in principle. It is not too difficult to write an UB-free specification for a (Turing-complete) toy language or subset of a real language.

Now let’s return to the original HN comment: it was about the incompleteness theorems, not about the halting problem. I’m not sure what to do with that, as I don’t see that Gödel’s theorems have any direct bearing on undefined behavior in programming languages.

Mapping Mountains on Exoplanets

Snow in the mountains evolves in predictable ways that are strongly influenced by the terrain. For example, warmish snowstorms leave a characteristic snow line that looks very much like a contour on a topographic map:

(source)

(source)

On the other hand, when snow melts the effect is very different, and is often dominated by aspect. There pictures taken near my house a few days apart show more melting on south-facing slopes that receive more solar energy:

Effects like these are taken into account by a snowpack model: a computer simulation designed to predict snowpack evolution in order to, for example, make predictions about avalanches or about spring runoff. Runoff is important not only because it can cause flooding but also because areas near the mountains depend on the runoff for drinking and agricultural water. A snowpack model would typically take topographic information and snow depth information as inputs, but people have also done research into driving snowpack models using weather information; this is useful for areas where there are not a lot of snow depth measurement stations, such as the Snotel network in the western USA.

This post asks the question: could we use images of snowpacks from above to create topographic maps of exoplanets? Obviously this requires telescopes with vastly improved angular resolution, these will likely be optical interferometers in space, or perhaps on the moon. Resolution can be increased by making the telescope’s baseline longer and light-gathering can be improved using larger mirrors or by integrating over longer time periods (planets don’t evolve rapidly, I don’t see why early exoplanet images wouldn’t be aggregated from weeks or months of observations). Space telescopes involving multiple satellites are going to be expensive and there have already been some canceled missions of this type. I haven’t checked their math but this article describes the collecting area and baseline required to get images of different resolutions; unfortunately it doesn’t mention what assumptions were made about exposure times to get the images.

Assuming we can image a snowy exoplanet in sufficient detail, how can we determine its topography? We do it by making a snowpack model for the planet, probably involving a lot of educated guesses, and then invert it: instead of parameterizing the model with the topography, we instead search for the topography that is most likely to explain the observations. We could easily prototype this mapping software right now; it can be validated in two ways. First, we can take archived satellite photos of Earth, degrade them using reasonable assumptions about telescope performance and exoplanet distance, and then use them to infer the topography. Of course we have good ground truth for Earth so validation is nearly trivial. The second way to validate this mapping method is using simulated mountainous, snowy exoplanets. This would be useful to check the robustness of the methods for less-Earthlike bodies. A prototype of these ideas seems very much in reach for an ambitious PhD student.

I tried to find previous work on this topic and didn’t turn up much, I would appreciate links either in a comment here or in private email if you know of something. Let’s take a quick look at why most of the ways that we get topographic data won’t work for exoplanets:

  • The original way to make maps, getting out there and surveying the land, obviously doesn’t apply.
  • Photogrammetry requires parallax.
  • Active sensing using radar or a laser is infeasible.

On the other hand, inferring mountains’ shape from changing light and shadows over the course of a day and from day to day (if the planet’s axis is tilted) will work, and no doubt this is how we’ll first know that an exoplanet is mountainous. Snow-based techniques could be used to refine these results.

Teaching Python Informally to Kids

For the last few months I’ve been running a “coding club” for my son’s sixth-grade class. Once a week, the interested students (about 2/3 of the class) stick around for an hour after school and I help them learn to program. The structure is basically like the lab part of a programming class, without the lecture. I originally had planned to do some lecturing but it soon became clear that at the end of a full day of school (these kids get on the bus around 7:00am, ugh) there was very little attention span remaining. So instead I decided to give them a sheet of problems to work on each week and I spend the hour walking around helping whoever needs help.

One of my main goals is to avoid making anyone hate programming. There are three parts to this. First, I’m not forcing them to do anything, but rather providing suggestions and guidance. Second, there’s no assessment going on. I’ve never been too comfortable with the grading part of teaching, actually. It can really get in the way of learning. Third, I’m encouraging them to work together. I’ve long noticed (starting when I was a CS student) that most learning occurs between students in the lab. Not as much learning happens in the lecture hall.

For a curriculum, we started out with turtle graphics, which I’ve always found to be wonderful. Python’s turtle library is clunkier than Logo and also is abysmally slow, but the advantages (visual debugging, many kids enjoy graphics) outweigh the problems. Fractals (Sierpinski triangle, dragon curve) were a good way to introduce recursion. We spent three or four weeks doing turtle stuff before moving on to simple crypto, which didn’t work as well. On the other hand, the students did really well with mathy code, here’s the handout from one of the math weeks.

Some observations:

  • One of my favorite moments so far has been when a couple of students implemented rot13 functions and used it to send rude messages to each other.
  • It’s pretty important to take a snack break.
  • Although the rockstar / 10x programmer idea is out of favor, it is absolutely clear that there is a huge amount of variation (much more than 10x) in how easily a group of twenty 10-11 year olds learn programming. Some kids are teaching themselves to open files and parse text while others are stuck on basic syntax issues.
  • Syntax, which computer professionals more or less learn to overlook, is hugely important.
  • I’ve always disliked Python’s significant whitespace and watching kids struggle with it has made me hate it even more.

Anyhow, this has been a fun teaching exercise and hopefully it is benefiting the kids.

Undefined Behavior != Unsafe Programming

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: “Ugh, why? LLVM IR is just as bad as C!” This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.

The essence of undefined behavior is the freedom to avoid a forced coupling between error checks and unsafe operations. The checks, once decoupled, can be optimized, for example by being hoisted out of loops or eliminated outright. The remaining unsafe operations can be — in a well-designed IR — mapped onto basic processor operations with little or no overhead. As a concrete example, consider this Swift code:

func add(a : Int, b : Int)->Int {
  return (a & 0xffff) + (b & 0xffff);
}

Although a Swift implementation must trap on integer overflow, the compiler observes that overflow is impossible and emits this LLVM IR:

define i64 @add(i64 %a, i64 %b) {
  %0 = and i64 %a, 65535
  %1 = and i64 %b, 65535
  %2 = add nuw nsw i64 %0, %1
  ret i64 %2
}

Not only has the checked addition operation been lowered to an unchecked one, but also the add instruction has been marked with LLVM’s nsw and nuw attributes, indicating that both signed and unsigned overflow are undefined. In isolation these attributes provide no benefit, but they may enable additional optimizations after this function is inlined. When the Swift benchmark suite is compiled to LLVM, about one in eight addition instructions has an attribute indicating that integer overflow is undefined.

In this particular example the nsw and nuw attributes are redundant since an optimization pass could re-derive the fact that the add cannot overflow. However, in general these attributes and others like them add real value by avoiding the need for potentially expensive static analyses to rediscover known program facts. Also, some facts cannot be rediscovered later, even in principle, since information is lost at some compilation steps.

In summary, undefined behavior in programmer-visible abstractions represents an aggressive and dangerous tradeoff: it sacrifices program correctness in favor of performance and compiler simplicity. On the other hand, UB at lower levels of the system, such as machine code or a compiler IR, is an internal design choice that needn’t have any effect on the programmer-facing parts of the system. This kind of UB simply requires us to accept that safety checks can be usefully factored out of their corresponding unsafe operations to afford efficient execution.

Detecting Strict Aliasing Violations in the Wild

Type-based alias analysis, where pointers to different types are assumed to point to distinct objects, gives compilers a simple and effective way to disambiguate memory references in order to generate better code. Unfortunately, C and C++ make it easy for programmers to violate the assumptions upon which type-based alias analysis is built. “Strict aliasing” refers to a collection of rules in the C and C++ standards that restrict the ways in which you are allowed to modify and look at memory objects, in order to make type-based alias analysis work in these weakly-typed languages. The problem is that the strict aliasing rules contain tricky and confusing corner cases and also that they rule out many idioms that have historically worked, such as using a pointer type cast to view a float as an unsigned, in order to inspect its bits. Such tricks are undefined behavior. See the first part of this post for more of an introduction to these issues.

The purpose of this piece is to call your attention to a new paper, Detecting Strict Aliasing Violations in the Wild, by Pascal Cuoq and his colleagues. C and C++ programmers should read it. Sections 1 and 2 introduce strict aliasing, they’re quick and easy.

Section 3 shows what compilers think about strict aliasing problems by looking at how a number of C functions get translated to x86-64 assembly. This material requires perseverance but it is worth taking the time to understand the examples in detail, because compilers apply the same thinking to real programs that they apply to tiny litmus tests.

Section 4 is about a new tool, built as part of Trust-in-Soft’s static analyzer, that can diagnose violations of the strict aliasing rules in C code. As the paper says, “it works best when applied on definite inputs,” meaning that the tool should be used as an extended checker for tis-interpreter. Pascal tells me that a release containing the strict aliasing checker is planned, but the time frame is not definite. In any case, readers interested in strict aliasing, but not specifically in tools for dealing with it, can skip this section.

Section 5 applies the strict aliasing checker to open source software. This is good reading because it describes problems that are very common in the wild today. Finding a bug in zlib was a nice touch: zlib is small and has already been looked at closely. Some programs mitigate these bugs by asking the compiler to avoid enforcing the strict aliasing rules: LLVM, GCC, and Intel CC all take a -fno-strict-aliasing flag and MSVC doesn’t implement type-based alias analysis at all. Many other programs contain time bombs: latent UB bugs that don’t happen to be exploited now, that might be exploited later when the compiler becomes a bit brighter.

Also see libcrunch and its paper and a paper about SafeType.