One Day of Winter and Three Days of Spring in the Fins

Bill and I had already made two attempts to backpack into the Fins area, which is part of Ernie’s Country in the Maze District of Canyonlands NP. Spring 2010 was abnormally snowy and Fall 2010 featured torrential rains, both times making roads impassable. This time — Spring 2011 — the weather cooperated. Also, it was great to have John Veranth along: he has awesome hand-annotated maps of the area, and his second truck provided some margin against mechanical problems or stuck vehicles.

First Day

We left SLC around 6:15 AM and had 3.5 hours of driving before leaving pavement about 20 miles north of Hanksville. Incredibly, at this point we still had 3.5 hours of driving left. Big lobes of sand had blown across the road in places and by the time we got to the ranger station at Hans Flat, the wind was really going and (as the pictures make clear) there was a lot of dust in the air. After picking up our backcountry permit, we drove down the Flint Trail, an amazing piece of road that goes through one of the only breakdowns found anywhere in the Orange Cliffs. Happily, the Flint Trail was not only bone-dry, but also it had been bladed recently and was in excellent condition.

Once below the Orange Cliffs, we quickly got to the top of the Golden Stairs, a trail that drops through the next set of cliffs, giving access to the top level of the Maze District. The alternative to hiking this trail is a long, extremely rough drive around Teapot Rock, which neither John nor Bill was eager to do. At the bottom of the Golden Stairs we walked on the road for maybe a mile and then dropped into Range Canyon, which has two springs that were used by ranchers and improved by the CCC. We skipped the first one and camped near the second, getting there around dusk. Around midnight the wind started to howl, blasting the tents with sand and pebbles and trying to push them over.

Second Day

Despite the rough night, the winds were down in the morning and (atypically for the desert in March) it was quite warm. After breakfast we walked over to the actual fins, which are massive and amazing. We had lunch in an alcove at the top of the canyon containing the fins and then poked into some side drainages on the way back. Back at the tents the winds had picked up again and everything, including the insides of the tents, was covered by a nice layer of grit. I ate a lot of sand on this trip.

Before dinner John and I climbed a nasty little 4th-class ramp to the level of slickrock above the canyon bottom near our camp. This gave access to a beautiful expanse of slickrock above the spring; clearly a lot of exploring would be possible there, but we failed to find an easy way out to the canyon rim.

Third Day

After breakfast we packed up to avoid exposing unoccupied tents to another windy day. We hiked up the canyon that is in between the two springs in Ernie’s Country. It contains a nice surprise: a slot canyon. However, before we had finished exploring it, it started to rain and rule #0 of slot canyons is you are not in them when it rains, so we backed out in a hurry. The rain continued on and off all day, eventually accumulating in pools on the slickrock, but it never rained hard enough to be unpleasant. Whitmore arch is up near the rim of this canyon, easily accessible by a slickrock ramp.

Near the head of Range Canyon there’s a great Anasazi granary and a very short slot canyon. After poking around this area for a while the weather started to look worse and we decided that if it snowed much overnight, we’d have trouble getting up the steep sandstone fin that provides access to this canyon. So, we exited and camped up on the rim. During dinner the drizzle turned into sleet so we turned in early and I read for a while before bed. Although my sleeping bag is rated to 15 degrees, it is not that warm so I slept in most of my layers. Fortunately — unlike my last two backpacking trips — I didn’t spend a miserable, sleepless, shivering-cold night.

Fourth Day

The weather system cleared out during the night and the morning was cold and spectacular — our first clear weather of the trip. The cool morning was welcome: no need to carry much water up the Golden Stairs. A couple of steep spots on the Flint Trail held snow when we got there, but not enough to cause trouble. After getting to the top we took a short detour on the road to the Big Ridge up to the point where there’s a campsite at a narrow neck between two massive canyon systems: Happy on one side and Cataract on the other. From there we turned around and got back to SLC by 8pm.

As a backpacking destination, Ernie’s Country is great. Pros:

  • Diverse and excellent scenery, even by Southern Utah standards
  • Springs make water a non-issue
  • Easy walking, heavily cairned routes, and constrained area would make it hard to get seriously lost

Cons:

  • Getting a broken vehicle towed from anywhere under the ledge would probably cost thousands of dollars
  • Some red tape associated with being inside a national park
  • A fairly crowded area relative to its remoteness

Software Bugs and Scientific Progress

When a bug is found in a piece of software, the root cause is often a bug in someone’s thoughts. One way to better understand a bug is to look at how deep the underlying thought error was. In other words: How many assumptions must be revisited as a result of the bug?

Level 1 — Syntax error, such as using the wrong operator or putting a function’s arguments in the wrong order. These are totally unsurprising.

Level 2 — Implementation error, such as wrongly assuming a pointer is non-null. These require developers to revisit whatever implementation-level assumption was made that led to an invariant violation. Implementation errors can also occur at the level of algorithm design; textbooks can contain level 2 errors. Incorrect or sloppy theorems may have to be revised in response to level 2 errors.
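
A hypothetical illustration of a level 2 bug (my example, not drawn from any particular code base): a function whose implementation silently assumes that callers never pass a null pointer.

#include <string.h>

/* Hypothetical level 2 bug: the implementation assumes "name" is never
   NULL. When a caller violates that assumption, strlen() dereferences a
   null pointer; fixing the bug means revisiting the assumption, either by
   checking for NULL here or by enforcing the precondition at call sites. */
size_t name_length (const char *name)
{
  return strlen (name);
}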

Level 3 — Design error: the system is structured in such a way that it cannot meet its specification. A web server that runs untrusted code in its address space contains a level 3 bug.

Level 4 — Specification error: the software is solving the wrong problem. Assumptions about the purpose of the system must be revisited.

Level 5 — Framework error: the intellectual scaffolding upon which the software was based contains an error or omission. My favorite example comes from the early days of the development of optimizing compilers when people first confronted difficult questions about whether a particular optimization was legal or not. An entire sub-area of computer science had to be created to answer these questions. Something analogous is happening right now where we lack a framework for creating reliable and secure large-scale, software-intensive systems.

Level 6 — Universe error: a previously-unknown physical law prevents the software from working. If good computers had existed in the 19th century, the first person who tried to implement Maxwell’s demon and the first person who tried to implement a universal discriminator for halting vs. non-halting programs would have encountered errors at level 6. Future level 6 errors may result from efforts to build artificial intelligences.

Something interesting about levels 5 and 6 is that they look more like progress than like errors. This is no coincidence: scientific progress occurs when bugs in our collective thinking are fixed. Thus, we could easily construct a scale for scientific advances that mirrors the scale for software bugs. Einstein and Turing operated at level 6 at least a few times in their lives; many other great scientists work at level 5. A typical MS thesis is at level 2 and a dissertation should be no lower than level 3.

Proposal for a CompCert Superoptimizer

CompCert is a C compiler that is provably correct. It is best characterized as lightly optimizing: it performs a number of standard optimizations but its code improvements are not aggressive when compared to those performed by GCC, Clang, or any number of commercial tools. This piece is about what I believe would be a relatively interesting and low-effort way to make CompCert more aggressive.

Consider these functions:

int foo (int x, int y) {
  return (x==y) || (x>y);
}
int bar (int x) {
  return (x>3) && (x>10);
}

Clearly foo() can be simplified to return “x≥y” and bar() can be simplified to return “x>10.” A random version of GCC emits:

_foo:
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al
        ret

_bar:
        xorl    %eax, %eax
        cmpl    $10, %edi
        setg    %al
        ret

Any aggressive compiler will produce similar output. In contrast, CompCert 1.8.1 does not perform these optimizations. Its output is bulky enough that I won’t give it here. Collectively, the class of missing optimizations that I’m talking about is called peephole optimizations: transformations that operate on a very limited scope and generally remove a redundancy. Opportunities for peephole optimizations can be found even in cleanly written code since inlined functions and results of macro expansions commonly contain redundancies.
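
For example (mine, not taken from real code), a clamping macro applied twice leaves behind a redundant comparison after expansion — exactly the kind of small, local redundancy a peephole pass cleans up:

#define MIN(a,b) ((a) < (b) ? (a) : (b))

int clamp255 (int x)
{
  /* expands to a nested conditional in which the outer comparison
     against 255 is redundant, since the inner result is already <= 255 */
  return MIN (MIN (x, 255), 255);
}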

An aggressive compiler contains literally hundreds of peephole optimizations, and reproducing most of them in CompCert would be time consuming, not to mention unspeakably boring. Fortunately, there’s a better way: most of these optimizations can be automatically derived. The basic idea is from Henry Massalin who developed a superoptimizer in 1987; it was significantly improved about 20 years later by Bansal and Aiken. This piece is about how to create a superoptimizer that proves its transformations are sound.

The idea is simple: at a suitable intermediate compilation stage — preferably after the regular optimization passes and before any kind of backend transformation — find all subgraphs of the program dependency graph up to a configurable size. For each subgraph G, enumerate all possible graphs of the CompCert IR up to some (probably smaller) configurable size. For each such graph H, if G and H are equivalent and if H is smaller than G, then replace G with H. Subgraph equivalence can be checked by encoding the problem as an SMT instance and sending the query to some existing solver. The proof of equivalence needed to make CompCert work comes “for free” because there exist SMT solvers that emit proof witnesses. (SMTCoq is an example of a tool that generates Coq proofs from SMT output.) Repeat until a fixpoint is reached — the program being compiled contains no subgraphs that can be replaced.

As an example, the IR for foo() above would contain this code:

t1 = x==y;
t2 = x>y;
t3 = t1 || t2;

When attempting to optimize this subgraph, the superoptimizer would eventually test for equivalence with:

t3 = x>=y;

Since t1 and t2 are not subsequently used, a match would be found and the peephole optimization would fire, resulting in smaller and faster code.
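
The equivalence itself is easy to believe; here is a brute-force sanity check over a small input range (my own throwaway test, not the SMT encoding the superoptimizer would actually use):

#include <assert.h>

int main (void)
{
  int x, y;
  for (x = -1000; x <= 1000; x++)
    for (y = -1000; y <= 1000; y++)
      /* original subgraph vs. candidate replacement */
      assert (((x == y) || (x > y)) == (x >= y));
  return 0;
}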

Of course, the superoptimizer that I have sketched is extremely slow. The Bansal and Aiken paper shows how to make the technique fast enough to be practical. All of their tricks should apply here. Very briefly, the speedups include:

  1. Testing many harvested sequences at once
  2. Reducing the search space using aggressive canonicalization
  3. Avoiding most SMT calls by first running some simple equivalence tests
  4. Remembering successful transformations in a database that supports fast lookup

The Bansal and Aiken superoptimizer operated on sequences of x86 instructions. Although this had the advantage of permitting the tool to take advantage of x86-specific tricks, it also had a couple of serious disadvantages. First, a short linear sequence of x86 instructions harvested from an executable does not necessarily encode an interesting unit of computation. In contrast, if we harvest subgraphs from the PDG, we are guaranteed to get code that is semantically connected. Second, the Stanford superoptimizer has no ability to see “volatile” memory references that must not be touched — it will consequently break code that uses multiple threads, UNIX signals, hardware interrupts, or hardware device registers.

The technique outlined in this piece is what I call a weak superoptimizer: it finds short equivalent code sequences using brute force enumeration. A strong superoptimizer, on the other hand, would pose the following queries for each harvested subgraph G:

  1. Does there exist a subgraph of size 0 that is equivalent to G? If not…
  2. Does there exist a subgraph of size 1 that is equivalent to G? If not…
  3. Etc.

Clearly this leans much more heavily on the solver. It is the approach used in the Denali superoptimizer. Unfortunately, no convincing results about the workability of that approach were ever published (as far as I know), whereas the weak approach appears to be eminently practical.

In summary, this post contains what I think are two relatively exploitable ideas. First, a peephole superoptimizer should be based on subgraphs of the PDG rather than linear sequences of instructions. Second, proof-producing SMT should provide a relatively easy path to verified peephole superoptimization. If successful, the result should be a significant improvement in the quality of CompCert’s generated code.

Update from 3/29: A random thing I forgot to include in this post is that it would be easy and useful to teach this kind of superoptimizer to take advantage of (nearly) arbitrary dataflow facts. For example, “x /= 16” (where x is signed) is not equivalent to x >>= 4. However, this equivalence does hold if x can be proved to be non-negative using a standard interval analysis. Encoding a fact like “x ≥ 0” in the input to the SMT solver is trivial. The nice thing about dataflow facts is that they give the superoptimizer non-local information.
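
A quick demonstration of why the unconditional rewrite would be wrong (assuming a two’s complement machine where right-shifting a negative value is an arithmetic shift):

#include <stdio.h>

int main (void)
{
  int x = -15;
  /* C99 division truncates toward zero; the arithmetic shift rounds
     toward negative infinity, so the two disagree for negative x */
  printf ("%d %d\n", x / 16, x >> 4);   /* prints "0 -1" */
  x = 48;
  printf ("%d %d\n", x / 16, x >> 4);   /* prints "3 3" */
  return 0;
}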

I should also add that when I said “if H is smaller than G, then replace G with H” of course I really mean “cheaper” rather than “smaller” where the cost of a subgraph is determined using some heuristic or machine model.

Even after adding a nice IR-level superoptimizer to CompCert, a backend superoptimizer just like Bansal and Aiken’s is still desirable: it can find target-dependent tricks and it can also get rid of low-level grunge such as the unnecessary register-register moves that every compiler’s output seems to contain. The main fix to their work that is needed is a way to prevent it from removing or reordering volatile memory references; this should be straightforward.

Peer Review Poker

Peer review is a bureaucratic, consensus-based approach to making decisions. Thus, it is not inherently entertaining and authors like myself need to amuse ourselves as best we can. One of the games I like to play is peer review poker, where certain combinations of review scores are more desirable than others.

Straight: Review scores form a continuous sequence. A straight, particularly if it is anchored at the high end of the spectrum, is extremely desirable because provoking a diversity of reactions means that there is something interesting about the work. The best kind of straight — which I have accomplished only once, as far as I can remember — is when each possible score is received exactly once.

Flush: All reviews are the same score. This is not particularly desirable as it indicates a predictable, uncontroversial paper. Of course, if all scores are “strong accept” then predictability doesn’t seem so bad — but this hardly ever happens. The other extreme, all “strong reject” scores, is certainly achievable.

Obviously, two of a kind, etc. are very common peer review outcomes and are not particularly interesting. By taking reviewer expertise into account it should be possible to define a royal flush but probably that’s pushing the analogy too far.

Finding and Understanding Bugs in C Compilers

Today we finished preparing the camera-ready version of our paper that will appear in PLDI 2011. I’m pretty happy with it. Here’s the abstract:

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

Actually, as of today the total is 344 bugs reported but at some point we had to freeze the statistics in the paper.

Race Condition vs. Data Race

A race condition is a flaw that occurs when the timing or ordering of events affects a program’s correctness. Generally speaking, some kind of external timing or ordering non-determinism is needed to produce a race condition; typical examples are context switches, OS signals, memory operations on a multiprocessor, and hardware interrupts.

A data race happens when there are two memory accesses in a program where both:

  • target the same location
  • are performed concurrently by two threads
  • are not reads
  • are not synchronization operations

There are other definitions but this one is fine; it’s from Sebastian Burckhardt at Microsoft Research. Two aspects of the definition require a bit of care. “Concurrently” means there wasn’t anything like a lock that forced one operation to happen before or after the other. “Are not synchronization operations” refers to the fact that a program probably contains special memory operations, such as those used to implement locks, that are not themselves synchronized. By definition, these do not trigger data races.
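
For example, here is about the smallest C program I can think of that contains a data race under this definition (a sketch of mine, not from Burckhardt; compile with -pthread): the two unsynchronized increments of counter can execute concurrently, touch the same location, involve writes, and are not synchronization operations.

#include <pthread.h>
#include <stdio.h>

int counter;   /* shared, not atomic, not protected by any lock */

void *increment (void *arg)
{
  counter++;   /* unsynchronized read-modify-write: races with the other thread */
  return NULL;
}

int main (void)
{
  pthread_t t1, t2;
  pthread_create (&t1, NULL, increment, NULL);
  pthread_create (&t2, NULL, increment, NULL);
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  printf ("%d\n", counter);   /* may print 1 or 2 */
  return 0;
}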

In practice there is considerable overlap: many race conditions are due to data races, and many data races lead to race conditions. On the other hand, we can have race conditions without data races and data races without race conditions. Let’s start with a simple function for moving money between two bank accounts:

transfer1 (amount, account_from, account_to) {
  if (account_from.balance < amount) return NOPE;
  account_to.balance += amount;
  account_from.balance -= amount;
  return YEP;
}

Of course this is not how banks really move money, but the example is useful anyway because we understand intuitively that account balances should be non-negative and that a transfer must not create or lose money. When called from multiple threads without external synchronization, this function admits both data races (multiple threads can concurrently try to update an account balance) and race conditions (in a parallel context it will create or lose money). We can try to fix it like this:

transfer2 (amount, account_from, account_to) {
  atomic {
    bal = account_from.balance;
  }
  if (bal < amount) return NOPE;
  atomic {
    account_to.balance += amount;
  }
  atomic {
    account_from.balance -= amount;
  }
  return YEP;
}

Here “atomic” is implemented by the language runtime, perhaps simply by acquiring a thread mutex at the start of the atomic block and releasing it at the end, perhaps using some sort of transaction, or perhaps by disabling interrupts — for purposes of the example it doesn’t matter as long as the code inside the block executes atomically.
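
For concreteness, here is one way the atomic blocks in transfer2 might be realized with a pthread mutex — a sketch only, with types invented for the example (the functions above are deliberately pseudocode), and one that faithfully preserves transfer2’s flaw:

#include <pthread.h>

static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;

struct account { long balance; };

int transfer2 (long amount, struct account *from, struct account *to)
{
  long bal;
  pthread_mutex_lock (&account_lock);     /* atomic block 1 */
  bal = from->balance;
  pthread_mutex_unlock (&account_lock);
  if (bal < amount) return 0;             /* NOPE */
  pthread_mutex_lock (&account_lock);     /* atomic block 2 */
  to->balance += amount;
  pthread_mutex_unlock (&account_lock);
  pthread_mutex_lock (&account_lock);     /* atomic block 3 */
  from->balance -= amount;
  pthread_mutex_unlock (&account_lock);
  return 1;                               /* YEP */
}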

transfer2 has no data races when called by multiple threads, but obviously it is an extremely silly function containing race conditions that will cause it to create or lose money almost as badly as the unsynchronized function. From a technical point of view, the problem with transfer2 is that it permits other threads to see memory states where a key invariant — conservation of money — is broken.

To preserve the invariant, we have to use a better locking strategy. As long as atomic’s semantics are to end the atomic section on any exit from the block, the solution can be blunt:

transfer3 (amount, account_from, account_to) {
  atomic {
    if (account_from.balance < amount) return NOPE;
    account_to.balance += amount;
    account_from.balance -= amount;
    return YEP;
  }
}

This function is free of data races and also of race conditions. Can we change it a bit to make an example that has data races but no race conditions? That is simple:

transfer4 (amount, account_from, account_to) {
  account_from.activity = true;
  account_to.activity = true;
  atomic {
    if (account_from.balance < amount) return NOPE;
    account_to.balance += amount;
    account_from.balance -= amount;
    return YEP;
  }
}

Here we are setting flags indicating that some sort of activity occurred on the accounts. Are the data races on these flags harmful? Perhaps not. For example, in the evening we might shut down all transaction-processing threads and then select 10 random accounts that are flagged as having had activity for manual auditing. For this purpose, the data races are entirely harmless.

We’ve ended up covering all possibilities:

                     data race    no data race
  race condition     transfer1    transfer2
  no race condition  transfer4    transfer3

The point of this exercise was to look at transfer2 and transfer4, which illustrate that freedom from data races is a very weak property that is neither necessary nor sufficient for establishing the concurrency correctness of a computer program. Why does anyone care about data races at all? As far as I can tell, there are two reasons. First, a data-race-free program can, with some care, be shown to be independent of the whims of whatever weak memory model it sits on. This lets us breathe easier because reasoning about racy programs running on weak memory models is nearly impossible for humans. Second, data races are easy to find automatically — notice from the definition that detecting them simply requires memory accesses to be monitored; no knowledge of application semantics is required. Race conditions, on the other hand, are intimately tied to application-level invariants, making them far thornier to reason about (quick — what are all the invariants in Firefox? In Linux? In MySQL?).

Everything in this post is pretty obvious, but I’ve observed real confusion about the distinction between data race and race condition by people who should know better (for example because they are doing research on concurrency correctness). Muddying the waters further, even when people are perfectly clear on the underlying concepts, they sometimes say “race condition” when they really mean “data race.” Certainly I’ve caught myself doing this.

Guidelines for Teaching Assistants

I’ve been teaching university-level courses for the last nine years, usually with the support of teaching assistants (TAs): students who get paid to do things like grading, office hours, fielding email questions, making and debugging assignments, proctoring exams, and perhaps even giving a lecture when I’m sick or traveling.

At the start of each semester I sit down with my new TAs and go over some guidelines. Unfortunately, this is a chaotic time and I’m pretty sure I’ve never managed to give the same advice twice. This piece is an attempt to put all of my guidelines in one place. I’m certainly leaving things out — please fill in details in comments. This is all somewhat specific to computer science classes with programming assignments.

The Golden Rule of the CS TA: Never Type at a Student’s Keyboard

When a student doesn’t get it, the temptation to just sit down and take over can be overwhelming. But you have to resist because as long as the student is driving, the student is thinking. When you take over, their brain switches into passive mode.

Progressive Hints

When a student is struggling on an assignment, first give oblique hints: “Did you read Chapter 7?” “Did you search for that on Wikipedia?” If this fails to help, be more concrete: “Are you checking all return codes?” “Did you run it under Valgrind?” Finally, and only when you are convinced the student has made a genuine effort to solve the problem, give a more direct answer, but be sure that it only helps the student past the current roadblock: “Your foo() function has to return 0 on success, not 1.” “You’re dereferencing a NULL pointer.” With time, you will learn which students can run with a quick hint and which ones probably need stronger advice.

My model for progressive hints is based on the Invisiclues packets for old Infocom games — so awesome.

Amend, But Communicate

Often, when answering a student’s question, it becomes clear that some part of the assignment or lab was underspecified and should have been clearer. In cases like this, your answer to the student is effectively a supplemental part of the assignment. This is fine, but when it happens you need to pass the additional information along to the instructor and/or the rest of the class.

When Things Aren’t Working, Tell Me

Sometimes there are infrastructure problems: a bunch of machines in the lab are down, or some critical piece of software doesn’t work correctly. Other times, there is a systematic problem in an assignment or lab: a crucial ambiguity or error. When you notice anything like this happening, let me know — I can’t fix problems that I’m unaware of.

Treat Students With Respect

I’ve only seen one problem out of probably 30 TAs, and it wasn’t that serious. But still this is worth saying.

Don’t Play the 10,000 Questions Game

In a big class there are always one or two students who, instead of reading the assignment, just email the instructor or TAs. If you answer, they immediately have another question. This goes on forever — I do not exaggerate, they apparently never eat or sleep. The solution is to recognize these cycles and delay the next response for 12 hours or so. Sooner or later the student realizes that it’ll be more efficient to just read the assignment.

Do the Assignment First

This sucks, but you have to do each assignment, or at least the ones for which you are the principal TA, before it is handed out. Otherwise, it is effectively impossible to help students get past roadblocks they encounter. This mainly applies to courses like operating systems where the assignments may be really hard. For freshman- and sophomore-level programming courses, any conceivable problem can be eyeballed in a few seconds.

Don’t Miss Office Hours

At least give advance warning if it’ll be impossible to show up so students don’t waste time coming to the lab.

Bounce Problem Cases Up To The Instructor

You’re not being paid enough to deal with difficult students. If anyone is pushy about a grade, rude about any issue, or anything else, simply refer the student to me. This almost never happens.

Don’t Screw Up the File

There is one grade spreadsheet, and it has a specific file name. Do not put grades in any other file, do not create duplicate files, do not put unlabeled columns in the spreadsheet. Label each new column carefully. Every time you finish working with the file, make sure the permissions are such that the other TAs and I can read and write it, but nobody else can.

Anonymize Grades Carefully

When posting grades, delete names, student ID numbers, and email addresses before exporting to HTML.

Never Deal With Cheating

This is my job. If you suspect cheating, talk to me; but never bring it up with a student.

Be Timely

Unless I say otherwise, labs should be graded, and grades entered, within a week of being handed in.

Do Not Flake Out

TAs are themselves students and are subject to getting overwhelmed like anyone else. If this seems to be happening, come talk to me. If you just disappear in the middle of the semester and stop doing your job, I will seriously make sure you never work in this town again.

The Simplest Queue?

My student Jianjun is proving things about ARM executables that handle interrupts. It’s very difficult work, so when I asked him to write up a “simple” case study where an interrupt and the main context communicate through a ring buffer, I thought it would be helpful if I handed him the simplest possible queue that is at all realistic. Since queues are already pretty simple, there seemed to be only two tricks worth playing: force queue size to be a power of two, and waste one slot to simplify full-queue detection. Here’s my code:

#include "q.h"

#if ((QSIZE)<2)||(!((((QSIZE)|((QSIZE)-1))+1)/2==(QSIZE)))
#error QSIZE must be >1 and also a power of 2
#endif

#define MASK ((QSIZE)-1)

static int q[QSIZE];
static int head, tail;

static int
inc (int x)
{
  return (x + 1) & MASK;
}

int
full (void)
{
  return inc (head) == tail;
}

int
mt (void)
{
  return head == tail;
}

int
enq (int item)
{
  if (full ())
    return 0;
  q[head] = item;
  head = inc (head);
  return 1;
}

int
deq (int *loc)
{
  if (mt ())
    return 0;
  *loc = q[tail];
  tail = inc (tail);
  return 1;
}

int
qents (void)
{
  int s = head - tail;
  if (s < 0)
    s += (QSIZE);
  return s;
}

The header file simply defines QSIZE and gives prototypes for the functions. Did I miss any good tricks? The code coming out of compilers I have sitting around isn’t quite as clean as I’d have hoped. There’s some redundancy in the code (calling inc() twice in the enqueue and dequeue functions) but eliminating it should be a simple matter for the compiler.
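
For reference, a minimal q.h consistent with that description might look like the following — QSIZE of 16 is an arbitrary choice, not something specified above:

#ifndef Q_H
#define Q_H

#define QSIZE 16   /* must be > 1 and a power of 2 */

int full (void);
int mt (void);
int enq (int item);
int deq (int *loc);
int qents (void);

#endif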

Negative Correlation Achieved

Recently I reviewed 19 papers that were submitted to CAV 2011. This is the first time I’ve been involved with a pure verification conference, and consequently I greatly enjoyed reading the papers because almost every one contained something new. Each time I submitted a review I looked at the ones that were already submitted for that paper, and kept being surprised at how often I disagreed with the other reviewers. Finally I just computed the correlation between my score and the average of the other reviewers’ scores, and the result was an astonishingly low -0.07.

In contrast, when I review papers at a “systems” venue (doesn’t matter if it’s operating systems, embedded systems, or something else) it’s not at all uncommon for me to give exactly the same score as all, or almost all, of the other reviewers. My guess is that at the last five systems conferences I was involved with, the correlation between my score and the average of the other reviewers’ scores was higher than 0.8.

I’m not sure there’s a take-away message here other than “communities have very different evaluation standards for papers.” However, this does shed a bit of light on why it can be quite difficult to switch areas. Closely related: Doug Comer’s excellent piece on how to insult a computer scientist.

The Little C Function From Hell

The other day a student and I were trying to understand a subtle part of the C standard. Often, the easiest way to clarify this kind of issue is to recognize that compiler writers have already grappled with it — so just write some code and see what various compilers do with it. I wrote this function:

int foo (char x) {
  char y = x;
  return ++x > y;
}

Since the expression ++x evaluates to the incremented value of x, it is clear that this function should return “1” for most values of x. The question is: What does it compute for CHAR_MAX? One possibility is that the function reliably returns “0” for that input; the other possibility is that ++x is undefined on platforms where char is a signed type. For completeness here’s the test harness that prints foo()’s output for all possible inputs:

#include <stdio.h>
#include <limits.h>

int main (void) {
  int i;
  for (i=CHAR_MIN; i<=CHAR_MAX; i++) {
    printf ("%d ", foo(i));
    if ((i&31)==31) printf ("\n");
  }
  return 0;
}

This code only works if char is narrower than int, but I’ve never seen a platform where that is not the case.

The first sign that something odd was going on appeared when I compiled the code using Clang:

regehr@home:~$ clang -O foo.c -o foo
regehr@home:~$ ./foo
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Not cool — the function is supposed to mostly return true. But then I realized that my default Clang was out of date (2.7) so I tried a very recent Clang snapshot (rev 126534):

regehr@home:~$ clang -O0 overflow.c -o overflow
regehr@home:~$ ./overflow
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
regehr@home:~$ clang -O1 overflow.c -o overflow
regehr@home:~$ ./overflow
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The result (look at the last character) changed when we changed the optimization level — this is OK if incrementing CHAR_MAX is undefined, and is a compiler bug otherwise.

The Intel C compiler (12.0.2 for x86-64) has similar behavior:

[regehr@bethe ~]$ icc -O0 foo.c -o foo
[regehr@bethe ~]$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
[regehr@bethe ~]$ icc -O foo.c -o foo
[regehr@bethe ~]$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

A very recent GCC (rev 170512 for x86) gives a consistent output:

regehr@home:~$ current-gcc -O0 foo.c -o foo
regehr@home:~$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
regehr@home:~$ current-gcc -O2 foo.c -o foo
regehr@home:~$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

However, the story changes if we ask it not to perform function inlining:

regehr@home:~$ current-gcc -O2 -fno-inline foo.c -o foo
regehr@home:~$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

So far, Clang 2.7 appears to be wrong, but all other observed behavior is consistent with ++x being undefined when x is CHAR_MAX. Then I tried CompCert and things took a turn for the worse:

regehr@home:~$ ccomp foo.c -o foo
regehr@home:~$ ./foo
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

This is very odd because CompCert contains a verified version of C’s tricky implicit casts — the exact thing that foo() was designed to test in the first place.

To make a long story short: when the “char” type is narrower than the “int” type and when x has type “signed char” and value CHAR_MAX, ++x is well-defined by both ANSI C and C99. We know that it is well-defined because:

  1. The standard says: “The expression ++E is equivalent to (E+=1).”
  2. The standard says: “A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.”
  3. The RHS of that simple assignment expression, “E1 op (E2)”, is evaluated using C’s “usual arithmetic conversions.”
  4. The usual arithmetic conversions ensure that two operands to a “+” operator of type signed char are both promoted to signed int before the addition is performed.
  5. When int is wider than char, there is no possibility of the resulting addition overflowing. Thus, the behavior is well-defined and every correct compiler must emit this output:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

In other words, to evaluate ++x > y when both variables have type signed char and value CHAR_MAX:

  1. Signed char 127 is promoted to signed int 127.
  2. Signed int 127 is incremented to signed int 128.
  3. Signed int 128 is cast to signed char -128, which is the new value for x and also the value of the subexpression ++x. (Update: As Mans points out in comment #5, the result of this type cast is implementation defined. All common C implementations define the behavior to be truncation of a 2’s complement integer.)
  4. Signed char -128 is promoted to signed int -128.
  5. Signed int -128 > signed int 127 is evaluated to 0.
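
To make the conversions concrete, here is foo() with the implicit steps written out by hand — my own restatement, assuming char is signed and narrower than int; it should compute the same results as the original:

int foo_explicit (char x)
{
  char y = x;
  int xp  = (int) x;          /* step 1: promote to int                 */
  int sum = xp + 1;           /* step 2: cannot overflow, int is wider  */
  x = (char) sum;             /* step 3: implementation-defined cast    */
  return (int) x > (int) y;   /* steps 4-5: promote again and compare   */
}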

But what about the fact that none of the four compilers I commonly use reliably returns the correct result?

  • In GCC the bug is known and has existed for some time.
  • In LLVM/Clang the bug was not known but was fixed in less than 24 hours.
  • The Intel compiler is wrong too.
  • In CompCert there is no bug. However, there is an unfortunate interaction between its definition of “char” to be unsigned and the signed values for CHAR_MIN and CHAR_MAX found in my Linux machine’s limits.h file. Verifying a compiler is an extremely difficult problem and verifying its entire environment (header files, libraries, assemblers, I/O routines, etc.) is pretty much an open problem. This is why we test.

That’s a lot of trouble being caused by a two-line function. C may be a small language, but it’s not a simple one.

[Thanks to Chucky Ellison from UIUC and to Xavier Leroy for helping me puzzle through all this.]