One Day of Winter and Three Days of Spring in the Fins

Bill and I had already made two attempts to backpack into the Fins area, which is part of Ernie’s Country in the Maze District of Canyonlands NP. Spring 2010 was abnormally snowy and Fall 2010 featured torrential rains, both times making roads impassable. This time — Spring 2011 — the weather cooperated. It was also great to have John Veranth along: he has awesome hand-annotated maps of the area, and his second truck gave us some margin against mechanical problems or stuck vehicles.

First Day

We left SLC around 6:15 AM and had 3.5 hours of driving before leaving pavement about 20 miles north of Hanksville. Incredibly, at this point we still had 3.5 hours of driving left. Big lobes of sand had blown across the road in places and by the time we got to the ranger station at Hans Flat, the wind was really going and (as the pictures make clear) there was a lot of dust in the air. After picking up our backcountry permit, we drove down the Flint Trail, an amazing piece of road that goes through one of the only breakdowns found anywhere in the Orange Cliffs. Happily, the Flint Trail was not only bone-dry, but also it had been bladed recently and was in excellent condition.

Once below the Orange Cliffs, we quickly got to the top of the Golden Stairs, a trail that drops through the next set of cliffs, giving access to the top level of the Maze District. The alternative to hiking this trail is a long, extremely rough drive around Teapot Rock, which neither John nor Bill was eager to do. At the bottom of the Golden Stairs we walked on the road for maybe a mile and then dropped into Range Canyon, which has two springs that were used by ranchers and improved by the CCC. We skipped the first one and camped near the second, getting there around dusk. Around midnight the wind started to howl, blasting the tents with sand and pebbles and trying to push them over.

[nggallery id=32]

Second Day

Despite the rough night, the winds were down in the morning and (atypically for the desert in March) it was quite warm. After breakfast we walked over to the actual fins, which are massive and amazing. We had lunch in an alcove at the top of the canyon containing the fins and then poked into some side drainages on the way back. Back at the tents the winds had picked up again and everything, including inside the tents, was covered by a nice layer of grit. I ate a lot of sand on this trip.

Before dinner John and I climbed a nasty little 4th-class ramp to the level of slickrock above the canyon bottom near our camp. This gave access to a beautiful expanse of slickrock above the spring; clearly a lot of exploring would be possible there, but we failed to find an easy way out to the canyon rim.

[nggallery id=33]

Third Day

After breakfast we packed up to avoid exposing unoccupied tents to another windy day. We hiked up the canyon between the two springs in Ernie’s Country. It contains a nice surprise: a slot canyon. However, before we had finished exploring it, it started to rain, and rule #0 of slot canyons is that you are not in them when it rains, so we backed out in a hurry. The rain continued on and off all day, eventually accumulating in pools on the slickrock, but it never rained hard enough to be unpleasant. Whitmore Arch is up near the rim of this canyon, easily accessible by a slickrock ramp.

Near the head of Range Canyon there’s a great Anasazi granary and a very short slot canyon. After poking around this area for a while the weather started to look worse and we decided that if it snowed much overnight, we’d have trouble getting up the steep sandstone fin that provides access to this canyon. So, we exited and camped up on the rim. During dinner the drizzle turned into sleet so we turned in early and I read for a while before bed. Although my sleeping bag is rated to 15 degrees, it is not that warm so I slept in most of my layers. Fortunately — unlike my last two backpacking trips — I didn’t spend a miserable, sleepless, shivering-cold night.

[nggallery id=34]

Fourth Day

The weather system cleared out during the night and the morning was cold and spectacular — our first clear weather of the trip. The cool morning was welcome: no need to carry much water up the Golden Stairs. A couple of steep spots on the Flint Trail held snow when we got there, but not enough to cause trouble. After getting to the top we took a short detour on the road to the Big Ridge up to the point where there’s a campsite at a narrow neck between two massive canyon systems: Happy on one side and Cataract on the other. From there we turned around and got back to SLC by 8pm.

[nggallery id=35]

As a backpacking destination, Ernie’s Country is great. Pros:

  • Diverse and excellent scenery, even by Southern Utah standards
  • Springs make water a non-issue
  • Easy walking, heavily cairned routes, and constrained area would make it hard to get seriously lost


Cons:

  • Getting a broken vehicle towed from anywhere under the ledge would probably cost thousands of dollars
  • Some red tape associated with being inside a national park
  • A fairly crowded area relative to its remoteness

Software Bugs and Scientific Progress

When a bug is found in a piece of software, the root cause is often a bug in someone’s thoughts. One way to better understand a bug is to look at how deep the underlying thought error was. In other words: How many assumptions must be revisited as a result of the bug?

Level 1 — Syntax error, such as using the wrong operator or putting a function’s arguments in the wrong order. These are totally unsurprising.

Level 2 — Implementation error, such as wrongly assuming a pointer is non-null. These require developers to revisit whatever implementation-level assumption was made that led to an invariant violation. Implementation errors can also occur at the level of algorithm design; textbooks can contain level 2 errors. Incorrect or sloppy theorems may have to be revised in response to level 2 errors.

Level 3 — Design error: the system is structured in such a way that it cannot meet its specification. A web server that runs untrusted code in its address space contains a level 3 bug.

Level 4 — Specification error: the software is solving the wrong problem. Assumptions about the purpose of the system must be revisited.

Level 5 — Framework error: the intellectual scaffolding upon which the software was based contains an error or omission. My favorite example comes from the early days of the development of optimizing compilers when people first confronted difficult questions about whether a particular optimization was legal or not. An entire sub-area of computer science had to be created to answer these questions. Something analogous is happening right now where we lack a framework for creating reliable and secure large-scale, software-intensive systems.

Level 6 — Universe error: a previously-unknown physical law prevents the software from working. If good computers had existed in the 19th century, the first person who tried to implement Maxwell’s demon and the first person who tried to implement a universal discriminator for halting vs. non-halting programs would have encountered errors at level 6. Future level 6 errors may result from efforts to build artificial intelligences.

Something interesting about levels 5 and 6 is that they look more like progress than like errors. This is no coincidence: scientific progress occurs when bugs in our collective thinking are fixed. Thus, we could easily construct a scale for scientific advances that mirrors the scale for software bugs. Einstein and Turing operated at level 6 at least a few times in their lives; many other great scientists work at level 5. A typical MS thesis is at level 2 and a dissertation should be no lower than level 3.

Proposal for a CompCert Superoptimizer

CompCert is a C compiler that is provably correct. It is best characterized as lightly optimizing: it performs a number of standard optimizations but its code improvements are not aggressive when compared to those performed by GCC, Clang, or any number of commercial tools. This piece is about what I believe would be a relatively interesting and low-effort way to make CompCert more aggressive.

Consider these functions:

int foo (int x, int y) {
  return (x==y) || (x>y);
}

int bar (int x) {
  return (x>3) && (x>10);
}
Clearly foo() can be simplified to return “x≥y” and bar() can be simplified to return “x>10.” A random version of GCC emits:

        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al

        xorl    %eax, %eax
        cmpl    $10, %edi
        setg    %al

Any aggressive compiler will produce similar output. In contrast, CompCert 1.8.1 does not perform these optimizations. Its output is bulky enough that I won’t give it here. Collectively, the class of missing optimizations that I’m talking about is called peephole optimizations: transformations that operate on a very limited scope and generally remove a redundancy. Opportunities for peephole optimizations can be found even in cleanly written code since inlined functions and results of macro expansions commonly contain redundancies.

An aggressive compiler contains literally hundreds of peephole optimizations, and reproducing most of them in CompCert would be time consuming, not to mention unspeakably boring. Fortunately, there’s a better way: most of these optimizations can be automatically derived. The basic idea is from Henry Massalin who developed a superoptimizer in 1987; it was significantly improved about 20 years later by Bansal and Aiken. This piece is about how to create a superoptimizer that proves its transformations are sound.

The idea is simple: at a suitable intermediate compilation stage — preferably after the regular optimization passes and before any kind of backend transformation — find all subgraphs of the program dependency graph up to a configurable size. For each subgraph G, enumerate all possible graphs of the CompCert IR up to some (probably smaller) configurable size. For each such graph H, if G and H are equivalent and if H is smaller than G, then replace G with H. Subgraph equivalence can be checked by encoding the problem as an SMT instance and sending the query to some existing solver. The proof of equivalence needed to make CompCert work comes “for free” because there exist SMT solvers that emit proof witnesses. (SMTCoq is an example of a tool that generates Coq proofs from SMT output.) Repeat until a fixpoint is reached — the program being compiled contains no subgraphs that can be replaced.
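
The search loop described above can be sketched concretely. In the following toy Python sketch, exhaustive testing over a small input range stands in for the SMT query (so it is only a search heuristic, not a proof), and the tiny candidate set of single-comparison replacements is invented for illustration:

```python
from itertools import product

# Candidate replacement operations, each mapping (x, y) to 0 or 1.
# A real implementation would enumerate CompCert IR subgraphs and
# discharge each equivalence with a proof-producing SMT solver.
OPS = {
    "x==y": lambda x, y: int(x == y),
    "x!=y": lambda x, y: int(x != y),
    "x<y":  lambda x, y: int(x < y),
    "x<=y": lambda x, y: int(x <= y),
    "x>y":  lambda x, y: int(x > y),
    "x>=y": lambda x, y: int(x >= y),
}

TEST_INPUTS = list(product(range(-4, 5), repeat=2))

def equivalent(f, g):
    # Stand-in for the SMT query: exhaustive testing over a small range.
    # This is NOT a proof -- a real superoptimizer must verify the match.
    return all(f(x, y) == g(x, y) for x, y in TEST_INPUTS)

def superoptimize(f):
    # Try candidate graphs smallest-first; return the first equivalent one.
    for name, g in sorted(OPS.items(), key=lambda kv: len(kv[0])):
        if equivalent(f, g):
            return name
    return None

# The body of foo() above: (x==y) || (x>y), a three-operation subgraph.
foo_subgraph = lambda x, y: int(x == y or x > y)
print(superoptimize(foo_subgraph))  # finds the single comparison "x>=y"
```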

As an example, the IR for foo() above would contain this code:

t1 = x==y;
t2 = x>y;
t3 = t1 || t2;

When attempting to optimize this subgraph, the superoptimizer would eventually test for equivalence with:

t3 = x>=y;

Since t1 and t2 are not subsequently used, a match would be found and the peephole optimization would fire, resulting in smaller and faster code.

Of course, the superoptimizer that I have sketched is extremely slow. The Bansal and Aiken paper shows how to make the technique fast enough to be practical. All of their tricks should apply here. Very briefly, the speedups include:

  1. Testing many harvested sequences at once
  2. Reducing the search space using aggressive canonicalization
  3. Avoiding most SMT calls by first running some simple equivalence tests
  4. Remembering successful transformations in a database that supports fast lookup

The Bansal and Aiken superoptimizer operated on sequences of x86 instructions. Although this had the advantage of permitting the tool to take advantage of x86-specific tricks, it also had a couple of serious disadvantages. First, a short linear sequence of x86 instructions harvested from an executable does not necessarily encode an interesting unit of computation. In contrast, if we harvest subgraphs from the PDG, we are guaranteed to get code that is semantically connected. Second, the Stanford superoptimizer has no ability to see “volatile” memory references that must not be touched — it will consequently break code that uses multiple threads, UNIX signals, hardware interrupts, or hardware device registers.

The technique outlined in this piece is what I call a weak superoptimizer: it finds short equivalent code sequences using brute force enumeration. A strong superoptimizer, on the other hand, would pose the following queries for each harvested subgraph G:

  1. Does there exist a subgraph of size 0 that is equivalent to G? If not…
  2. Does there exist a subgraph of size 1 that is equivalent to G? If not…
  3. Etc.

Clearly this leans much more heavily on the solver. It is the approach used in the Denali superoptimizer. Unfortunately, no convincing results about the workability of that approach were ever published (as far as I know), whereas the weak approach appears to be eminently practical.

In summary, this post contains what I think are two relatively exploitable ideas. First, a peephole superoptimizer should be based on subgraphs of the PDG rather than linear sequences of instructions. Second, proof-producing SMT should provide a relatively easy path to verified peephole superoptimization. If successful, the result should be a significant improvement in the quality of CompCert’s generated code.

Update from 3/29: A random thing I forgot to include in this post is that it would be easy and useful to teach this kind of superoptimizer to take advantage of (nearly) arbitrary dataflow facts. For example, “x /= 16” (where x is signed) is not equivalent to x >>= 4. However, this equivalence does hold if x can be proved to be non-negative using a standard interval analysis. Encoding a fact like “x ≥ 0” in the input to the SMT solver is trivial. The nice thing about dataflow facts is that they give the superoptimizer non-local information.
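
This claim is easy to check concretely. The following small Python sketch emulates C99’s truncate-toward-zero signed division (the helper names are invented):

```python
def c_div16(x):
    # C99 signed division truncates toward zero.
    q = abs(x) // 16
    return q if x >= 0 else -q

def shr4(x):
    # Arithmetic right shift, which Python's >> performs on negative ints.
    return x >> 4

# Not equivalent for negative values: C computes -1/16 == 0, but an
# arithmetic shift rounds toward negative infinity.
print(c_div16(-1), shr4(-1))  # 0 -1

# Equivalent once an interval analysis proves x >= 0.
assert all(c_div16(x) == shr4(x) for x in range(0, 1000))
```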

I should also add that when I said “if H is smaller than G, then replace G with H” of course I really mean “cheaper” rather than “smaller” where the cost of a subgraph is determined using some heuristic or machine model.

Even after adding a nice IR-level superoptimizer to CompCert, a backend superoptimizer just like Bansal and Aiken’s is still desirable: it can find target-dependent tricks and it can also get rid of low-level grunge such as the unnecessary register-register moves that every compiler’s output seems to contain. The main fix to their work that is needed is a way to prevent it from removing or reordering volatile memory references; this should be straightforward.

Peer Review Poker

Peer review is a bureaucratic, consensus-based approach to making decisions. Thus, it is not inherently entertaining and authors like myself need to amuse ourselves as best we can. One of the games I like to play is peer review poker, where certain combinations of review scores are more desirable than others.

Straight: Review scores form a continuous sequence. A straight, particularly if it is anchored at the high end of the spectrum, is extremely desirable because provoking a diversity of reactions means that there is something interesting about the work. The best kind of straight — which I have accomplished only one time, that I can remember — is when each possible score is received exactly once.

Flush: All reviews are the same score. This is not particularly desirable as it indicates a predictable, uncontroversial paper. Of course, if all scores are “strong accept” then predictability doesn’t seem so bad — but this hardly ever happens. The other extreme, all “strong reject” scores, is certainly achievable.

Obviously, two of a kind, etc. are very common peer review outcomes and are not particularly interesting. By taking reviewer expertise into account it should be possible to define a royal flush but probably that’s pushing the analogy too far.

Finding and Understanding Bugs in C Compilers

Today we finished preparing the camera-ready version of our paper that will appear in PLDI 2011. I’m pretty happy with it. Here’s the abstract:

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

Actually, as of today the total is 344 bugs reported but at some point we had to freeze the statistics in the paper.

Race Condition vs. Data Race

A race condition is a flaw that occurs when the timing or ordering of events affects a program’s correctness. Generally speaking, some kind of external timing or ordering non-determinism is needed to produce a race condition; typical examples are context switches, OS signals, memory operations on a multiprocessor, and hardware interrupts.

A data race happens when there are two memory accesses in a program where both:

  • target the same location
  • are performed concurrently by two threads
  • are not reads
  • are not synchronization operations

There are other definitions but this one is fine; it’s from Sebastian Burckhardt at Microsoft Research. Two aspects of the definition require a bit of care. “Concurrently” means there wasn’t anything like a lock that forced one operation to happen before or after the other. “Are not synchronization operations” refers to the fact that a program probably contains special memory operations, such as those used to implement locks, that are not themselves synchronized. By definition, these do not trigger data races.

In practice there is considerable overlap: many race conditions are due to data races, and many data races lead to race conditions. On the other hand, we can have race conditions without data races and data races without race conditions. Let’s start with a simple function for moving money between two bank accounts:

transfer1 (amount, account_from, account_to) {
  if (account_from.balance < amount) return NOPE;
  account_to.balance += amount;
  account_from.balance -= amount;
  return YEP;
}
Of course this is not how banks really move money, but the example is useful anyway because we understand intuitively that account balances should be non-negative and that a transfer must not create or lose money. When called from multiple threads without external synchronization, this function admits both data races (multiple threads can concurrently try to update an account balance) and race conditions (in a parallel context it will create or lose money). We can try to fix it like this:

transfer2 (amount, account_from, account_to) {
  atomic {
    bal = account_from.balance;
  }
  if (bal < amount) return NOPE;
  atomic {
    account_to.balance += amount;
  }
  atomic {
    account_from.balance -= amount;
  }
  return YEP;
}
Here “atomic” is implemented by the language runtime, perhaps simply by acquiring a thread mutex at the start of the atomic block and releasing it at the end, perhaps using some sort of transaction, or perhaps by disabling interrupts — for purposes of the example it doesn’t matter as long as the code inside the block executes atomically.

transfer2 has no data races when called by multiple threads, but obviously it is an extremely silly function containing race conditions that will cause it to create or lose money almost as badly as the unsynchronized function. From a technical point of view, the problem with transfer2 is that it permits other threads to see memory states where a key invariant — conservation of money — is broken.
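
One bad interleaving of transfer2 can be replayed deterministically. This Python sketch is illustrative only: the account representation and the chosen step ordering are invented, and each atomic block’s statements are written out as an indivisible unit:

```python
# A deterministic replay of one problematic interleaving of transfer2.
# Each atomic block runs indivisibly (so there are no data races), but
# two threads can both pass the balance check before either withdraws.

account_a = {"balance": 100}
account_b = {"balance": 0}

# Thread 1, atomic block 1: read the balance. Check passes (100 >= 100).
bal1 = account_a["balance"]
assert bal1 >= 100

# Thread 2, atomic block 1: read the stale balance. Check also passes.
bal2 = account_a["balance"]
assert bal2 >= 100

# Thread 1, remaining atomic blocks: move the money.
account_b["balance"] += 100
account_a["balance"] -= 100

# Thread 2, remaining atomic blocks: move the money again.
account_b["balance"] += 100
account_a["balance"] -= 100

# The non-negative-balance invariant is broken: 200 was withdrawn from
# an account that held only 100.
print(account_a["balance"], account_b["balance"])  # -100 200
```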

To preserve the invariant, we have to use a better locking strategy. As long as atomic’s semantics are to end the atomic section on any exit from the block, the solution can be blunt:

transfer3 (amount, account_from, account_to) {
  atomic {
    if (account_from.balance < amount) return NOPE;
    account_to.balance += amount;
    account_from.balance -= amount;
    return YEP;
  }
}
This function is free of data races and also of race conditions. Can we change it a bit to make an example that has data races but no race conditions? That is simple:

transfer4 (amount, account_from, account_to) {
  account_from.activity = true;
  account_to.activity = true;
  atomic {
    if (account_from.balance < amount) return NOPE;
    account_to.balance += amount;
    account_from.balance -= amount;
    return YEP;
  }
}
Here we are setting flags indicating that some sort of activity occurred on the accounts. Are the data races on these flags harmful? Perhaps not. For example, in the evening we might shut down all transaction-processing threads and then select 10 random accounts that are flagged as having had activity for manual auditing. For this purpose, the data races are entirely harmless.

We’ve ended up covering all possibilities:

                    data race    no data race
race condition      transfer1    transfer2
no race condition   transfer4    transfer3

The point of this exercise was to look at transfer2 and transfer4, which illustrate that freedom from data races is a very weak property that is neither necessary nor sufficient for establishing the concurrency correctness of a computer program. Why does anyone care about data races at all? As far as I can tell, there are two reasons. First, a data-race-free program can, with some care, be shown to be independent of the whims of whatever weak memory model it sits on. This lets us breathe easier because reasoning about racy programs running on weak memory models is near impossible for humans. Second, data races are easy to find automatically — notice from the definition that it simply requires memory accesses to be monitored; no knowledge of application semantics is required. Race conditions, on the other hand, are intimately tied to application-level invariants, making them far thornier to reason about (quick — what are all the invariants in Firefox? In Linux? In MySQL?).

Everything in this post is pretty obvious, but I’ve observed real confusion about the distinction between data race and race condition by people who should know better (for example because they are doing research on concurrency correctness). Muddying the waters further, even when people are perfectly clear on the underlying concepts, they sometimes say “race condition” when they really mean “data race.” Certainly I’ve caught myself doing this.

Guidelines for Teaching Assistants

I’ve been teaching university-level courses for the last nine years, usually with the support of teaching assistants (TAs): students who get paid to do things like grading, office hours, fielding email questions, making and debugging assignments, proctoring exams, and perhaps even giving a lecture when I’m sick or traveling.

At the start of each semester I sit down with my new TAs and go over some guidelines. Unfortunately, this is a chaotic time and I’m pretty sure I’ve never managed to give the same advice twice. This piece is an attempt to put all of my guidelines in one place. I’m certainly leaving things out — please fill in details in comments. This is all somewhat specific to computer science classes with programming assignments.

The Golden Rule of the CS TA: Never Type at a Student’s Keyboard

When a student doesn’t get it, the temptation to just sit down and take over can be overwhelming. But you have to resist because as long as the student is driving, the student is thinking. When you take over, their brain switches into passive mode.

Progressive Hints

When a student is struggling on an assignment, first give oblique hints: “Did you read Chapter 7?” “Did you search for that on Wikipedia?” If this fails to help, be more concrete: “Are you checking all return codes?” “Did you run it under Valgrind?” Finally, and only when you are convinced the student has made a genuine effort to solve the problem, give a more direct answer, but be sure that it only helps the student past the current roadblock: “Your foo() function has to return 0 on success, not 1.” “You’re dereferencing a NULL pointer.” With time, you will learn which students can run with a quick hint and which ones probably need stronger advice.

My model for progressive hints is based on the Invisiclues packets for old Infocom games — so awesome.

Amend, But Communicate

Often, when answering a student’s question, it becomes clear that some part of the assignment or lab was underspecified or ambiguous. In cases like this, your answer to the student effectively becomes a supplemental part of the assignment. This is fine, but when it happens you need to let the instructor and/or the rest of the class know the additional information.

When Things Aren’t Working, Tell Me

Sometimes there are infrastructure problems: a bunch of machines in the lab are down, or some critical piece of software doesn’t work correctly. Other times, there is a systematic problem in an assignment or lab: a crucial ambiguity or error. When you notice anything like this happening, let me know — I can’t fix problems that I’m unaware of.

Treat Students With Respect

I’ve only seen one problem out of probably 30 TAs, and it wasn’t that serious. But still this is worth saying.

Don’t Play the 10,000 Questions Game

In a big class there are always one or two students who, instead of reading the assignment, just email the instructor or TAs. If you answer, they immediately have another question. This goes on forever — I do not exaggerate, they apparently never eat or sleep. The solution is to recognize these cycles and delay the next response for 12 hours or so. Sooner or later the student realizes that it’ll be more efficient to just read the assignment.

Do the Assignment First

This sucks, but you have to do each assignment, or at least the ones for which you are the principal TA, before it is handed out. Otherwise, it is effectively impossible to help students get past roadblocks they encounter. This mainly applies to courses like operating systems where the assignments may be really hard. For freshman- and sophomore-level programming courses, any conceivable problem can be eyeballed in a few seconds.

Don’t Miss Office Hours

At least give advance warning if it’ll be impossible to show up so students don’t waste time coming to the lab.

Bounce Problem Cases Up To The Instructor

You’re not being paid enough to deal with difficult students. If anyone is pushy about a grade, rude about any issue, or anything else, simply refer the student to me. This almost never happens.

Don’t Screw Up the File

There is one grade spreadsheet, and it has a specific file name. Do not put grades in any other file, do not create duplicate files, do not put unlabeled columns in the spreadsheet. Label each new column carefully. Every time you finish working with the file, make sure the permissions are such that the other TAs and I can read and write it, but nobody else can.

Anonymize Grades Carefully

When posting grades, delete names, student ID numbers, and email addresses before exporting to HTML.

Never Deal With Cheating

This is my job. If you suspect cheating, talk to me; but never bring it up with a student.

Be Timely

Unless I say otherwise, labs should be graded, and grades entered, within a week of being handed in.

Do Not Flake Out

TAs are themselves students and are subject to getting overwhelmed like anyone else. If this seems to be happening, come talk to me. If you just disappear in the middle of the semester and stop doing your job, I will seriously make sure you never work in this town again.