Running an Electronic Program Committee Meeting

Computer science — at least in the areas that I work in — is conference-driven. Since journals go unread, it’s important that conference program committees make good decisions about which papers to accept. The established way to do this is to hold a program committee (PC) meeting: the entire committee holes up in a highly air-conditioned meeting room for a day and argues about every paper that’s not an obvious reject (occasionally we also manage to avoid arguing about papers that are obvious accepts).

The cost of in-person PC meetings is substantial in terms of money and time. Just as a random anecdote, last January I attended the EuroSys PC meeting in Paris; this trip used up enough time and money that I couldn’t easily justify traveling to France again a few months later to go to the conference. This was a bummer since I’d have loved to go (the volcano in Iceland, however, would have stopped me from getting home in a timely fashion).

The alternative to the in-person meeting is to have a remote meeting. This post is not about comparing the merits of in-person vs. remote meetings, but rather about how to run a remote meeting: a topic that is fresh on my mind since I just ran a remote PC meeting for the Sensor Networks track at the Real Time Systems Symposium, where we evaluated 30 submissions. I’ve previously run a couple of in-person meetings so I have some basis for comparison.

The remote PC meeting could be designed to emulate an in-person meeting, for example using Skype or a conference call. I didn’t choose this option for my RTSS track for several reasons. First, it would waste people’s time: each PC member reviewed only 20% of the papers (normally this percentage would be higher, but since I had been warned to expect a large number of submissions, I invited a lot of people to be on the committee). Second, a conference call with 20 different voices is difficult to manage; generally I’d consider five or six to be a reasonable maximum. Third, spending all day on a conference call just sounds unspeakably boring; I doubted that anyone (including me) could keep paying attention to disembodied voices for that long. I think a conference call would work fine for a small committee evaluating a small number of submissions. I’ve never tried a Skype session with anywhere close to 20 people and would appreciate stories or advice from people who have done this.

If the entire PC is not going to get together (actually or virtually) the alternative is for the group of people who reviewed each paper to discuss it and make a decision. One possibility would be to emulate the journal review process: the program chair simply reads all reviews for each paper, and makes the decision. My sense is that this would be unsatisfying for people, who generally enjoy the back and forth of arguing over papers. People like to come to a consensus. This discussion could be done in email, on the phone or Skype, in a text channel like IRC, or on a message board.

Since the conference software RTSS uses supported a decent message board, we used that. The process that I put to the PC members was that the reviewers for a paper should read all the reviews and then someone should propose a decision. From there, the discussion could proceed with others either agreeing or disagreeing. If nobody chimed in about a particular paper, I started asking leading questions: “Joe you didn’t like this paper but your review is marked low-confidence. Do you stand by the ‘reject’ recommendation?” I only had to do this in a few cases.

The only misgiving I had about this process was that it might give too much weight to the first comment about a paper. But, as it turned out, people felt free to argue with the first commenter and I think the decisions reached ended up being generally good. Of course, in-person PC meetings suffer from substantially more severe versions of this kind of unfairness where the loudest, most persuasive people can have an inordinate effect on the decisions that are made. An alternative would have been to ask a specific person (for example the most positive or most negative reviewer for a paper) to propose a course of action, but I didn’t think of this soon enough.

In principle, a conference accepts all acceptable papers and rejects the rest. In practice, however, the number of available presentation slots at the conference has a strong influence on the number of accepted papers. When the PC meeting is in-person, global constraints like this are easy to account for: once we notice that too few papers are being accepted, we can try to start making more generous decisions. When the PC meeting is fragmented into a lot of individual meetings, there is less global information and we risk accepting too few or too many papers. For whatever reasons, this didn’t happen at my track meeting and we accepted just slightly more than the expected 20%-25% of submitted papers.

I’m not claiming that a remote meeting is preferable or saying that I don’t like in-person PC meetings. It’s just that looking forward, I’m probably going to attend at most a couple of in-person meetings per year and turn down any invitations beyond that. I’d expect other rational academics to do the same. If oil prices rise significantly, the in-person meeting will die completely and the entire conference system will be in jeopardy.

In summary, I was happy with the remote PC meeting process. It puts a significantly larger burden on the program chair to help things run smoothly, and it also (potentially) gives the chair much more control over the decisions that are made. The best thing about the remote meeting is that it saved a lot of time and money for PC members, several of whom told me they wouldn’t have agreed to be on the PC if that meant attending an in-person PC meeting. I doubt that the resulting decisions about papers were worse than, or even much different than, the decisions that would have been made in person.

Michael Ernst’s advice on running a PC meeting is great, though I don’t buy his claim that the low bandwidth of the phone or a message board is a limiting factor in arriving at good decisions. A typical in-person PC meeting is actually a fairly low-bandwidth activity with quite a bit of irrelevant discussion and frittering away of time, unless the chair is very much on top of things. Also, by using a message board people can take time to compose well thought-out arguments as opposed to saying the first thing that comes to mind.

Is Attending Meetings a Signaling Behavior?

Humans and other animals spend a lot of time engaging in signaling behaviors: dressing or acting in certain ways in order to send signals to others. Some signals — a peacock’s tail or a Ferrari — are expensive precisely to show that the signaling organism can afford the cost of sending the signal. Signaling can explain aspects of human behavior where it’s not initially obvious that it applies. For example, it has been argued that social drinking serves as a signal of trustworthiness because alcohol permits individuals to give up some control over their actions in a verifiable way.

Lately I’ve been puzzling over my field’s love affair with in-person meetings, including grant proposal review panels, program committee meetings, conferences, PI meetings, site visits, and more. To be sure, face time is useful, but is it really necessary for high-caliber academics to be platinum-level frequent flyers? Does this serve our goals of performing high-impact research and teaching students? (I’m not platinum-level but I know a lot of people who are, and I’m usually some sort of “medallion” level on Delta.) The cost in terms of time and money is substantial, especially for those of us with families.

Historically, academics have communicated primarily through letters and manuscripts, supplemented by occasional travel. Why, then, do we need to travel so much when our telecommunications technology is so good? Lately I’ve been wondering if part of the explanation is that travel, because it is expensive, is a signaling behavior. “Look,” we are saying, “I have enough grant money to attend eight meetings a year, and moreover I have such an army of students working for me that the loss of personal productivity this travel entails is no big deal.”

Are academics status conscious? We are. Inside a community it is perfectly well-known whose work is deep and significant, whose is shallow, who leads the trends and who follows, and even who struck it rich with a startup and who’s in trouble for sleeping with a student. We love to talk about this stuff, and I’d also guess that we’re not above using excessive travel as a status signal.

A Guide to Undefined Behavior in C and C++, Part 2

Also see Part 1 and Part 3.

When tools like the bounds checking GCC, Purify, Valgrind, etc. first showed up, it was interesting to run a random UNIX utility under them. The output of the checker showed that these utility programs, despite working perfectly well, executed a ton of memory safety errors such as use of uninitialized data, accesses beyond the ends of arrays, etc. Just running grep or whatever would cause tens or hundreds of these errors to happen.

What was going on? Basically, incidental properties of the C/UNIX execution environment caused these errors to (often) be benign. For example, blocks returned by malloc() generally contain some padding before and/or after; the padding can soak up out-of-bounds stores, as long as they aren’t too far outside the allocated area. Was it worth eliminating these bugs? Sure. First, an execution environment with different properties, such as a malloc() for an embedded system that happens to provide less padding, could turn benign near-miss array writes into nasty heap corruption bugs. Second, the same benign bugs could probably, under different circumstances, cause a crash or corruption error even in the same execution environment. Developers generally find these kinds of arguments to be compelling and these days most UNIX programs are relatively Valgrind-clean.
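
To make this concrete, here is a toy example of my own (not taken from any particular UNIX utility) showing the kind of near-miss heap write that these checkers report. The program will almost always appear to work, because the extra byte lands in malloc()’s padding, but Valgrind will still flag the out-of-bounds store:

#include <stdlib.h>
#include <string.h>

int main (void)
{
  char *buf = malloc (16);
  if (!buf) return 1;
  /* Oops: the string literal is 16 characters plus a terminating NUL,
     so strcpy writes 17 bytes into a 16-byte block. The extra byte
     usually lands in allocator padding, so nothing visibly breaks,
     but the store is still an out-of-bounds write. */
  strcpy (buf, "0123456789abcdef");
  free (buf);
  return 0;
}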

Tools for finding integer undefined behaviors are less well-developed than are memory-unsafety checkers. Bad integer behaviors in C and C++ include signed overflow, divide by zero, shift-past-bitwidth, etc. These have become a more serious problem in recent years because:

  • Integer flaws are a source of serious security problems
  • C compilers have become considerably more aggressive in their exploitation of integer undefined behaviors to generate efficient code

Recently my student Peng Li implemented a checking tool for integer undefined behaviors. Using it, we have found that many programs contain these bugs. For example, more than half of the SPECINT2006 benchmarks execute integer undefined behaviors of one kind or another. In many ways the situation for integer bugs today seems similar to the situation for memory bugs around 1995. Just to be clear, integer checking tools do exist, but they do not seem to be in very widespread use and also a number of them operate on binaries, which is too late. You have to look at the source code before the compiler has had a chance to exploit — and thus eliminate — operations with undefined behavior.

The rest of this post explores a few integer undefined behaviors that we found in LLVM: a medium-sized (~800 KLOC) open source C++ code base. Of course I’m not picking on LLVM here: it’s very high-quality code. The idea is that by looking at some problems that were lurking undetected in this well-tested code, we can hopefully learn how to avoid writing these bugs in the future.

As a random note, if we consider the LLVM code to be C++0x rather than C++98, then a large number of additional shift-related undefined behaviors appear. I’ll talk about the new shift restrictions (which are identical to those in C99) in a subsequent post here.

I’ve cleaned up the tool output slightly to improve readability.

Integer Overflow #1

Error message:

UNDEFINED at <BitcodeWriter.cpp, (740:29)> :
Operator: -
Reason: Signed Subtraction Overflow
left (int64): 0
right (int64): -9223372036854775808

Code:

int64_t V = IV->getSExtValue();
if (V >= 0)
  Record.push_back(V << 1);
else
  Record.push_back((-V << 1) | 1);  <<----- bad line

In all modern C/C++ variants running on two’s complement machines, negating an int whose value is INT_MIN (or in this case, INT64_MIN) is undefined behavior. The fix is to add an explicit check for this case.
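
Here is one way the check might look; this is a sketch of my own, not the actual fix that went into LLVM. The idea is to negate in the unsigned domain, where wraparound is defined, so that INT64_MIN no longer triggers undefined behavior:

int64_t V = IV->getSExtValue();
if (V >= 0)
  Record.push_back((uint64_t)V << 1);
else
  // -(uint64_t)V is well defined for every V, including INT64_MIN,
  // and yields the magnitude of V (modulo 2^64).
  Record.push_back(((-(uint64_t)V) << 1) | 1);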

Do compilers take advantage of this undefined behavior?  They do:

[regehr@gamow ~]$ cat negate.c
int foo (int x) __attribute__ ((noinline));
int foo (int x)
{
  if (x < 0) x = -x;
  return x >= 0;
}

#include <limits.h>
#include <stdio.h>

int main (void)
{
  printf ("%d\n", -INT_MIN);
  printf ("%d\n", foo(INT_MIN));
  return 0;
}
[regehr@gamow ~]$ gcc -O2 negate.c -o negate
negate.c: In function `main':
negate.c:13:19: warning: integer overflow in expression [-Woverflow]
[regehr@gamow ~]$ ./negate
-2147483648
1

In C compiler doublethink, -INT_MIN is both negative and non-negative. If the first true AI is coded in C or C++, I expect it to immediately deduce that freedom is slavery, love is hate, and peace is war.

Integer Overflow #2

Error message:

UNDEFINED at <InitPreprocessor.cpp, (173:39)> :
Operator: -
Reason: Signed Subtraction Overflow
left (int64): -9223372036854775808
right (int64): 1

Code:

MaxVal = (1LL << (TypeWidth - 1)) - 1;

In C/C++ it is illegal to compute the maximum signed integer value like this. There are better ways, such as creating a vector of all 1s and then clearing the high order bit.
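
For example, something along these lines avoids the undefined shifts and subtractions; this is my sketch, assuming TypeWidth is between 1 and 64:

// TypeWidth low-order 1 bits, computed in the unsigned domain where
// shifting cannot overflow.
uint64_t AllOnes = ~0ULL >> (64 - TypeWidth);
// Clear the high-order (sign) bit to get 2^(TypeWidth-1) - 1.
MaxVal = (int64_t)(AllOnes & ~(1ULL << (TypeWidth - 1)));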

Integer Overflow #3

Error message:

UNDEFINED at <TargetData.cpp,  (629:28)> :
Operator: *
Reason: Signed Multiplication  Overflow
left (int64): 142998016075267841
right (int64): 129

Code:

Result += arrayIdx * (int64_t)getTypeAllocSize(Ty);

Here the allocated size is plausible but the array index is way out of bounds for any conceivable array.

Shift Past Bitwidth #1

Error message:

UNDEFINED at <InstCombineCalls.cpp, (105:23)> :
Operator: <<
Reason: Unsigned Left Shift Error: Right operand is negative or is  greater than or equal to the width of the promoted left operand
left (uint32): 1
right (uint32): 63

Code:

unsigned Align = 1u << std::min(BitWidth - 1, TrailZ);

This is just an outright bug: BitWidth is set to 64 but should have been 32.

Shift Past Bitwidth #2

Error message:

UNDEFINED at <Instructions.h, (233:15)> :
Operator: <<
Reason: Signed Left Shift Error: Right operand is negative or is greater  than or equal to the width of the promoted left operand
left (int32): 1
right (int32): 32

Code:

return (1 << (getSubclassDataFromInstruction() >> 1))  >> 1;

When getSubclassDataFromInstruction() returns a value in the range 128-131, the right argument to the left shift evaluates to 32. Shifting (in either direction) by the bitwidth or higher is an error, and so this function requires that getSubclassDataFromInstruction() returns a value not larger than 127.

Summary

It is basically evil to make certain program actions wrong, but to not give developers any way to tell whether or not their code performs these actions and, if so, where. One of C’s design points was “trust the programmer.” This is fine, but there’s trust and then there’s trust. I mean, I trust my 5 year old but I still don’t let him cross a busy street by himself. Creating a large piece of safety-critical or security-critical code in C or C++ is the programming equivalent of crossing an 8-lane freeway blindfolded.

Red Baldy

People following the “outdoors” thread on this blog will have noticed that Bill and I failed to summit on Mount Baker and also on White Baldy this year already. I’m not all about summiting, but this got on my nerves a little. Yesterday I decided to climb Red Baldy, an 11,000′ neighbor to White Baldy. Around six years ago I had failed to climb Red Baldy by its NE ridge, due to some frightening scrambling problems and also because I was by myself. Just to make things confusing, Red Baldy is usually accessed from the White Pine drainage, whereas White Baldy is usually climbed from Red Pine.

This time I was a bit worried about timing: I couldn’t start before 9:30 due to dropping off kids, and had to finish before Sarah and I went out to celebrate our anniversary (nine years!). The White Pine road is notoriously long and switchbacky, and at least one trip report on the web indicated an eight-hour round trip for this peak. Luckily, whoever said this was either slow or took a different route: I made it up and down in 5.5 hours, including about an hour on top. The fast way to Red Baldy is to walk the White Pine road until it makes a final switchback towards White Pine Lake a little below 10,000′. From this switchback, ascend tundra and talus to Red Baldy’s north ridge, then follow this ridge to the summit. After leaving the trail this route is class 2 walking, making Red Baldy perhaps the 4th easiest 11,000′ peak in the Wasatch (after Hidden Peak, Sugarloaf, and Baldy).

On this hike temperatures were pleasant at the trailhead in the morning and also up high around mid-day, but it was around 85 at the trailhead when I got back there at 3pm, and then 102 by the time I got home, yikes. A few snowfields were left in upper White Pine; in my soft boots these were useless on the way up, but provided a quick way to descend a few hundred feet.

Why Would Researchers Oppose Open Access?

Last week I started sort of a relaxed flame war with other members of the steering committee for an ACM conference on the subject of open access to the proceedings. “Open access” would mean that anyone could download the proceedings. The current situation is slightly different:

  • Often, individual papers are available on authors’ home pages. ACM permits authors to do this, but not to upload papers to sites like arXiv.
  • To get the actual proceedings for the conference you have to subscribe to the ACM Digital Library which costs $99/year, on top of the ACM membership dues.

Going into this argument I assumed that no researcher would oppose open access, because the benefits are clear and there are no downsides (I’m often naive like this). In contrast, I ran into what felt like hardened opposition.

The purpose of this post isn’t to argue for open access — others have done this perfectly well, and anyway as I said it’s a no-brainer from the researchers’ point of view — but rather to try to understand why researchers might be against open access. Here’s my analysis of some of the arguments I ran into. I may be unfairly caricaturing them but hey, it’s my blog.

Argument 1: “Open access is against ACM policy.”

This forms an argument against the ACM (or at least against its publication policy), not against open access.

Argument 2: “Open access costs money (for network links, server machines, etc.) and we don’t know where it will come from.”

This would be a good argument if the arXiv didn’t exist. But it does. In fact it’s been here for nearly 20 years and isn’t likely to go away soon. Furthermore, since 1998 the ACM has collaborated with arXiv to form the Computing Research Repository. I don’t fully understand the history between the CoRR and the ACM Digital Library but I think it’s safe to say that the ACM’s support of the CoRR is half-hearted at best.

Of course I’m not saying that arXiv is free (its annual budget is around $400K) but rather that, since it already exists, the incremental cost of a new arXiv’ed paper is low. Also, the more people who use it, the more likely it is that creative ways will be found to keep it in existence. Storage and network costs are always dropping; perhaps in the future the arXiv can live “in the cloud” at considerably lower cost. Say, this sounds suspiciously like a computer science research problem (yes I know about LOCKSS).

Argument 3: “Revenues from the Digital Library fund other things, including keeping conference registration fees low.”

Aha! Finally we almost have a good argument. But not really. For one thing, I had always heard that conferences usually make money, not lose it. But more importantly, it’s not right for the ACM to decide once and for all that all of its conferences are content-generators for the monetized DL. Rather, research communities should individually decide whether their portion of the revenues are enough to offset the disadvantages of closed-access. In summary: if the argument is about money (isn’t it always?) we need to see the figures and make informed decisions. I looked for this information online and didn’t find it (if it is available, please send me a link).

Argument 4: “Authors can put papers on their web pages, this is effectively open access.”

This is weak. First, not all authors do this, so the record is incomplete. For example, my sense is that many European academics are less well-versed in the fine art of excessive self-promotion than are their American counterparts. Second, proceedings are fragmented all over the web; not really ideal.  Third, as people retire, move to industry, etc., their home pages go away and the papers stop being accessible.

Summary

So far, my impression is that when researchers are opposed to open access, it isn’t for good reasons.  Another impression I have is that these attitudes are generational: senior researchers are more likely to oppose open access than are junior ones who “grew up” with the web. It seems very likely that retirements over the next 10 years will cause a phase change.  (Of course, it is also possible that as people become more senior, their attitudes change.)

Grandview Peak


Grandview Peak, at 9410′, is the highest point in Salt Lake City. Even so, it’s a long way from anywhere and no trail goes to its summit. Over the course of four trips to Grandview I’ve yet to see another person within two miles of the top (not counting whoever I’m hiking with, of course).

One of the reasons I enjoy Grandview is that the route has great variety. You get peaceful hiking near an alpine stream, typical low-Wasatch walking through scrub oak, a nice climb in open pine forest, a long ridge-run with plenty of minor obstacles, and finally a serious two-mile brush thrash on exit.

According to Google Earth, my route was right at 10 miles and involved 4400′ of gain/loss. It took about 6.5 hours and 1.5 MPH felt plenty fast given the difficult terrain.  I’d been hoping for pleasant temperatures; valley highs were around 90 and the average adiabatic lapse rate predicts that 5000 feet higher it should be 17 degrees cooler.  Somehow this prediction was total crap and it was both hot and humid; I guess surface heating probably dwarfs adiabatic effects unless the air is moving around a lot, and transpiration defeats Utah’s natural low humidity. Anyway, three liters of water was not enough. My previous times on Grandview were a lot more pleasant, and had been in spring or fall.  Here’s a description of a similar route I took a few years ago.

A Guide to Undefined Behavior in C and C++, Part 1

Also see Part 2 and Part 3.

Programming languages typically make a distinction between normal program actions and erroneous actions. For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we have to just run it and see.

In a safe programming language, errors are trapped as they happen. Java, for example, is largely safe via its exception system. In an unsafe programming language, errors are not trapped. Rather, after executing an erroneous operation the program keeps going, but in a silently faulty way that may have observable consequences later on. Luca Cardelli’s article on type systems has a nice clear introduction to these issues. C and C++ are unsafe in a strong sense: executing an erroneous operation causes the entire program to be meaningless, as opposed to just the erroneous operation having an unpredictable result. In these languages erroneous operations are said to have undefined behavior.

The C FAQ defines “undefined behavior” like this:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

This is a good summary. Pretty much every C and C++ programmer understands that accessing a null pointer and dividing by zero are erroneous actions. On the other hand, the full implications of undefined behavior and its interactions with aggressive compilers are not well-appreciated. This post explores these topics.

A Model for Undefined Behavior

For now, we can ignore the existence of compilers. There is only the “C implementation” which — if the implementation conforms to the C standard — acts the same as the “C abstract machine” when executing a conforming program. The C abstract machine is a simple interpreter for C that is described in the C standard. We can use it to determine the meaning of any C program.

The execution of a program consists of simple steps such as adding two numbers or jumping to a label. If every step in the execution of a program has defined behavior, then the entire execution is well-defined. Note that even well-defined executions may not have a unique result due to unspecified and implementation-defined behavior; we’ll ignore both of these here.

If any step in a program’s execution has undefined behavior, then the entire execution is without meaning. This is important: it’s not that evaluating (1<<32) has an unpredictable result, but rather that the entire execution of a program that evaluates this expression is meaningless. Also, it’s not that the execution is meaningful up to the point where undefined behavior happens: the bad effects can actually precede the undefined operation.

As a quick example let’s take this program:

#include <limits.h>
#include <stdio.h>

int main (void)
{
  printf ("%d\n", (INT_MAX+1) < 0);
  return 0;
}

The program is asking the C implementation to answer a simple question: if we add one to the largest representable integer, is the result negative? This is perfectly legal behavior for a C implementation:

$ cc test.c -o test
$ ./test
1

So is this:

$ cc test.c -o test
$ ./test
0

And this:

$ cc test.c -o test
$ ./test
42

And this:

$ cc test.c -o test
$ ./test
Formatting root partition, chomp chomp

One might say: Some of these compilers are behaving improperly because the C standard says a relational operator must return 0 or 1. But since the program has no meaning at all, the implementation can do whatever it likes. Undefined behavior trumps all other behaviors of the C abstract machine.

Will a real compiler emit code to chomp your disk? Of course not, but keep in mind that practically speaking, undefined behavior often does lead to Bad Things because many security vulnerabilities start out as memory or integer operations that have undefined behavior. For example, accessing an out of bounds array element is a key part of the canonical stack smashing attack. In summary: the compiler does not need to emit code to format your disk. Rather, following the OOB array access your computer will begin executing exploit code, and that code is what will format your disk.

No Traveling

It is very common for people to say — or at least think — something like this:

The x86 ADD instruction is used to implement C’s signed add operation, and it has two’s complement behavior when the result overflows. I’m developing for an x86 platform, so I should be able to expect two’s complement semantics when 32-bit signed integers overflow.

THIS IS WRONG. You are saying something like this:

Somebody once told me that in basketball you can’t hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn’t understand basketball.

(This explanation is due to Roger Miller via Steve Summit.)

Of course it is physically possible to pick up a basketball and run with it. It is also possible you will get away with it during a game.  However, it is against the rules; good players won’t do it and bad players won’t get away with it for long. Evaluating (INT_MAX+1) in C or C++ is exactly the same: it may work sometimes, but don’t expect to keep getting away with it. The situation is actually a bit subtle so let’s look in more detail.

First, are there C implementations that guarantee two’s complement behavior when a signed integer overflows? Of course there are. Many compilers will have this behavior when optimizations are turned off, for example, and GCC has an option (-fwrapv) for enforcing this behavior at all optimization levels. Other compilers will have this behavior at all optimization levels by default.

There are also, it should go without saying, compilers that do not have two’s complement behavior for signed overflows. Moreover, there are compilers (like GCC) where integer overflow behaved a certain way for many years and then at some point the optimizer got just a little bit smarter and integer overflows suddenly and silently stopped working as expected. This is perfectly OK as far as the standard goes. While it may be unfriendly to developers, it would be considered a win by the compiler team because it will increase benchmark scores.

In summary: There’s nothing inherently bad about running with a ball in your hands and also there’s nothing inherently bad about shifting a 32-bit number by 33 bit positions. But one is against the rules of basketball and the other is against the rules of C and C++. In both cases, the people designing the game have created arbitrary rules and we either have to play by them or else find a game we like better.

Why Is Undefined Behavior Good?

The good thing — the only good thing! — about undefined behavior in C/C++ is that it simplifies the compiler’s job, making it possible to generate very efficient code in certain situations. Usually these situations involve tight loops. For example, high-performance array code doesn’t need to perform bounds checks, avoiding the need for tricky optimization passes to hoist these checks outside of loops. Similarly, when compiling a loop that increments a signed integer, the C compiler does not need to worry about the case where the variable overflows and becomes negative: this facilitates several loop optimizations. I’ve heard that certain tight loops speed up by 30%-50% when the compiler is permitted to take advantage of the undefined nature of signed overflow. Similarly, there have been C compilers that optionally give undefined semantics to unsigned overflow to speed up other loops.
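
As a small illustration (my own example, not taken from any particular compiler’s documentation), consider this loop:

void scale (float *a, int start, int n, float k)
{
  // Because signed overflow is undefined, the compiler may assume that
  // i never wraps around. That lets it treat the loop as executing
  // exactly n - start + 1 times (when it executes at all), compute the
  // trip count up front, and keep a single 64-bit induction variable
  // for the address computation. With -fwrapv, or with an unsigned
  // index and the same comparison, it would also have to consider the
  // wraparound case, for example when n == INT_MAX.
  for (int i = start; i <= n; i++)
    a[i] *= k;
}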

Why Is Undefined Behavior Bad?

When programmers cannot be trusted to reliably avoid undefined behavior, we end up with programs that silently misbehave. This has turned out to be a really bad problem for codes like web servers and web browsers that deal with hostile data because these programs end up being compromised and running code that arrived over the wire. In many cases, we don’t actually need the performance gained by exploitation of undefined behavior, but due to legacy code and legacy toolchains, we’re stuck with the nasty consequences.

A less serious problem, more of an annoyance, is where behavior is undefined in cases where all it does is make the compiler writer’s job a bit easier, and no performance is gained. For example a C implementation has undefined behavior when:

An unmatched ‘ or ” character is encountered on a logical source line during tokenization.

With all due respect to the C standard committee, this is just lazy. Would it really impose an undue burden on C implementors to require that they emit a compile-time error message when quote marks are unmatched? Even a 30-year-old (at the time C99 was standardized) systems programming language can do better than this. One suspects that the C standard body simply got used to throwing behaviors into the “undefined” bucket and got a little carried away. Actually, since the C99 standard lists 191 different kinds of undefined behavior, it’s fair to say they got a lot carried away.

Understanding the Compiler’s View of Undefined Behavior

The key insight behind designing a programming language with undefined behavior is that the compiler is only obligated to consider cases where the behavior is defined. We’ll now explore the implications of this.

If we imagine a C program being executed by the C abstract machine, undefined behavior is very easy to understand: each operation performed by the program is either defined or undefined, and usually it’s pretty clear which is which. Undefined behavior becomes difficult to deal with when we start being concerned with all possible executions of a program. Application developers, who need code to be correct in every situation, care about this, and so do compiler developers, who need to emit machine code that is correct over all possible executions.

Talking about all possible executions of a program is a little tricky, so let’s make a few simplifying assumptions. First, we’ll discuss a single C/C++ function instead of an entire program. Second, we’ll assume that the function terminates for every input. Third, we’ll assume the function’s execution is deterministic; for example, it’s not cooperating with other threads via shared memory. Finally, we’ll pretend that we have infinite computing resources, making it possible to exhaustively test the function. Exhaustive testing means that all possible inputs are tried, whether they come from arguments, global variables, file I/O, or whatever.

The exhaustive testing algorithm is simple:

  1. Compute next input, terminating if we’ve tried them all
  2. Using this input, run the function in the C abstract machine, keeping track of whether any operation with undefined behavior was executed
  3. Go to step 1

Enumerating all inputs is not too difficult. Starting with the smallest input (measured in bits) that the function accepts, try all possible bit patterns of that size. Then move to the next size. This process may or may not terminate but it doesn’t matter since we have infinite computing resources.

For programs that contain unspecified and implementation-defined behaviors, each input may result in several or many possible executions. This doesn’t fundamentally complicate the situation.

OK, what has our thought experiment accomplished? We now know, for our function, which of these categories it falls into:

  • Type 1: Behavior is defined for all inputs
  • Type 2: Behavior is defined for some inputs and undefined for others
  • Type 3: Behavior is undefined for all inputs
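
To make the classification concrete, here is a toy version of the thought experiment (my own sketch; a real implementation of step 2 would need a checker to detect undefined operations automatically, whereas here the defined range for this particular function is hard-coded):

#include <stdint.h>
#include <stdio.h>

/* Function under test. The shift is undefined when s is negative or
   greater than or equal to 32 (assuming a 32-bit unsigned int). */
unsigned shift_one (int8_t s) { return 1u << s; }

int main (void)
{
  int defined = 0, undefined = 0;
  /* Enumerate every possible bit pattern of the 8-bit input. */
  for (int s = -128; s <= 127; s++) {
    if (s >= 0 && s <= 31) {
      shift_one ((int8_t)s);   /* this case is safe to actually run */
      defined++;
    } else {
      undefined++;             /* running this case would execute UB */
    }
  }
  printf ("defined: %d, undefined: %d\n", defined, undefined);
  /* Both counts are nonzero, so shift_one is a Type 2 function. */
  return 0;
}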

Type 1 Functions

These have no restrictions on their inputs: they behave well for all possible inputs (of course, “behaving well” may include returning an error code). Generally, API-level functions and functions that deal with unsanitized data should be Type 1. For example, here’s a utility function for performing integer division without executing undefined behaviors:

int32_t safe_div_int32_t (int32_t a, int32_t b) {
  if ((b == 0) || ((a == INT32_MIN) && (b == -1))) {
    report_integer_math_error();
    return 0;
  } else {
    return a / b;
  }
}

Since Type 1 functions never execute operations with undefined behavior, the compiler is obligated to generate code that does something sensible regardless of the function’s inputs. We don’t need to consider these functions any further.

Type 3 Functions

These functions admit no well-defined executions. They are, strictly speaking, completely meaningless: the compiler is not obligated to generate even a return instruction. Do Type 3 functions really exist? Yes, and in fact they are common. For example, a function that — regardless of input — uses a variable without initializing it is easy to unintentionally write. Compilers are getting smarter and smarter about recognizing and exploiting this kind of code. Here’s a great example from the Google Native Client project:

When returning from trusted to untrusted code, we must sanitize the return address before taking it. This ensures that untrusted code cannot use the syscall interface to vector execution to an arbitrary address. This role is entrusted to the function NaClSandboxAddr, in sel_ldr.h. Unfortunately, since r572, this function has been a no-op on x86.

-- What happened?

During a routine refactoring, code that once read

aligned_tramp_ret = tramp_ret & ~(nap->align_boundary - 1);

was changed to read

return addr & ~(uintptr_t)((1 << nap->align_boundary) - 1);

Besides the variable renames (which were intentional and correct), a shift was introduced, treating nap->align_boundary as the log2 of bundle size.

We didn't notice this because NaCl on x86 uses a 32-byte bundle size.  On x86 with gcc, (1 << 32) == 1. (I believe the standard leaves this behavior undefined, but I'm rusty.) Thus, the entire sandboxing sequence became a no-op.

This change had four listed reviewers and was explicitly LGTM'd by two. Nobody appears to have noticed the change.

-- Impact

There is a potential for untrusted code on 32-bit x86 to unalign its instruction stream by constructing a return address and making a syscall. This could subvert the validator. A similar vulnerability may affect x86-64.

ARM is not affected for historical reasons: the ARM implementation masks the untrusted return address using a different method.

What happened? A simple refactoring put the function containing this code into Type 3. The person who sent this message believes that x86-gcc evaluates (1<<32) to 1, but there’s no reason to expect this behavior to be reliable (in fact it is not on a few versions of x86-gcc that I tried). This construct is definitely undefined and of course the compiler can do anything it wants. As is typical for a C compiler, it chose to simply not emit the instructions corresponding to the undefined operation. (A C compiler’s #1 goal is to emit efficient code.) Once the Google programmers gave the compiler the license to kill, it went ahead and killed. One might ask: Wouldn’t it be great if the compiler provided a warning or something when it detected a Type 3 function? Sure! But that is not the compiler’s priority.

The Native Client example is a good one because it illustrates how competent programmers can be suckered in by an optimizing compiler’s underhanded way of exploiting undefined behavior. A compiler that is very smart at recognizing and silently destroying Type 3 functions becomes effectively evil, from the developer’s point of view.
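
For completeness, here is about the smallest Type 3 function I can think of, of the uninitialized-variable flavor mentioned above (a contrived sketch of my own, not code from LLVM or Native Client):

int always_undefined (int x)
{
  int r;          /* never initialized */
  if (x > 0)
    r += x;       /* reads an indeterminate value */
  else
    r -= x;       /* ditto */
  return r;       /* every path has already executed an undefined read */
}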

Type 2 Functions

These have behavior that is defined for some inputs and undefined for others. This is the most interesting case for our purposes. Signed integer divide makes a good example:

int32_t unsafe_div_int32_t (int32_t a, int32_t b) {
  return a / b;
}

This function has a precondition; it should only be called with arguments that satisfy this predicate:

(b != 0) && (!((a == INT32_MIN) && (b == -1)))

Of course it’s no coincidence that this predicate looks a lot like the test in the Type 1 version of this function. If you, the caller, violate this precondition, your program’s meaning will be destroyed. Is it OK to write functions like this, that have non-trivial preconditions? In general, for internal utility functions this is perfectly OK as long as the precondition is clearly documented.
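
For example, a documented and (in debug builds) checked version might look like this sketch:

#include <assert.h>
#include <stdint.h>

/* Precondition: b != 0 and not (a == INT32_MIN && b == -1).
   The caller must guarantee this; the assertion documents the
   precondition and enforces it when assertions are enabled. */
int32_t div_int32_t_unchecked (int32_t a, int32_t b) {
  assert (b != 0 && !((a == INT32_MIN) && (b == -1)));
  return a / b;
}

In release builds, where assertions compile away, this is exactly the unsafe_div_int32_t shown above.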

Now let’s look at the compiler’s job when translating this function into object code. The compiler performs a case analysis:

  • Case 1: (b != 0) && (!((a == INT32_MIN) && (b == -1)))
    Behavior of / operator is defined → Compiler is obligated to emit code computing a / b
  • Case 2: (b == 0) || ((a == INT32_MIN) && (b == -1))
    Behavior of / operator is undefined → Compiler has no particular obligations

Now the compiler writers ask themselves the question: What is the most efficient implementation of these two cases? Since Case 2 incurs no obligations, the simplest thing is to simply not consider it. The compiler can emit code only for Case 1.

A Java compiler, in contrast, has obligations in Case 2 and must deal with it (though in this particular case, it is likely that there won’t be runtime overhead since processors can usually provide trapping behavior for integer divide by zero).

Let’s look at another Type 2 function:

int stupid (int a) {
  return (a+1) > a;
}

The precondition for avoiding undefined behavior is:

(a != INT_MAX)

Here the case analysis done by an optimizing C or C++ compiler is:

  • Case 1: a != INT_MAX
    Behavior of + is defined: Compiler is obligated to return 1
  • Case 2: a == INT_MAX
    Behavior of + is undefined: Compiler has no particular obligations

Again, Case 2 is degenerate and disappears from the compiler’s reasoning. Case 1 is all that matters. Thus, a good x86-64 compiler will emit:

stupid:
  movl $1, %eax
  ret

If we use the -fwrapv flag to tell GCC that integer overflow has two’s complement behavior, we get a different case analysis:

  • Case 1: a != INT_MAX
    Behavior is defined: Compiler is obligated to return 1
  • Case 2: a == INT_MAX
    Behavior is defined: Compiler is obligated to return 0

Here the cases cannot be collapsed and the compiler is obligated to actually perform the addition and check its result:

stupid:
  leal 1(%rdi), %eax
  cmpl %edi, %eax
  setg %al
  movzbl %al, %eax
  ret

Similarly, an ahead-of-time Java compiler also has to perform the addition because Java mandates two’s complement behavior when a signed integer overflows (I’m using GCJ for x86-64):

_ZN13HelloWorldApp6stupidEJbii:
  leal 1(%rsi), %eax
  cmpl %eax, %esi
  setl %al
  ret

This case-collapsing view of undefined behavior provides a powerful way to explain how compilers really work. Remember, their main goal is to give you fast code that obeys the letter of the law, so they will attempt to forget about undefined behavior as fast as possible, without telling you that this happened.

A Fun Case Analysis

About a year ago, the Linux kernel started using a special GCC flag to tell the compiler to avoid optimizing away useless null-pointer checks. The code that caused developers to add this flag looks like this (I’ve cleaned up the example just a bit):

static void __devexit agnx_pci_remove (struct pci_dev *pdev)
{
  struct ieee80211_hw *dev = pci_get_drvdata(pdev);
  struct agnx_priv *priv = dev->priv; 

  if (!dev) return;
  ... do stuff using dev ...
}

The idiom here is to get a pointer to a device struct, test it for null, and then use it. But there’s a problem! In this function, the pointer is dereferenced before the null check. This leads an optimizing compiler (for example, gcc at -O2 or higher) to perform the following case analysis:

  • Case 1: dev == NULL
    “dev->priv” has undefined behavior: Compiler has no particular obligations
  • Case 2: dev != NULL
    Null pointer check won’t fail: Null pointer check is dead code and may be deleted

As we can now easily see, neither case necessitates a null pointer check. The check is removed, potentially creating an exploitable security vulnerability.

Of course the problem is the use-before-check of pci_get_drvdata()’s return value, and this has to be fixed by moving the use after the check. But until all such code can be inspected (manually or by a tool), it was deemed safer to just tell the compiler to be a bit conservative. The loss of efficiency due to a predictable branch like this is totally negligible. Similar code has been found in other parts of the kernel.
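
The local repair is simple; here is my sketch of the obvious fix (not necessarily the exact patch that went into the kernel), which just moves the dereference after the check:

static void __devexit agnx_pci_remove (struct pci_dev *pdev)
{
  struct ieee80211_hw *dev = pci_get_drvdata(pdev);
  struct agnx_priv *priv;

  if (!dev) return;
  priv = dev->priv;   /* dereference only after the null check */
  ... do stuff using dev and priv ...
}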

Living with Undefined Behavior

In the long run, unsafe programming languages will not be used by mainstream developers, but rather reserved for situations where high performance and a low resource footprint are critical. In the meantime, dealing with undefined behavior is not totally straightforward and a patchwork approach seems to be best:

  • Enable and heed compiler warnings, preferably using multiple compilers
  • Use static analyzers (like Clang’s, Coverity, etc.) to get even more warnings
  • Use compiler-supported dynamic checks; for example, gcc’s -ftrapv flag generates code to trap signed integer overflows
  • Use tools like Valgrind to get additional dynamic checks
  • When functions are “type 2” as categorized above, document their preconditions and postconditions
  • Use assertions to verify that functions’ preconditions and postconditions actually hold
  • Particularly in C++, use high-quality data structure libraries

Basically: be very careful, use good tools, and hope for the best.

How to Debug

One of the painful parts of teaching a lab-based embedded systems course is that over and over I have to watch a team that has a relatively simple bug in its code but is trying to fix it by repeatedly making random changes. Generally they start with code that’s pretty close to working and break it worse and worse. By the end of the lab they’re frustrated, aren’t any closer to finding the bug, and have made a complete mess of their code, forcing them to go back to the previous day or week’s version.

A typical Computer Science curriculum fails to teach debugging in any serious way. I’m not talking about teaching students to use debugging tools. Rather, we fail to teach the thing that’s actually important: how to think about debugging. Part of the problem is that most CS programming assignments are small, self-contained, and not really very difficult. The other part of the problem is that debugging is not addressed explicitly. After noticing these problems I started to focus on teaching students how to debug during lab sessions and also made a lecture on debugging that I give each year; this piece elaborates on that lecture.

First we’ll want to define some terms:

  • Symptom — a faulty program behavior that you can see, such as crashing or producing the wrong answer
  • Bug — a flaw in a computer system that may have zero or more symptoms
  • Latent bug — an asymptomatic bug; it will show itself at an inconvenient time
  • Debugging — using symptoms and other information to find a bug
  • Failure-inducing input — input to the program that causes the bug to execute and symptoms to appear. In some cases, capturing the complete failure-inducing input is hard because it may include window system events, hardware-level events like interrupts or bit-flips in RAM cells, and OS-level events like context switches and TCP timeouts. The times at which elements of the input occur may be important.
  • Deterministic platform — A platform having the property that it can reliably reproduce a bug from its failure-inducing input. Simulators should be deterministic, but getting determinism from bugs involving hardware/software interactions may be difficult or impossible.

The high-level reason debugging is hard is that it’s an inverse problem: it attempts to infer the cause for observed effects. Inverse problems are generally ill-posed and extra input is required to find a unique solution. A debugging problem may be especially difficult if:

  • A deterministic platform is unavailable
  • The bug is costly to reproduce, for example because it takes a long time to show up, it requires expensive hardware, or it requires many machines
  • Visibility into the executing system is limited, for example because it involves an embedded processor that doesn’t have printf() or JTAG
  • Several distinct bugs are collaborating to produce the symptoms that you see
  • The bug and its symptoms are widely separated in space and time
  • The bug is actually a mistaken assumption made by developers — How many times have you been burned because you were running an old version of the code, had some environment variable wrong, were linking against the wrong library, etc.?
  • The bug is in a part of the system considered out of scope: hardware, operating system, compiler, etc.

Of course, a very bad bug will involve several of these factors at the same time. A nice hard bug can set a development team back by weeks or more. The key to avoiding this kind of delay is to approach debugging with the correct mindset and with a good deal of experience. Careful, focused thinking can often see through a hard bug, whereas unfocused thinking combined with random changes to the program will not only waste time, but probably also break parts of the system that previously worked.

A Scientific Approach to Debugging

The following steps constitute a fairly complete approach to debugging. If I’ve done a good job describing these steps, as you read them you’ll see that you do, in fact, perform most of them every time you debug, but in a very informal and implicit way. The goal here is to make the entire process explicit not just to clarify our understanding, but also because I believe that the worst-case bugs — those that threaten to stop you for weeks or more — are best met with a methodical, deliberate approach.

1. Verify the Bug and Determine Correct Behavior

It makes no sense to even start debugging unless we’re pretty sure:

  1. There is actually a bug.
  2. We understand what the system should have done.

Sometimes these questions are easy to answer (“robot wasn’t supposed to catch fire”) but other times they can be distressingly difficult. I’ve gotten into long, drawn out arguments with smart people about whether a particular behavior exhibited by a C compiler is a bug or not; you’d think this wouldn’t happen for a language with a 550-page standard. For a computer system where no specification or reference implementation exists — for example, a new supernova simulation — we may have no idea whatsoever what the correct answer is.

2. Stabilize, Isolate, and Minimize

Now that we’re sure we have a real bug, we want to be able to reproduce it deterministically by isolating its failure-inducing input. If this input is timing-dependent or includes hardware-level events, the situation may become challenging and we need to stop and think: Is there some clever way to make the system deterministic? For example, let’s say that we have some bizarre interaction between an interrupt handler and our main loop. If we know about where the interrupt is firing, it may be possible to just disable the actual interrupt and insert a call to the interrupt-handling function from our main loop. If this works, our life is a lot easier. If not, we may still learn something.
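
For instance, the interrupt trick might look something like this sketch; the names here are hypothetical, just to show the shape of the idea:

/* Hypothetical names: timer_isr() is the real handler and
   disable_timer_interrupt() masks the hardware interrupt. */
extern void timer_isr (void);
extern void disable_timer_interrupt (void);
extern void do_main_work (void);

void main_loop (void)
{
  unsigned ticks = 0;
  disable_timer_interrupt ();  /* take the hardware out of the picture */
  for (;;) {
    do_main_work ();
    if (++ticks % 100 == 0)
      timer_isr ();            /* call the handler at a known,
                                  repeatable point instead */
  }
}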

This is a good time to make the failure-inducing input smaller if at all possible. For example, if our robot software crashes two hours into some sequence of activities, can we make it crash faster by moving the failing activity earlier in the robot’s schedule? If our word processing application crashes when it loads some huge document, can we find a single-page document that still crashes it by splitting the big doc into individual pages?  Small failure-inducing inputs are important for two reasons.  First, they are easier to work with and save time. Second, it is usually the case that larger failure-inducing inputs contain more junk that is irrelevant to the problem at hand, making it harder to spot the pattern that triggers the bug. If you can find a truly small failure-inducing input, often you’ll have done most of the debugging work without looking at a line of code.
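
When the failure-inducing input is just a blob of bytes, even a crude bisection loop (a much-simplified cousin of delta debugging) can shrink it a long way. Here is a sketch, assuming a hypothetical test oracle still_fails() that reruns the program on a candidate input:

#include <stddef.h>
#include <string.h>

/* Hypothetical oracle: returns nonzero if the program under test
   still shows the symptom when run on input[0..len). */
extern int still_fails (const char *input, size_t len);

/* Repeatedly try throwing away half of the input, keeping any half
   that still triggers the failure. */
size_t minimize (char *input, size_t len)
{
  int progress = 1;
  while (progress && len > 1) {
    size_t half = len / 2;
    progress = 0;
    if (still_fails (input, half)) {
      len = half;                                 /* keep the first half */
      progress = 1;
    } else if (still_fails (input + half, len - half)) {
      memmove (input, input + half, len - half);  /* keep the second half */
      len -= half;
      progress = 1;
    }
  }
  return len;
}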

3. Estimate a Probability Distribution for the Bug

Now we move from science to art. Using all available information, we have to guess what bug is causing our symptoms. Our first guesses can be crude; we’ll have plenty of time to refine them later. Let’s take a simple example where my partner and I are working on an embedded systems project and it doesn’t work: the robot moves left when it should move right. Since my partner implemented the finite state machine that controls robot direction, I’m going to guess this is her bug, while conceding the possibility that my own code may be at fault:

The interpretation of this pie chart is “I estimate about a 75% chance that the bug is in my partner’s code, a 25% chance it’s in my code, and a small chance it’s somewhere else entirely.”

Next, we’ll ask our instructor to take a guess. It turns out he’s been walking around the lab all afternoon looking over teams’ shoulders and has already seen three groups whose robot went the wrong direction because they wired up their motor driver chip the wrong way due to an ambiguity in the hardware documentation. Based on this experience, he estimates the bug’s probability distribution as:

Finally, let’s say that my partner knows something the rest of us don’t: she just finished an embedded project which used the same compiler that we’re using in class, which happens to have an extremely buggy implementation of the square root function. Furthermore, we have been using a lot of square root operations in our distance-estimation module and elsewhere. Thus, her estimation of the bug’s probability distribution is:

Does any of us know where the bug actually is? Of course not — this is all pure guesswork. But good guesses are important because they feed the next step of the process. Where do these probability distributions come from in practice? From years of bitter experience with similar debugging situations! It’s important to approach this step with an open mind; if the true location of the bug is not represented anywhere in our probability distribution, we’re going to waste a lot of time.

When guessing where the bug lives, it’s worth keeping Occam’s Razor in mind:

When you have two explanations, both consistent with the available evidence, prefer the simpler one.

Before moving on let’s notice that by talking to several people, we’ve failed to meet the original goal of estimating the probability distribution for this bug: we have three distributions, not one. To fix this we could combine these three distributions, perhaps giving higher weight to whoever we trust the most. In practice, however, this isn’t going to be necessary since the separate distributions have given us enough information that we can proceed.

Why did we talk to several people in the first place? It’s because there’s an insidious problem that underlies a lot of debugging woes.  If I have an error in my thinking that leads to bugs, this same error also makes it very difficult for me to figure out where the bug is. Programming in teams is a good start, but isn’t a complete answer because teams can easily talk themselves into all thinking the wrong way. A fresh set of eyes — a person who was not closely involved with the system’s design and implementation — is always useful.

4. Devise and Run an Experiment

Our next step is to eliminate some of the possibilities by running an experiment. Ideally, we’ll think of an experiment with a yes/no result that divides the probability space in half. That is, half of the possible bugs are eliminated regardless of the experiment’s result. In an information-theoretic sense, we maximize the information content of the experiment’s result when it splits the space of possibilities into parts of equal size. A few examples will help make this concrete.

Let’s start with my probability distribution from the first pie chart above. The obvious hypothesis is that my partner’s FSM is faulty. To test this hypothesis, we instrument her FSM with lots of assertions and debugging printouts (assuming these are possible on our robot) and run it for a while. If we did things right, the results of this experiment will either support or refute the hypothesis.

The second probability distribution, the one our instructor came up with, leads to an entirely different experiment. Here the obvious hypothesis is that our robot control board is wired improperly. To test this, we’ll either closely inspect our wiring or else move our robot software over to a different robot that is known to be wired up correctly. Either way, the hypothesis will be rapidly validated or falsified.

The third probability distribution leads to the hypothesis that the compiler is buggy. To test this, we’ll compile our robot software using a different compiler (we should be so lucky that this is easy…) and see if the program behavior changes.

Which of the three experiments should we run first? It probably doesn’t matter much — we should choose either the easiest experiment or else the one whose proponent has the strongest feelings.

Hopefully the pattern is now clear: based on our intuition about where the bug lies, we design an experiment that will reject roughly 50% of the possible bugs. Of course I’m oversimplifying a bit. First, experiments don’t always have yes/no results. Second, it may be preferable to run an easy experiment that rejects 15% of the possible bugs rather than a difficult one that rejects exactly 50% of them. There’s no wrong way to do this as long as we rapidly home in on the bug.

5. Iterate Until the Bug is Found

Steps 3 and 4 need to be repeated until finally we devise an experiment whose result serves as the smoking gun that unambiguously identifies the bug. This iteration leads us to perform an approximate binary search over the space in which we believe the bug to live. If our estimation of the bug’s probability distribution is accurate, this lets us find the bug by running a number of experiments logarithmic in the number of possible bugs.
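Here is a sketch of what that binary search can look like when the space of possible bugs maps onto something orderly, such as a range of suspect changes or input sizes. The predicate below is hypothetical; it stands in for "apply the first i candidates and rerun the failing test."

    #include <stdio.h>

    /* Hypothetical experiment: does the failure reproduce at candidate
     * point i (e.g., with the first i suspect changes applied)?  For
     * this sketch, pretend the bug crept in at point 37. */
    static int fails_at(int i)
    {
        return i >= 37;
    }

    /* Binary search for the first candidate point at which the
     * experiment fails, assuming everything before it passes and
     * everything after it fails.  Each experiment eliminates half of
     * the remaining candidates, so n candidates need about log2(n)
     * experiments. */
    static int first_failure(int lo, int hi)   /* passes at lo, fails at hi */
    {
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (fails_at(mid))
                hi = mid;   /* the bug is at or before mid */
            else
                lo = mid;   /* the bug is after mid */
        }
        return hi;
    }

    int main(void)
    {
        printf("bug introduced at candidate %d\n", first_failure(0, 100));
        return 0;
    }

With 100 candidates this converges in about seven experiments, which is the logarithmic behavior described above.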

6. Fix the Bug and Verify the Fix

Fixing is usually the easy part. Then, run the entire test suite to help ensure that the bug was really fixed and that the fix didn’t break anything that used to work.

7. Undo Changes

Finding and fixing a showstopping bug is a huge relief: finally you and the rest of your team can start to make progress again. However, in the rush to get back to work it’s easy to forget to undo any changes made to the code base to support debugging, such as turning off the compiler’s optimizer, inserting extra printfs, and stubbing out time-consuming routines. It is important to roll back all of these changes, or you’ll get nasty surprises later.
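One way to make this rollback harder to forget, offered only as a suggestion, is to hang every piece of temporary scaffolding off a single, loudly named switch so that one grep finds everything that has to go. The BUG1234 name below is made up.

    #include <stdio.h>

    /* Temporary debugging scaffolding, all controlled by one macro so
     * that a single grep for BUG1234_DEBUG finds everything that must
     * be undone once the bug is fixed. */
    #define BUG1234_DEBUG 1

    #if BUG1234_DEBUG
    #  define DBG(...) fprintf(stderr, __VA_ARGS__)
    #else
    #  define DBG(...) ((void)0)
    #endif

    static int expensive_calibration(void)
    {
    #if BUG1234_DEBUG
        /* Stubbed out to make the failure reproduce faster; must be
         * restored before the fix is committed. */
        return 0;
    #else
        /* ... the real, slow calibration would go here ... */
        return 0;
    #endif
    }

    int main(void)
    {
        DBG("starting up, calibration=%d\n", expensive_calibration());
        return 0;
    }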

8. Create a Regression Test

Add a test case to the test suite which triggers the bug that was just fixed.
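A minimal version of such a test, with hypothetical names, is just an assertion on the exact input that used to trigger the failure, plus a couple of ordinary cases so the fix cannot quietly break them:

    #include <assert.h>
    #include <stdio.h>

    /* The function that used to misbehave.  Suppose the bug was that an
     * empty input crashed it; the fixed version returns 0 instead. */
    static int count_words(const char *s)
    {
        int n = 0, in_word = 0;
        for (; *s; s++) {
            if (*s == ' ') {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                n++;
            }
        }
        return n;
    }

    /* Regression test: pin down the case that exposed the bug, plus a
     * few ordinary cases. */
    static void test_count_words(void)
    {
        assert(count_words("") == 0);          /* the input that used to fail */
        assert(count_words("one") == 1);
        assert(count_words("two  words") == 2);
    }

    int main(void)
    {
        test_count_words();
        printf("regression tests passed\n");
        return 0;
    }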

9. Find the Bug’s Friends and Relatives

Earlier I mentioned that debugging is an inverse problem. There’s a second inverse problem — which may be easier or harder than the first — which is to figure out what error in thinking led to the bug. This is important because the same thinko probably caused several bugs and you’ve only found one of them. For example, if we find a bug where I failed to properly check the return value from some library function, it’s a good bet that I did the same thing elsewhere. If we quickly inspect each library call in the code I’ve written, we’ll have a good chance of finding these. Of course we don’t have to do this, but if we don’t, the latent bugs will sit there waiting to break the system.
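Continuing the unchecked-return-value example (the file name below is made up), the audit amounts to confirming that every fallible library call gets treated like this, and hunting down the places where it isn’t:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* The bug we already found and fixed: fopen can fail, and the
         * old code passed NULL straight to fgets. */
        FILE *cfg = fopen("robot.cfg", "r");   /* made-up file name */
        if (cfg == NULL) {
            perror("robot.cfg");
            return EXIT_FAILURE;
        }

        char line[128];
        if (fgets(line, sizeof line, cfg) != NULL)
            printf("first config line: %s", line);
        fclose(cfg);

        /* The "friend and relative" an audit of other library calls
         * would turn up: malloc's return value going unchecked in
         * exactly the same way. */
        char *buf = malloc(4096);
        if (buf == NULL) {
            fprintf(stderr, "out of memory\n");
            return EXIT_FAILURE;
        }
        free(buf);
        return EXIT_SUCCESS;
    }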

What If You Get Stuck?

In the examples above, each person’s probability distribution for the bug naturally suggested an experiment to run. This is not always how it works. If you are inexperienced or if the bug involves something relatively nasty like a difficult race condition or memory corruption, at some point in the debugging session your probability distribution may end up looking like this:

In other words, we’re 100% sure that we have no idea what is going on. Obviously you want to avoid this situation because life is now difficult: scientific debugging has reached an impasse. Your options aren’t great but there are still plenty of things to try:

  • Spend more time attempting to reduce the size of the failure-inducing input; often this gives insights into the nature of the problem
  • Gain additional visibility into the system by adding more watchpoints or debug print statements
  • Find a tool that can bring new information to light — valgrind, better compiler warnings, a race condition checker, or similar
  • Take a step back and question your assumptions about this bug; talking to someone new or just taking a walk can both help
  • Look at the code and think until you see the answer

Somehow, at some point in every serious programming project, it always comes down to the last option: stare at the code until you figure it out. I wish I had a better answer, but I don’t. Anyway, it builds character.

Another thing to keep in mind when you’re stuck is that perhaps you’re looking for interesting bugs, but the actual problem is boring and stupid. For example, the makefile is missing the dependency that’s supposed to rebuild the code we’re modifying, or a file is not being created because the disk quota is exceeded. Since there are a lot of things like this that can go wrong, it’s not possible to foresee them all. A reasonable way to deal with them is to develop a quick set of heuristics to try when system behavior feels funny, such as:

  • Run “make clean”
  • Log out and log back in
  • Reboot the machine
  • Run commands from someone else’s account
  • Use a different machine

Although it often feels wrong to resort to silly activities like these, they can really save time by rapidly ruling out broad categories of improbable problems.

Stupid quota problems and the like are difficult to debug for a simple reason: You, the developer, are heavily invested in making the program logic correct. This is hard work and it occupies nearly all of your brain power. When a bug pops up, the natural assumption to make is that it’s due to a flaw in your logic. This is often, but not always, the case.

Summary

The value of a methodical, stepwise approach like the one I’ve outlined here is that it helps make everything explicit: What are we assuming about where the bug might be? What’s our current hypothesis? What experiment will most effectively validate or falsify that hypothesis? Obviously you don’t need to actually draw the pie charts if you don’t feel like it. However, once you’ve wasted a couple days on a bug, it may well be worth it to back off and start over from first principles. Here’s one reason why I’m so sure that a bit of careful thought is the key to debugging.

What About Tools?

I’ll just say it again: the key to debugging is thinking, not tools. Of course tools are great, but the best tools money can buy will be of little use to a team that uses them to support a shotgun debugging approach.

It goes without saying that you and your teammates will:

  1. Use all available tool support (compiler warnings, static analyzers, etc.) to avoid putting bugs into your system in the first place
  2. Use all available tool support to assist the debugging process

Graphical debuggers are nice and you should learn how to use them effectively. However, they are not always available and are often not the best choice for a particular problem.

What About Testing?

It also goes without saying that your project — unless it is totally trivial — will have a test suite. Preferably it can be run, and its results checked, in an automated fashion.

Make Richard Feynman and Winnie the Pooh Your Role Models

The “he fixes radios by thinking” episode in Surely You’re Joking, Mr. Feynman is a great debugging story. But we don’t need to raise the bar quite as high as Feynman. In one of the funnier Winnie the Pooh stories, Piglet and Pooh follow some animal tracks around the woods, getting more and more frightened as the woozles and wizzles they are following grow more numerous. At the end of the story Pooh debugs his error by formulating a hypothesis and testing it with a simple experiment:

“Wait a moment,” said Winnie-the-Pooh, holding up his paw.

He sat down and thought, in the most thoughtful way he could think. Then he fitted his paw into one of the Tracks…and then he scratched his nose twice, and stood up.

“Yes,” said Winnie-the-Pooh.

“I see now,” said Winnie-the-Pooh.

“I have been Foolish and Deluded,” said he, “and I am a Bear of No Brain at All.”

We should all be so honest with ourselves after making a silly mistake.

Further reading: If you found this post useful but want to learn more, this post discusses four good books about debugging.
 

I Can Solve the World’s Debugging Problems

Probably 10 times a semester, a student taking one of my courses sends me a mail like this:

Dr. Regehr I have this bug I’ve been working on for a long time, I’ve tried everything, I’m going to miss the deadline…

and then 15 minutes later, I get another email:

Nevermind I figured it out.

Is it a coincidence that students mail me just as they’re about to solve a difficult debugging problem? It is not. Rather, they have previously failed to explain the bug clearly to themselves. By explaining it to me in the email, they explain it to themselves and that is all it takes to figure it out. Seriously, I hardly even need to read this stuff.

Since I can solve so many problems with so little effort, I’m thinking it’s time to scale this up. Perhaps I’ll create a web site “Mail a Professor” where people can ask for advice about their worst debugging problems.  Nobody will read the mails, but people will get a nice randomized response saying things like “Your makefile contains a suspicious dependency” or “Please explain step two in more detail, I don’t yet understand what’s going on” or “Have you tried the latest version of Valgrind?”

The Truth About the Life of the Mind

[This piece is a followup to The Big Lie About the Life of the Mind.]

Being a professor, like any other job, has its pros and cons. You’d hope that one of the advantages would be that the job encourages a person to live a life of the mind. Otherwise what’s the point, right?  I was thinking about this and realized I didn’t know quite what “life of the mind” means. There’s no accepted definition that I’m aware of. Should I be reading a book every day? Writing a book every six months? Spending my evenings debating philosophy in coffee houses? Meditating?

Eventually I decided that while I’ve never written a book and don’t debate philosophy much, I live a fairly mindful life:

  • I spend a good number of hours per week just thinking. Probably not 10 hours, but definitely at least five, mostly while hiking on easy trails near my house. (Coherent thought at home, where the three- and five-year-olds live, is challenging. I’ve long since given up trying to think at the office.)
  • I spend a good amount of time, again probably five hours in a typical week, reading things that I want to read. Here I’m talking about reading material that relates to being a researcher, but that I’m not forced to read in order to do my job. Generally I’m in the middle of one or two books and three or four papers.
  • My main job is doing research, and most parts of the research process, including planning it, doing it, and writing about it, require hard thinking.
  • My other main job is teaching. Some teaching time, maybe 5-10%, counts as good thinking time: How to make this concept clear? What kind of example best suits this audience?

On the other hand, let’s be realistic: I spend a ton of time in meetings, managing people, dealing with department politics, writing boring parts of grant proposals and papers, writing reports for funding agencies, solving (and ducking) budget problems, helping students learn to present and write, reviewing papers and proposals that are of marginal interest to me, grading exams, giving lectures I’ve given many times before, and traveling. There is no doubt that in terms of hours, these kinds of activities dominate my work life. I don’t despise any of them, but it would be hard to say they are indicative of a life of the mind.

My view has been that in the long run, being a professor makes no sense unless the job supports, to a substantial extent, a life of the mind. Clearly one can succeed perfectly well without this life, for example by mechanically cranking out papers and proposals, being a good administrator, teaching by rote, etc. Certainly there exist professors who operate this way. I’m not trying to be nasty — just observing that if they do live a life of the mind, this fact is not externally apparent.

An important thing I learned during my years as an assistant professor is that the job is not really structured to provide a life of the mind. Rather, it is structured — in the ideal case — to give a fledgling academic a reasonable chance to learn how to teach and to develop a full-blown research program including students, grants, and lots of papers.

To live a life of the mind, one has to carve out time. I’m sure that some people accomplish this through good time management. On the other hand, one can — as I do — simply carve out the time regardless of the consequences. For an assistant professor this would increase the risk of not getting tenure. This is unavoidable, unless you instead carve the time out of family life or sleep. For a tenured professor, the consequence is doing a slightly worse job at service, teaching, and short-term research. In both cases it’s a tradeoff; you have to just go ahead and make it under the assumption that the long-run payoff in terms of career happiness and productivity will be worth it.

But why would a person want to live a life of the mind in the first place? It’s not obviously an inherently desirable thing. Rather, I feel like most of my role models have been hard thinkers: they either wrote books that expanded my mind or told me things that I didn’t know and wouldn’t have ever thought of. I came to admire the kind of person who produced these thoughts, and somehow this feeling led me down the academic track. This is actually the best explanation I can give of why I’m a professor. Even three years before I took the job, I had no intention whatsoever of becoming an academic.

It should go without saying that there are probably much easier and less stressful ways to have a life of the mind than becoming a professor. For example, a person could get a degree in a practical, high-demand field such as nursing or plumbing or programming, become very good at it in order to be mobile and command a good salary, but work no more than 40 hours per week. There are 80 remaining non-sleep hours per week for reading and writing books, producing art, discussing philosophy, or anything else that seems appropriate.