Paranoid Programming

A lot of software defects stem from developers who lack a proper sense of paranoia. While paranoia can of course be taken to idiotic extremes, a healthy respect for Murphy’s Law is useful in programming. This post is a list of useful paranoid techniques that I hope to use as the basis for a lecture on paranoid programming for my “solid code” class — please leave a note if you have a favorite paranoid technique that I’ve missed.

  • using plenty of assertions

  • validating and sanitizing inputs

  • checking for all possible errors returned by library functions, not just likely ones

  • erring on the side of caution when corrupted state is detected, for example by halting or restarting part or all of the system instead of attempting to repair and recover

  • checksumming RAM and ROM contents in embedded systems

  • minimizing the scope from which identifiers are visible

  • using and paying attention to compiler warnings and static analysis results

  • using and paying attention to the output of dynamic tools like Valgrind and Clang’s UBSan

  • using (to whatever extent this is feasible) formal methods tools like Frama-C to verify tricky, important functions

  • writing fuzzers even for software that shouldn’t need to be fuzzed

  • leveraging the type system to get better guarantees, for example by avoiding unnecessary type casts and by putting some kinds of integer values into single-member structs when writing C

  • for critical loops, writing down the loop variant and invariant

  • if possible, building the code using different compilers and testing the resulting executables against each other

  • using timeouts, retry loops, and watchdog timers when appropriate

  • aggressively keeping functions simple

  • aggressively avoiding mutable global state

  • not using concurrency unless absolutely necessary

  • taking advantage of whatever language support is available for controlling mutable state: const, pure functions, ownership types, etc.

  • giving subsystems the least privileges they need to do their jobs, for example using seccomp

  • testing, testing, testing, and coverage

  • being aware of current best practices for things like choice of libraries

  • tracking down transient failures rather than ignoring them

  • disabling interrupts inside of interrupt handlers, even when timing arguments support the non-existence of nested interrupt handlers

  • getting people who didn’t write the code to read the code

  • allocating stack memory conservatively in environments where stacks are a constrained resource

  • unit-testing library functions, particularly mathematical ones, instead of trusting that they are correct for all inputs

As a random example, several commenters (and Bruce Dawson in email) took issue with the volatile-qualified lock variable in this post; my view is that this qualifier probably does no harm and might do some good by encouraging the compiler to be careful in how it touches the lock variable. This seems to me to be a reasonable application of paranoia; obviously others disagree.

The other day someone asked me how paranoid programming differs from regular old defensive programming. I don’t necessarily have a great answer but (1) the top 10 google hits for “defensive programming” seemed to be missing many of the techniques I’ve listed here and (2) “defensive” doesn’t quite capture the depth of my mistrust of computer programs. Just because you’re paranoid doesn’t mean they’re not really out to get you, and make no mistake: the computer is out to get you (as are the people on the other side of the network).

Writing Solid Code Weeks 5 and 6 — The Plot Thickens

Last week was pretty relaxed: I lectured on doing code reviews — pretty much the standard pitch. This week we spent Tuesday doing a code review, and I think it went pretty well. I had chosen a team whose code was strong so that our first code review would be a positive one.

Today I announced that the Huffman compression that the class has been working on has been found by the client to be unsuitable because it does not give a good enough compression ratio for some files of interest: massive PBMs showing mostly text and some graphics. The client decided that its requirements would be better met by first performing run-length compression on the PBM files at the bit level, resulting in 8-bit run-length codes: one bit for the sense of the run, 7 bits for length. Since runs of white are much more common than runs of black, there’s still plenty of redundancy left in the RLE codes, so they should subsequently be Huffman encoded.

The other aspect of today’s assignment is that due to some internal shuffling at our company, the teams have been moved around. Each team must implement RLE compression on top of the Huffman implementation provided by the team whose number is one greater (modulo seven). Teams are forbidden from touching their old code, although I hope they will be willing to answer questions about it. This is an exercise that I’ve been wanting to spring on students for a long time. Hopefully it will be a character-building experience for everyone.

Also in class today we spent some time looking at defect reports on the students’ code from Coverity Scan. This was (for me at least) pretty entertaining. First, the defect reports were of impressively high quality. Second, Coverity’s web interface is quite nice to use.

What I Accomplished in Grad School

I often talk to students who are thinking about grad school. The advice I generally give is a dressed-up version of “Just do whatever the hell will make you happy.” But if we all had solid ideas about what would make us happy then, well, we’d probably be a lot more happy. Here’s a list of things that I actually accomplished in grad school. Most of these things did make me happy or at least were satisfying. Of course, I cannot know the extent to which these things would make other people happy, and I also cannot know whether I would have been happier with the things that I’d have accomplished if I hadn’t gone to grad school. Since I got a PhD 13 years ago and started the program 18.5 years ago (crap!) I have at least a modest amount of perspective at this point.

First, some work-related things.

  • I became pretty good at doing and evaluating research.

  • I started to become good at writing. When I arrived at grad school I was not a good writer. When I left, I was not good either, but at least I was on the way. Since 2001, every time I write something, I have been thankful that it’s not a PhD thesis.

  • I wrote a few pretty decent papers. None of them set the world afire, but none of them has been a source of embarrassment either.

  • I did some internships in industry and, along the way, learned a bit about how the real world works, if such a thing can be said to exist.

But really, the things in grad school that weren’t about work were better:

  • I read a lot of books, often several per week. I’m afraid that I’m going to have to get the kids out of the house and also retire if I want to reach that level again.

  • I found someone to spend the rest of my life with. This was the purest luck.

  • I made a number of friends who I am still close to, though we don’t talk nearly often enough. I doubt that I’ll ever have another group of friends as good as these.

  • I became quite good at disc golf.

  • I did a decent amount of programming for fun.

  • I avoided going into debt. In fact, the TA and RA stipends that I received in grad school felt like a lot of money compared to the ~$7000/year that I lived on as an undergrad.

There are a bunch of things that are important that I did not accomplish in grad school:

  • I failed to learn even rudimentary time management.

  • I did not develop good eating, drinking, sleeping, or exercise habits. When I graduated I was under the impression that my body could tolerate almost any sort of abuse.

  • I didn’t learn to choose good research topics; that took several more years.

  • I didn’t figure out what I wanted to do with my life.

I put this out there on the off chance that it might be useful for people who are thinking about grad school.

Assertions Are Pessimistic, Assumptions Are Optimistic

We assert a condition when we believe it to be true in every non-buggy execution of our program, but we want to be notified if this isn’t the case. In contrast, we assume a condition when our belief in its truth is so strong that we don’t care what happens if it is ever false. In other words, while assertions are fundamentally pessimistic, assumptions are optimistic.

C and C++ compilers have a large number of built-in assumptions. For example, they assume that every pointer that is dereferenced refers to valid storage and every signed math operation fails to overflow. In fact, they make one such assumption for each of the hundreds of undefined behaviors in the languages. Assumptions permit the compiler to generate fast code without working too hard on static analysis.

This post explores the idea of a general-purpose assume mechanism. Although this isn’t found (as far as I know — please leave a comment if you know of something) in any commonly-used programming language, it might be useful when we require maximum performance. Although an assume mechanism seems inherently unsafe, we could use it in a safe fashion if we used a formal methods tool to prove that our assumptions hold. The role of the assumption mechanism, then, is to put high-level knowledge about program properties — whether from a human or a formal methods tool — into a form that the compiler can exploit when generating code.

Standard C has the “restrict” type qualifier that lets the compiler assume that the pointed-to object is not aliased by other pointers. Let’s consider this (deliberately silly) function for summing up the elements of an array:

void sum (int *array, int len, int *res)
{
  *res = 0;
  for (int i=0; i<len; i++) {
    *res += array[i];
  }
}

The problem that the compiler faces when translating this code is that res might point into the array. Thus, *res has to be updated in every loop iteration. GCC and Clang generate much the same code:

sum:    movl	$0, (%rdx)
        xorl	%eax, %eax
.L2:    cmpl	%eax, %esi
	jle	.L5
	movl	(%rdi,%rax,4), %ecx
	incq	%rax
	addl	%ecx, (%rdx)
	jmp	.L2
.L5:    ret

Both compilers are able to generate code that is five times faster — using vector instructions — if we change the definition of sum() to permit the compiler to assume that res is not an alias for any part of array:

void sum (int *restrict array, int len, int *restrict res)

Another form of assumption is the __builtin_unreachable() primitive supported by GCC and Clang, which permits the compiler to assume that a particular program point cannot be reached. In principle __builtin_unreachable() is trivial to implement: any unconditionally undefined construct will suffice:

void unreachable_1 (void)
{
  int x;
  x++;
}

void unreachable_2 (void)
{
  1/0;
}

void unreachable_3 (void)
{
  int *p = 0; 
  *p;
}

void unreachable_4 (void)
{
  int x = INT_MAX;
  x++;
}

void unreachable_5 (void)
{
  __builtin_unreachable ();
}

But in practice the compiler’s interpretation of undefined behavior is not sufficiently hostile — bet you never thought I’d say that! — to drive the desired optimizations. Out of the five functions above, only unreachable_5() results in no code at all being generated (not even a return instruction) when a modern version of GCC or Clang is used.

The intended use of __builtin_unreachable() is in situations where the compiler cannot infer that code is dead, for example following a block of inline assembly that has no outgoing control flow. Similarly, if we compile this code we will get a return instruction at the bottom of the function even when x is guaranteed to be in the range 0-2.

int foo (int x)
{
  switch (x) {
  case 0: return bar(x);
  case 1: return baz(x);
  case 2: return buz(x);
  }
  return x; // gotta return something and x is already in a register...
}

If we add a __builtin_unreachable() at the bottom of the function, or in the default part of the switch, then the compiler drops the return instruction. If the assumption is violated, for example by calling foo(7), execution will fall into whatever code happens to be next — undefined behavior at its finest.

But what about the general-purpose assume()? This turns out to be easy to implement:

void assume (int expr)
{
  if (!expr) __builtin_unreachable();
}

So assume() is using __builtin_unreachable() to kill off the collection of program paths in which expr fails to be true. The interesting question is: Can our compilers make use of assumptions to generate better code? The results are a bit of a mixed bag. The role of assume() is to generate dataflow facts and, unfortunately, the current crop of compilers can only be trusted to learn very basic kinds of assumptions. Let’s look at some examples.

First, we might find it annoying that integer division in C by a power of 2 cannot be implemented using a single shift instruction. For example, this code:

int div32 (int x)
{
  return x/32;
}

Results in this assembly from GCC:

div32:  leal	31(%rdi), %eax
	testl	%edi, %edi
	cmovns	%edi, %eax
	sarl	$5, %eax
	ret

And this from Clang:

div32:  movl	%edi, %eax
	sarl	$31, %eax
	shrl	$27, %eax
	addl	%edi, %eax
	sarl	$5, %eax
	ret

The possibility of a negative dividend is causing the ugly code. If we know that the argument will be non-negative, we can try to fix the problem like this:

int div32_nonneg (int x)
{
  assume (x >= 0);
  return div32 (x);
}

Now GCC nails it:

div32_nonneg:
	movl	%edi, %eax
	sarl	$5, %eax
	ret

Clang, unfortunately, generates the same code from div32_nonneg() as it does from div32(). Perhaps it lacks the proper value range analysis. I’m using Clang 3.4 and GCC 4.8.2 for this work, by the way. UPDATE: In a comment Chris Lattner provides a link to this known issue in LLVM.

Next we’re going to increment the value of each element of an array:

void inc_array (int *x, int len)
{
  int i;
  for (i=0; i<len; i++) {
    x[i]++;
  }
}

The code generated by GCC -O2 is not too bad:

inc_array:
	testl	%esi, %esi
	jle	.L18
	subl	$1, %esi
	leaq	4(%rdi,%rsi,4), %rax
.L21:   addl	$1, (%rdi)
	addq	$4, %rdi
	cmpq	%rax, %rdi
	jne	.L21
.L18:   rep ret

However, is that test at the beginning really necessary? Aren’t we always going to be incrementing an array of length at least one? If so, let’s try this:

void inc_nonzero_array (int *x, int len)
{
  assume (len > 0);
  inc_array (x, len);
}

Now the output is cleaner:

inc_nonzero_array:
	subl	$1, %esi
	leaq	4(%rdi,%rsi,4), %rax
.L24:   addl	$1, (%rdi)
	addq	$4, %rdi
	cmpq	%rax, %rdi
	jne	.L24
	rep ret

The preceding example was suggested by Arthur O’Dwyer in a comment on my assertion post.

If you readers have any good examples where non-trivial assumptions result in improved code, I’d be interested to see them.

Next, let’s look at the effect of assumptions in a larger setting. I compiled LLVM/Clang in four different modes. First, in its default “Release+Assertions” configuration. Second, in a “minimal assert” configuration where assert() was defined like this:

#define assert(expr) ((expr) ? static_cast<void>(0) : _Exit(-1))

This simply throws away the verbose failure information. Third, in a “no assert” configuration where assert() was defined like this:

#define assert(expr) (static_cast<void>(0))

Fourth, I turned each assertion in LLVM/Clang into an assumption like this:

#define assert(expr) ((expr) ? static_cast<void>(0) : __builtin_unreachable())

Then I built each of the configurations using GCC 4.8.2 and Clang 3.4, giving a total of eight Clang binaries as a result. Here’s the code size:

Bear in mind that the y-axis doesn’t start at zero; I wanted to make the differences stand out. As expected, default assertions have a heavy code size penalty. Perhaps interestingly, GCC was able to exploit assumptions to the point where there was a small overall code size win. LLVM did not manage to profit from assumptions. Why would assumptions have a code size penalty over no assertions? My guess is that some assumptions end up making function calls that cannot be optimized away, despite their lack of side effects.

Now let’s look at the performance of the eight clangs. The benchmark here was optimized compilation of a collection of C++ files. Each number in the graph is the median value of 11 reps, but really this precaution was not necessary since there was very little variance between reps. Note again that the y-axis doesn’t start at zero.

Both compilers are slowed down by assumes. Again, I would guess this is because sometimes the compiler cannot elide function calls that are made during the computation of an expression that is being assumed.

One surprise from these graphs is that Clang is generating code that is both smaller and faster than GCC’s. My view (formed several years ago) has been that Clang usually produces slightly slower code, but requires less compile time to do it. This view could easily have become invalidated by rapid progress in the compiler. On the other hand, since the code being compiled is LLVM/Clang itself, perhaps we should expect that Clang has been extensively tuned to generate good object code in this case.

A not-surprise from these graphs is that we generally cannot just take a big collection of assertions, turn them into assumptions, and expect good results. Getting good results has two parts: an easy one and a hard one. The easy part is selectively removing assumptions that are hurting more than they help. These would tend to be complex conditions that are slow to compute and that have no chance of generating dataflow facts that the compiler can make use of. This picking and choosing could even be done automatically.

The second part of getting good results out of assumptions is creating compilers that are more generally capable of learning and exploiting arbitrary new dataflow facts provided by users. In the short run, this is a matter of testing and fixing and tuning things up. In the long run, my belief is that compilers are unnecessarily hampered by performance constraints — they have to be pretty fast, even when compiling large codes. An alternative is to support a “-Osearch” mode where the compiler spends perhaps a lot of time looking for better code sequences; see this comment from bcs. The compiler’s search could be randomized or it could involve integration with an SMT solver. Companies that burn a lot of CPU cycles in server farms should be highly motivated to optimize, for example, the 500 functions that use the most power every day. I’m assuming that some sort of systematic cluster-wide profiling facility exists; we don’t want to waste time optimizing code unless it’s causing pain.

The postcondition of any assumption is the same as the postcondition of asserting the same expression. Thus, we can see that asserts can also make our code faster — but this requires the speedup from the extra dataflow facts to outweigh the slowdown from the assertion check. I don’t know that anyone has ever looked into the issue of when and where assertions lead to faster code. We would not expect this to happen too often, but it would be cute if compiling some big program with -DNDEBUG made it slower.

Although they would normally be used to support optimization, I believe that assumptions can also directly support bug detection. A tool such as Stack could be modified to analyze a program with the goal of finding code that becomes dead due to an assumption. The presence of such code is likely to signal a bug.

In summary, assume() is a mechanism for introducing new undefined behaviors into programs that we write. Although this mechanism would be easy to misuse, there’s no reason why it cannot be used safely and effectively when backed up by good programming practices and formal methods.

Scary Compiler Code Motion

I ran across this blog post about reader-writer locks and thought it would be fun to present them to the Advanced Operating Systems class that I’m running this spring. Here’s my version of the code, which needed to be adapted a bit to work with GCC:

struct RWSpinLock
{
  volatile uint32_t lock;
};

void LockReader(struct RWSpinLock *l)
{
  while (1) {
    uint32_t OldLock = l->lock & 0x7fffffff;
    uint32_t NewLock = OldLock + 1;   
    if (__sync_val_compare_and_swap(&l->lock, OldLock, NewLock) == OldLock) 
      return;
  }
}
  
void UnlockReader(struct RWSpinLock *l)
{
  __sync_sub_and_fetch(&l->lock, 1);
}

void LockWriter(struct RWSpinLock *l)
{
 again:
  {
    uint32_t OldLock = l->lock & 0x7fffffff;
    uint32_t NewLock = OldLock | 0x80000000;    
    if (!(__sync_val_compare_and_swap(&l->lock, OldLock, NewLock) == OldLock)) 
      goto again;
    while (l->lock & 0x7fffffff) { }
  }
}

void UnlockWriter(struct RWSpinLock *l)
{
  l->lock = 0;
}

At first I was suspicious that some sort of fence is required in the UnlockWriter() function. However, I believe that that is not the case: TSO will naturally make sure that all cores see operations inside the locked region prior to seeing the lock release. On some non-TSO platforms this unlock function would be too weak.

But the hardware is only one source of reordering that can mess up locks — the compiler is another. The store to the volatile lock field in UnlockWriter() cannot be moved around another access to a volatile-qualified object, but (in my reading of the standard) stores to non-volatile objects can be moved around stores to volatiles.

Can we get a compiler to translate the (in principle) broken code into assembly that actually doesn’t work? Turns out this is easy:

int important_data;
struct RWSpinLock l;

void update (void)
{
  LockWriter (&l);
  important_data /= 20;
  UnlockWriter (&l);
}

Using the default compiler on an x86-64 Ubuntu 12.04 machine (GCC 4.6.3) at -O2, the resulting assembly is:

update:
	movl	$l, %edi
	call	LockWriter
	movl	important_data(%rip), %ecx
	movl	$1717986919, %edx
	movl	$0, l(%rip)
	movl	%ecx, %eax
	sarl	$31, %ecx
	imull	%edx
	sarl	$3, %edx
	subl	%ecx, %edx
	movl	%edx, important_data(%rip)
	ret

Notice that important_data is read inside the lock and written well outside of it. Yikes! Here’s how to stop the compiler from doing that:

void UnlockWriter(struct RWSpinLock *l)
{
  // this is necessary or else GCC will move operations out of the locked region
  asm volatile("" ::: "memory");
  l->lock = 0;
}

Versions of Clang that I tried (3.3 and 3.4 for x86-64) didn’t want to move the store to important_data outside of the lock, but it’s possible that I just wasn’t asking the right way.

Use of Assertions

Assertions are great because they:

  • support better testing,
  • make debugging easier by reducing the distance between the execution of a bug and the manifestation of its effects,
  • serve as executable comments about preconditions and postconditions,
  • can act as a gateway drug to formal methods.

Assertions are less than great because they:

  • slow down our code,
  • make our programs incorrect — when used improperly,
  • might trick some of us lazy programmers into using them to implement error handling,
  • are commonly misunderstood.

This post will go through the care and feeding of assertions with the goal of helping developers maximize the benefits and minimize the costs.

Definitions

The best definition of assertion that I’ve seen is from the C2 wiki:

An assertion is a Boolean expression at a specific point in a program which will be true unless there is a bug in the program.

This definition immediately tells us that assertions are not to be used for error handling. In contrast with assertions, errors are things that go wrong that do not correspond to bugs in the program. For example, we might ask if this assertion is a good one:

int result = open (filename, O_RDWR);
assert (result != -1);

The definition tells us that the expression result != -1 will be true unless there is a bug in the program. Clearly this is not (generally) the case: whether the call to open() succeeds depends on the contents of the filesystem, which are not wholly under the control of our program. Therefore, this assertion is a poor one. The ability to make a clear distinction between program bugs and externally-triggered error conditions is the most important prerequisite for effective use of assertions.

This post on assertions by Ned Batchelder gives an alternate definition:

ASSERT(expr)

Asserts that an expression is true. The expression may or may not be evaluated.

  • If the expression is true, execution continues normally.
  • If the expression is false, what happens is undefined.

Notice the difference between the first definition — which told us what an assertion actually means in terms of program correctness — and this one, which is operational and doesn’t mention bugs at all. This is a language designer’s and compiler writer’s view of assertions. It is useful because it makes it clear that while the assertion might be executed, we must not count on its being executed. This foreshadows an issue that I’ll discuss below, which is that assertions must not change the state of the program.

“What happens is undefined” when an assertion fails is taking a pretty strong view of things. First, this says that assertion failure might result in a message being logged to a console, a fatal signal being sent to the application, an exception being thrown, or no effect at all. Second, “undefined behavior when p is false” is equivalent to “p may be assumed to be true”. Therefore, the compiler should feel free to optimize the program under the assumption that the asserted condition holds. Although this might be what we want — in fact it would be really cool if adding assertions made our code faster rather than slower — it’s not an interpretation that is universally useful. As developers, we might want to count on a certain kind of behavior when an assertion fails. For example, Linux’s BUG_ON() is defined to trigger a kernel panic. If we weaken Linux’s behavior, for example by logging an error message and continuing to execute, we could easily end up adding exploitable vulnerabilities.

Where Do Assertions Come From?

When we write an assertion, we are teaching a program to diagnose bugs in itself. The second most common problem that students have with assertions (the first most common being a failure to make the distinction between bugs and error conditions) is figuring out what to assert. This is not so easy at first, but (for many people, at least) it soon becomes second nature. The short answer to where assertions come from is “outside of the core program logic.” Specifically, assertions come from:

Math. Basic math gives us simple ways — such as casting out nines — to check for errors in a complex operation. In a mathematical computer program there are typically many opportunities to exploit similar tricks. If we’re computing the angles between the sides of a triangle, it would be reasonable to assert that the angles sum to 180 degrees (or close to 180 if we’re using floating point). If we’re implementing an involved method of computing square roots, we might assert that the result squared is equal to the original argument. If we’re implementing a fast and tricky cosine function, asserting that its output is in [-1,1] would be a useful sanity check. In general, CS theory tells us that there are likely to be many problems that are inherently much more difficult to solve than to verify. Asserting the verification of the solution to any hard problem (whether it is in NP or not) would be an excellent thing to do.

Preconditions. When something has to be true for our code to execute correctly, it’s often worth asserting the condition up-front. This documents the requirement and makes it easier to diagnose failures to meet it. Examples include asserting that a number is non-negative before computing its square root, asserting that a pointer is non-null before dereferencing it, and asserting that a list does not contain duplicate elements before relying on that fact.

Postconditions. Often, near the end of a function, some kind of non-trivial but easy-to-check guarantee will be made, such as a search tree being balanced or a matrix being upper triangular. If we think there’s even a remote chance that our logic does not respect this guarantee, we can assert the postcondition to make sure.

Invariants. Data and data structures often have properties that need to hold in non-buggy executions. For example, in a doubly-linked list, things have gone wrong if there exists a node where n->next->prev != n. Similarly, binary search trees often require that the value attached to the left node is not greater than the value attached to the right node. Of course these are easy examples: programs such as compilers that do a lot of data structure manipulation can have almost arbitrarily elaborate data structure invariants.

The spec. Since a program is buggy by definition if it doesn’t implement the spec, specifications can be a powerful source of conditions to assert. For example, if our financial program is designed to balance the books, we might assert that no money has been created or destroyed after the day’s transactions have all been processed. In a typesetting program, we might assert that no text has been placed outside of the page.

Assertions vs. Contracts

As shown in the previous section, assertions fill a variety of roles. Assertions about preconditions and postconditions that hold at module boundaries are often called contracts. Programming languages such as Eiffel, D, Racket, and Ada provide first-class support for contracts. In other languages, contract support is available via libraries or we can simply use assertions in a contract-like fashion.

How About Some Examples?

Let’s look at assertions in action in two sophisticated code bases. Here’s a super badass assertion in Mozilla that was triggered by 66 different bugs. Jesse Ruderman adds that “about half of the bugs that trigger the assertion CAN lead to exploitable crashes, but without a specially crafted testcase, they will not crash at all.” Here’s another awesome Mozilla assertion with 33 bugs that could trigger it.

The LLVM project (not including Clang, and not including the “examples” and “unittests” directories) contains about 500,000 SLOC in .cpp files and about 7,000 assertions. Does one assertion per 70 lines of code sound high or low? It seems about right to me.

I arbitrarily picked the C++ file in LLVM that is at the 90th percentile for number of assertions (the file with the median number of assertions contains just one!). This file, which can be found here, contains 18 assertions and it’s not too hard to get a sense for their meaning even without seeing the surrounding code:

assert(BlockAddrFwdRefs.empty() && "Unresolved blockaddress fwd references");
assert(Ty == V->getType() && "Type mismatch in constant table!");
assert((Ty == 0 || Ty == V->getType()) && "Type mismatch in value table!");
assert(It != ResolveConstants.end() && It->first == *I);
assert(isa<ConstantExpr>(UserC) && "Must be a ConstantExpr.");
assert(V->getType()->isMetadataTy() && "Type mismatch in value table!");
assert((!Alignment || isPowerOf2_32(Alignment)) && "Alignment must be a power of two.");
assert((Record[i] == 3 || Record[i] == 4) && "Invalid attribute group entry");
assert(Record[i] == 0 && "Kind string not null terminated");
assert(Record[i] == 0 && "Value string not null terminated");
assert(ResultTy && "Didn't read a type?");
assert(TypeList[NumRecords] == 0 && "Already read type?");
assert(NextBitCode == bitc::METADATA_NAMED_NODE); (void)NextBitCode;
assert((CT != LandingPadInst::Catch || !isa<ArrayType>(Val->getType())) && 
        "Catch clause has a invalid type!");
assert((CT != LandingPadInst::Filter || isa<ArrayType>(Val->getType())) && 
        "Filter clause has invalid type!");
assert(DFII != DeferredFunctionInfo.end() && "Deferred function not found!");
assert(DeferredFunctionInfo.count(F) && "No info to read function later?");
assert(M == TheModule && "Can only Materialize the Module this BitcodeReader is attached to.");

The string on the right side of each conjunction is a nice touch; it makes assertion failure messages a bit more self-documenting than they otherwise would have been. Jesse Ruderman points out that a variadic assert taking an optional explanation string would be less error prone, and he also points to Mozilla’s implementation of this, which burned my eyes.
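To make the variadic idea concrete, here’s a rough sketch of what such a macro might look like in C — my invention for illustration, not Mozilla’s actual implementation (which, as noted, burned my eyes). The name ASSERT_MSG is made up; the payoff over the `cond && "message"` idiom is that the explanation can be a printf-style format string that reports runtime values:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical variadic assert -- a sketch, not Mozilla's actual
   implementation; the explanation is a printf-style format string */
#define ASSERT_MSG(cond, ...)                                   \
  do {                                                          \
    if (!(cond)) {                                              \
      fprintf(stderr, "Assertion failed: %s (%s:%d): ",         \
              #cond, __FILE__, __LINE__);                       \
      fprintf(stderr, __VA_ARGS__);                             \
      fputc('\n', stderr);                                      \
      abort();                                                  \
    }                                                           \
  } while (0)
```

A use site might look like `ASSERT_MSG(depth < MAX_DEPTH, "tree depth %d too large", depth);` — the failure message can then include the offending value, which a stringified conjunction cannot.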

Do the assertions in LLVM ever fail? Of course they do. As of right now, 422 open bugs in the LLVM bug database match the search string “assertion.”

GCC (leaving out its testsuite directory) contains 1,228,865 SLOC in its .c files, with about 9,500 assertions, or about one assertion per 130 lines of code.

Things I’d be interested in hearing from readers:

  • Your favorite assertion
  • The assertion ratio of your favorite code base, if you have the data available and can share it

Mistakes in Assertions

There’s only one really, really bad mistake you can make when writing an assertion: changing the state of the program while evaluating the Boolean condition that is being asserted. This is likely to come about in one of two ways. First, in a C/C++ program we sometimes like to accidentally write this sort of code:

assert (x = 7);

This can be avoided using Yoda conditions or — better yet — just by being careful. The second way to accidentally change the state of the program is like this:

assert (treeDepth() == 7);

but unfortunately treeDepth() changes the value in some variable or heap cell, perhaps via a longish call chain.

In case it isn’t totally clear, the problem with side-effects in assertions is that we’ll test our program for a while, decide it’s good, and do a release build with assertions turned off and of course suddenly it doesn’t work. Or, it might be the release version that works but our debug build is broken by a side-effecting assertion. Dealing with these problems is highly demoralizing since assertions are supposed to save time, not eat it up. I feel certain that there are static analyzers that warn about this kind of thing. In fact, the original paper about the work that became Coverity’s tool mentions exactly this analysis in Section 4.1, and also gives plenty of examples of this bug. This is an area where language support for controlling side effects would be useful. Such support is extremely primitive in C/C++.
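The fix for the treeDepth() problem is to hoist the potentially side-effecting call out of the assertion, so that it executes in debug and release builds alike. A minimal sketch, with a toy stand-in for the post’s hypothetical treeDepth():

```c
#include <assert.h>

/* toy stand-in for the post's treeDepth(): it has a side effect
   (filling a cache), which is exactly what makes it dangerous to
   call from inside an assert() */
static int cachedDepth = -1; /* -1 means "not yet computed" */

static int treeDepth(void) {
  if (cachedDepth < 0)
    cachedDepth = 7; /* pretend we walked the tree here */
  return cachedDepth;
}

/* the fix: hoist the call out of the assertion, so its side effect
   happens whether or not NDEBUG is defined */
static int currentDepth(void) {
  int depth = treeDepth();
  assert(depth == 7);
  return depth;
}
```

Now compiling with -DNDEBUG removes only the comparison, not the call, so the debug and release builds mutate the cache identically.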

I feel compelled to add that of course every assertion changes the state of the machine, for example by using clock cycles, flipping bits in the branch predictor, and by causing cache lines to be loaded or evicted. In some circumstances, these changes will feed back into the program logic. For example, a program that runs 2% slower when compiled with assertions may run afoul of network timeouts and behave totally differently than its non-assert cousin. As developers, we hope not to run into these situations too often.

Other assertion mistakes are less severe. We might accidentally write a vacuous assertion that gives us a false sense of confidence in the code. We might accidentally write an assertion that is too strict; it will spuriously fail at some point and need to be dialed back. As a rule of thumb, we don’t want to write assertions that are too obviously true:

if (b) {
  x = 3;
} else {
  x = 17;
}
assert (x==3 || x==17);

It is also not useful to gunk up a program with layers and layers of redundant assertions. This makes code slow and hard to read. The 1-in-70 number from LLVM is a reasonable target. Some codes will naturally want more assertions than that; some codes will not need nearly as many.

Ben Titzer’s comment illustrates some ways that assertions can be misused to break modularity and make code more confusing.

One danger of assertions is that they don’t compose well when their behavior is to unconditionally terminate a process. Tripping over assertions in buggy library code is extremely frustrating. On the other hand, it’s not immediately obvious that compiling a buggy library with NDEBUG is a big improvement since now incorrect results are likely to percolate into application code. In any case, if we are writing library code, we must be even more careful to use assertions correctly than if we are writing a standalone application. The only time to assert something in library code is when the code truly cannot continue executing without dire consequences.

Finally — and I already said this — assertions are not for error handling. Even so, I often write code asserting that calls to close() return 0 and calls to malloc() return a non-null pointer. I’m not proud of this but I feel that it’s better than the popular alternatives of (1) ignoring the fact that malloc() and close() can fail and (2) writing dodgy code that pretends to handle their failure. Anyhow, as a professor I’m allowed to write academic-quality code sometimes. In a high-quality code base it might still be OK to crash upon encountering an error that we don’t care to handle, but we’d want to do it using a separate mechanism. Jesse mentions that Mozilla has a MOZ_CRASH(message) construct that serves this purpose.
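A sketch of what such a separate mechanism might look like for the malloc() case — the xmalloc wrapper is a common idiom rather than anything from this post, and it’s in the same spirit as MOZ_CRASH:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* crash-on-failure allocation: unlike an assert(), this check
   survives NDEBUG release builds, so an out-of-memory condition
   can never silently turn into a null-pointer dereference */
static void *xmalloc(size_t n) {
  void *p = malloc(n);
  if (p == NULL && n > 0) {
    fprintf(stderr, "fatal: out of memory (%zu bytes)\n", n);
    abort();
  }
  return p;
}
```

Callers can then use the result unconditionally, and the “error handling” for allocation failure is centralized in one deliberate crash site instead of being scattered through dodgy half-handled paths.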

Misconceptions About Assertions

A few years ago, in a class where the assignments were being written in Java, I noticed that the students weren’t putting very many assertions in their code, so I asked them about it. Most of them sort of looked at me blankly but one student spoke up and said that he thought that exceptions were an adequate replacement for assertions. I said something like “exceptions are for error handling and assertions are for catching bugs” but really that wasn’t the best answer. The best answer would have been something like this:

Exceptions are a low-level control flow mechanism in the same category as method dispatch and switch statements. A common use for exceptions is to support structured error handling. However, exceptions could also be used to implement assertion failure. Both assertions and structured error handling are higher-level programming tasks that need to be mapped onto lower-level language features.

Another misconception (that I’ve never heard from a student, but have seen on the net) is that unit testing is a superior replacement for assertions. This would be true only in the case where our unit tests are perfect because they catch all bugs. In the real world neither unit tests nor assertions find all bugs, so we should use both. In fact, there is a strong synergy between assertions and unit tests, as I have tried to illustrate in a previous post. Here’s a blog post talking about how assertions interfere with unit testing. My opinion is that if this happens, you’re probably doing something wrong, such as writing tests that are too friendly with the implementation or else writing bogus assertions that don’t actually correspond to bug conditions. This page asks if unit tests or assertions are more important and here’s some additional discussion.

The checkRep() Idiom

I believe it is a good idea, when implementing any non-trivial data structure, to create a representation checker — often called checkRep() — that walks the entire structure and asserts everything it can. Alternatively, a method called repOK() can be implemented; it does not contain any assertions but rather returns a Boolean value indicating whether the data structure is in a consistent state. We would then write, for example:

assert (tree.repOK());

The idea is that while unit-testing the data structure, we can call checkRep() or assert repOK() after every operation in order to aggressively catch bugs in the data structure’s methods. For example, here’s a checkRep() that I implemented for a red-black tree:

void checkRep (rb_red_blk_tree *tree)
{
  /* root is black by convention */
  assert (!tree->root->left->red);
  checkRepHelper (tree->root->left, tree);
}

/* 
 * returns the number of black nodes along any path through 
 * this subtree 
 */
int checkRepHelper (rb_red_blk_node *node, rb_red_blk_tree *t)
{
  /* by convention sentinel nodes point to nil instead of null */
  assert (node);
  if (node == t->nil) return 0;
  
  /* the tree order must be respected */
  /* parents and children must point to each other */
  if (node->left != t->nil) {
    int tmp = t->Compare (node->key, node->left->key);
    assert (tmp==0 || tmp==1);
    assert (node->left->parent == node);
  }
  if (node->right != t->nil) { 
    int tmp = t->Compare (node->key, node->right->key);
    assert (tmp==0 || tmp==-1);
    assert (node->right->parent == node);
  }
  if (node->left != t->nil && node->right != t->nil) {
    int tmp = t->Compare (node->left->key, node->right->key);
    assert (tmp==0 || tmp==-1);
  }
  
  /* both children of a red node are black */
  if (node->red) {
    assert (!node->left->red);
    assert (!node->right->red);
  }

  /* every root->leaf path has the same number of black nodes */
  int left_black_cnt = checkRepHelper (node->left, t);
  int right_black_cnt = checkRepHelper (node->right, t);
  assert (left_black_cnt == right_black_cnt);
  return left_black_cnt + (node->red ? 0 : 1);
}

Hopefully this code makes some sense even if you’ve never implemented a red-black tree. It checks just about every invariant I could think of. The full code is here.

The UNREACHABLE() Idiom

Often, a switch or case statement is meant to cover all possibilities and we don’t wish to fall into the default case. Here’s one way to deal with the problem:

switch (x) {
  case 1: foo(); break;
  case 2: bar(); break;
  case 3: baz(); break;
  default: assert (false);
}

Alternatively, some code bases use an UNREACHABLE() macro that is equivalent to assert(false). LLVM, for example, uses llvm_unreachable() about 2,500 times (however, their unreachable construct has a pretty strong semantics — in non-debug builds, it is turned into a compiler directive indicating dead code).
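The two-mode behavior described above can be sketched in a few lines of C; the `__builtin_unreachable()` intrinsic is GCC/Clang-specific, and the macro here is my own illustration rather than LLVM’s actual definition:

```c
#include <assert.h>
#include <stdlib.h>

#ifdef NDEBUG
/* release build: tell the optimizer this point cannot be reached
   (GCC/Clang intrinsic; other compilers have equivalents) */
#define UNREACHABLE() __builtin_unreachable()
#else
/* debug build: fail loudly; the abort() also tells the compiler
   that control does not continue past the macro */
#define UNREACHABLE() (assert(0 && "unreachable code reached"), abort())
#endif

static const char *opName(int x) {
  switch (x) {
    case 1: return "foo";
    case 2: return "bar";
    case 3: return "baz";
    default: UNREACHABLE();
  }
}
```

In the release build the dead-code hint lets the compiler omit the default path entirely and often generates better code for the switch; in the debug build a bad argument dies with a useful message.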

Light vs. Heavy Assertions

In many cases, assertions naturally divide into two categories. Light assertions execute in small constant time and tend to touch only metadata. Let’s take the example of a linked list that maintains a cached list length. A light assertion would ensure that the length is zero when the list pointer is null, and the length is non-zero when the list pointer is non-null. In contrast, a heavy assertion would check that the length is actually equal to the number of elements. Heavy assertions for a sort routine would ensure that the output is sorted and maybe also that no element of the array has been modified during the sort. A checkRep() almost always includes heavy assertions.
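The linked-list example can be sketched as follows — a hypothetical list type invented for illustration, with one checker per category:

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical list with a cached length, as in the example above */
struct node { int val; struct node *next; };
struct list { struct node *head; size_t length; };

/* light assertion: constant time, touches only metadata */
static void checkLight(const struct list *l) {
  assert((l->head == NULL) == (l->length == 0));
}

/* heavy assertion: O(n), walks the entire structure to verify
   that the cached length is actually correct */
static void checkHeavy(const struct list *l) {
  checkLight(l);
  size_t n = 0;
  for (const struct node *p = l->head; p != NULL; p = p->next)
    n++;
  assert(n == l->length);
}
```

The light check is cheap enough to leave in every list operation, even in production; the heavy check is the kind of thing you’d call only from a checked build or from unit tests.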

Generally speaking, heavy assertions are most useful during testing and probably have to be disabled in production builds. Light assertions might be enabled in deployed software. It is not uncommon for large software bases such as LLVM and Microsoft Windows to support both “checked” and “release” builds where the checked build — which is customarily not used outside of the organization that develops the software — executes heavy assertions and is substantially slower than the release build. The release build — if it does include the light assertions — is generally only a tiny bit slower than it would be if the assertions were omitted entirely.

Are Assertions Enabled in Production Code?

This is entirely situational. Let’s look at a few examples. Jesse passes along this example where a useful check in Mozilla was backed out due to a 3% performance hit. On the other hand Julian Seward said:

Valgrind is loaded with assertion checks and internal sanity checkers which periodically inspect critical data structures. These are permanently enabled. I don’t care if 5 percent or even 10 percent of the total run-time is spent in these checks—automated debugging is the way to go. As a result, Valgrind almost never segfaults—instead it emits some kind of a useful error message before dying. That’s something I’m rather proud of.

The Linux kernel generally wants to keep going no matter what, but even so it contains more than 11,000 uses of the BUG_ON() macro, which is basically an assertion — on failure it prints a message and then triggers a kernel panic without flushing dirty buffers. The idea, I assume, is that we’d rather lose some recently-produced data than risk flushing corrupted data to stable storage. Compilers such as GCC and LLVM ship with assertions enabled, making the compiler more likely to die and less likely to emit incorrect object code. On the other hand, I heard from a NASA flight software engineer that some of the tricky Mars landings have been done with assertions turned off because an assertion violation would have resulted in a system reboot and by the time the reboot had completed, the spacecraft would have hit the planet. The question of whether it is better to stop or keep going when an internal bug is detected is not a straightforward one to answer. I’d be interested to hear stories from readers about situations where assertions or lack of assertions in deployed software led to interesting results.

Limitations of Assertions

Some classes of program bugs cannot be effectively detected using assertions. There’s no obvious way to assert the absence of race conditions or infinite loops. In C/C++, it is not possible to assert the validity of pointers, nor is it possible to assert that storage has been initialized. Since assertions live within the programming language, they cannot be used to express mathematical concepts such as quantifiers — useful, for example, because they support a concise specification of part of the postcondition for a sort routine:

   ∀ i, j : 0 ≤ i ≤ j < length ⇒ array[i] ≤ array[j]

The other half of the postcondition for a sort routine — which is often left out — requires that the output array is a permutation of the input array.
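Although the quantified formula cannot be written as a single C assertion, the sortedness half of the postcondition can still be checked at runtime with a checkRep-style helper; checking adjacent pairs suffices because ≤ is transitive (the permutation half is harder, requiring a multiset comparison of input and output):

```c
#include <assert.h>
#include <stddef.h>

/* runtime approximation of the quantified postcondition
   forall i, j : 0 <= i <= j < length ==> array[i] <= array[j];
   the pairwise check implies the full property by transitivity */
static void assertSorted(const int *array, size_t length) {
  for (size_t i = 1; i < length; i++)
    assert(array[i - 1] <= array[i]);
}
```

This is of course a check of particular executions, not a proof — which is exactly the gap that the formal methods tools discussed below are meant to close.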

Some bugs can in principle be detected by assertions, but in practice are better detected other ways. One good example of this is undefined integer operations in C/C++ programs: asserting the precondition for every math operation would clutter up a program unacceptably. Compiler-inserted instrumentation is a much better solution. Assertions are best reserved for conditions that mean something to humans.

Assertions and Formal Methods

Recall my preferred definition of assertion: an expression at a program point that, if it ever evaluates to false, indicates a bug. Using predicate logic we can flip this definition around and see that if our program contains no bugs, then no assertion can fail. Thus, at least in principle, we should be able to prove that each assertion in our program cannot fail. Doing these proofs for non-trivial code is difficult, although it is far easier than proving an entire program to be correct. Also, we have tools that are specifically designed to help us reason about assertions; my favorite one for C code is Frama-C. If we write our assertions in ACSL (Frama-C’s annotation language) instead of in C, we get some advantages. First, ACSL is considerably more expressive than C; for example, it provides quantifiers, mathematical integers, and a way to assert that a pointer refers to valid storage. Second, if we can prove that our ACSL annotations are consistent with the source code, then we have a very strong result — we know that the assertion holds on all program paths, not just the ones that we were able to execute during testing. Additionally, as a backup plan, a subset of ACSL (called E-ACSL) can be translated into regular old assertions.
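For a flavor of what ACSL looks like, here is a small function with a quantified contract — a toy example of my own, not taken from the Frama-C distribution. The annotations live in special comments that Frama-C reads and an ordinary C compiler ignores:

```c
#include <assert.h>

/*@ requires n > 0;
    requires \valid_read(a + (0 .. n-1));
    assigns \nothing;
    ensures \forall integer i; 0 <= i < n ==> \result >= a[i];
    ensures \exists integer i; 0 <= i < n && \result == a[i];
*/
int maxElement(const int *a, int n) {
  int max = a[0];
  for (int i = 1; i < n; i++)
    if (a[i] > max)
      max = a[i];
  return max;
}
```

Note the things the contract says that no C assertion could: that the pointer refers to n readable cells, that the function writes no memory, and a ∀/∃ postcondition over all indices. If Frama-C discharges the proof obligations, the properties hold on every path, not just the tested ones.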

Conclusion

Assertions live at a sweet spot where testing, documentation, and formal methods all come together to help us increase the quality of a code base.

Many thanks to Robby Findler and Jesse Ruderman for providing feedback on drafts of this article.

UPDATES:

Writing Solid Code Week 4

This week most of the students in the class (but not everyone, since we ran out of time) presented their arguments about the quality of various UNIX utility programs. This was a bit more interesting than it might otherwise have been since some of the utilities running on our student lab machines are hilariously old versions. Students subjected the tools to a wide variety of stress tests such as fuzzing, building them with Clang’s UB sanitizer, Valgrind, Murphy (an awesome tool that I haven’t had time to try, but plan to), and more that I’m forgetting now. Additionally they inspected the source code, looked at how many bugs have been fixed in the utilities in recent years, etc. The point of this exercise was to get people used to the idea of assessing an externally-produced body of code using whatever methods are suitable and available.

The other thing we did was begin the next project. This one will have two phases but I’ve only told the students about the first one. With me playing the program manager, we developed the specification. They will be developing this program in groups of three: a tester, a compression person, and an I/O person. I’m hoping that this code lends itself to writing assertions; if not, we’ll do something like balanced binary search trees later on, since you can’t avoid assertions when writing those.

Automatically Entering the Grand C++ Error Explosion Competition

G++ can be comically verbose; developers sometimes like to wallpaper their cubes with choice error messages from Boost or STL programs. The Grand C++ Error Explosion Competition asks the question: how large can we make the ratio between error output and compiler input?

I’m not much of a C++ person but when the contest was announced I was doing some experiments in using C-Reduce as a way to search for C++ programs that have interesting properties. Of course, we usually use C-Reduce to search for small programs, but Alex and I have been using it (and other reducers) to find, for example, programs that cause interesting parts of the compiler to execute. It only took a minute or two to set up C-Reduce so that its goal was to maximize the GCEEC’s fitness function. I started it running on four C++ files; after a few days three of the reductions didn’t show signs of terminating but the fourth one — some random part of the LLVM backend — reduced to this:

struct x0 struct A<x0(x0(x0(x0(x0(x0(x0(x0(x0(x0(_T1,x0 (_T1> <_T1*, x0(_T1*_T2> 
binary_function<_T1*, _T2, x0{ }

Somewhat surprisingly, there aren’t any templates here. When compiled using G++ 4.8.1 (I’m using the one that comes with Ubuntu 13.10 on x86-64) we get 5 MB of output. It wasn’t too hard to (1) clean up this output a bit and (2) recognize that the repeated (x0 substring is important. Thus, my entry to the GCEEC was:

struct x struct z<x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(x(y,x(y><y*,x(y*w>v<y*,w,x{}

Every added (x approximately doubles the size of the error output. It was tricky to choose the right number of these substrings to include since I wanted to bump up against the timeout without pushing past it. But really, at this point the competition became a lot less interesting because we can pick a target ratio of output to input and trivially craft an input that reaches the target (assuming we don’t run into implementation limits). So the contest is basically a bandwidth contest where the question is: How many bytes can we get out of G++ on the specified platform within the 5 minute timeout? At this point the winner depends on how many cores are available, the throughput of Linux pipes, etc., which isn’t too satisfying.

I was a little bummed because I didn’t need to use a trick I had been saving up, which was to give the C++ file a name that is 255 characters long — this is useful because the name of the source file is repeated many times in the error output (and the length of the source file name is not part of the fitness function). However, it was delightful to read the other contest entries which used some nice tricks I wouldn’t have thought of.

Would it be fun to repeat this contest for Clang++ or MSVC++? Also, why is G++ so verbose? My guess is that its error reporting logic should be running (to whatever extent this is possible) much earlier in the compilation process, before templates and other things have been expanded. Also, it would probably be useful to enforce a limit on the length of any given error message printed by the compiler on the basis that nobody is interested in anything past the first 10 KB or whatever.

Writing Solid Code Week 3

This week we finished up the triangle classifier. Not all students’ implementations pass all test cases, which is totally fine with me as long as people learned some lessons about when not to use floating point (I did). On Tuesday we also chose a smallish UNIX utility for each student to evaluate as if it were going to be used as mission-critical software (not sure how our lives would depend on less but that’s beside the point). Next Tuesday we’ll read some students’ arguments and decide whether the evidence is convincing.

Thursday I lectured about assertions, one of my favorite topics. For several years I’ve been avoiding writing a blog post about assertions because there’s plenty of good information out there, but I really do need to write that post now that I’ve sort of collected all my thoughts in one place. The triangle classifier offered poor opportunities for writing assertions, but later on we’ll play with codes where assertions are basically mandatory. Anyway, I’ll try to get that written up soon.

Status of Software Testing

The other day I received a query from some software engineering researchers who are compiling sort of a survey paper about the status of software testing research; here are the questions, with my answers. I’m interested to hear what the rest of you think about this stuff.

What do you think are the most significant contributions to testing since 2000, whether from you or from other researchers?

These would be my choices:

  • delta debugging
  • symbolic and concolic testcase generation — Klee, SAGE, etc.
  • general-purpose open source execution monitoring tools — Valgrind, “clang -fsanitize=undefined”, etc.
  • incremental improvements in random testing — QuickCheck and its variants, etc.

What do you think are the biggest open challenges and opportunities for future research in this area?

Well, as far as I can tell, gaining confidence in the correctness and security of a piece of software is still both expensive and difficult. Not only that, but this process remains an art. We need to continue making it into a science. Just as a random example, if I want to build a compiler there are about 25 books I can read, all of which cover different aspects of a well-understood body of knowledge. They will tell me the various parts of a compiler and how I should put them together. If I want to become certain that a piece of software does what I hope it does, what books should I read? It’s not even clear. It’s not that there aren’t any good books about testing, but rather that even the good ones fail to contain actionable general-purpose recipes.

Of course not all software is testable. My work on testing GCC has convinced me of this (although I already knew it intellectually). I think there are lots of opportunities in learning how to create testable and/or verifiable software. For example, what could I accomplish if I had a really good concolic tester to help with unit testing? Hopefully, with a fairly low investment I’d end up with good test cases and also good contracts that would be a first step towards verification if I wanted to go in that direction. I think we can take a lesson from CompCert: Xavier didn’t just prove the thing correct, but rather pieced together translation validation and verification depending on which one was appropriate for a given task. Of course we can throw testing into the mix when those other techniques are too difficult. There’s a lot of synergy between these various ways of gaining confidence in software but at present verification and testing are largely separate activities (not least because the verification tools are so hard to use, but that’s a separate topic).

One of the biggest unsolved challenges is finding hidden functionality (whether deliberately or accidentally inserted) in software that operates across a trust boundary. Of course hidden functionality is difficult because the specification gives us few or no clues about where to find it; it’s all about the implementation. There’s no silver bullet for this problem but one idea I like (but haven’t worked on at all) is using continuous mathematics to model the system behavior and then focusing testing effort around discontinuities. This research is of the character that I’m trying to describe.