Skip to content

Planning for Disaster

Alan Perlis once said:

I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are.

This is a nice sentiment, perhaps even a defensible one if we interpret it as narrowly talking about academic computer science. On the other hand, probably within 20 years, there’s going to be a major disaster accompanied by the loss of many lives that is caused more or less directly by software defects. I don’t know what exactly will happen, but something related to transportation or SCADA seems likely. At that point we can expect things to hit the fan. I’m not optimistic that, as a group, computer scientists and computing professionals can prevent this disaster from happening: the economic forces driving automation and system integration are too strong. But of course we should try. We also need to think about what we’re going to do if, all of a sudden, a lot of people suddenly expect us to start producing computer systems that actually work, and perhaps hold us accountable when we fail to do so.

Obviously I don’t have the answers but here are a few thoughts.

  • We know that it is possible to create safety-critical software that (largely) works. Generally this happens when the organizations creating the software are motivated not only by market forces, but also by significant regulatory pressure. Markets (and the humans that make them up) are not very good at analyzing low-probability risks. A huge amount of critical software is created by organizations that are subject to very little regulatory pressure.
  • It is difficult to tell people that something is a bad idea when they very much want it to be a good idea. We should get used to doing this, following Parnas’s courageous example.
  • It is difficult to tell people that something is going to be slow and expensive to create, when they very much want it to be quick and cheap. We need to get used to saying that as well.
  • We can and should take responsibility for our work. I was encouraged by the field’s generally very positive response to The Moral Character of Cryptographic Work. Computer scientists and computing professionals almost always think that their particular technology makes the world better or is at worst neutral — but that is clearly not always the case. Some of this could be taught.
  • We need to be educating CS students in methods for creating software that works: testing, specification, code review, debugging, and formal methods. You’d think this is obvious but students routinely get a CS degree without having done any sort of serious software testing.

Finally, let’s keep in mind that causes are tricky. A major human disaster usually has a complex network of causes, perhaps simply because any major disaster with a single cause would indicate that the system had been designed very poorly. Atomic Accidents makes it clear that most nuclear accidents have been the result of an anti-serendipitous collection of poor system design, unhappy evolution and maintenance, and failure-enhancing responses by humans.

Acknowledgments: Pascal Cuoq and Kelly Heller gave some great feedback on drafts of this piece.

Do Fiddly Buffer Overruns Matter?

by Pascal Cuoq and John Regehr

Using tis-interpreter we found a “fiddly” buffer overrun in OpenSSL: it only goes a few bytes out of bounds and the bytes it writes have just been read from the same locations. Fiddly undefined behaviors are common and it can be hard to convince developers to fix them, especially in a case such as this where the behavior is intentional. More accurately, the behavior was intentional when the code was written in the 1990s—a simpler age of more interesting computers. Nevertheless, we reported this bug and it was recently fixed in the master branch. The new version of the file is far nicer. The bug is still present in the release branches.

The bug is in the implementation of RC4: a comparatively old cipher that has had a useful life, but is reaching the end of it. Indeed, OpenSSL’s rc4_enc.c was — before the recent fix — some seriously 1990s C code with plentiful use of the preprocessor, design choices based on characteristics of Alphas and UltraSPARCs, and this dodgy OOB-based performance hack. A new paper “Analysing and Exploiting the Mantin Biases in RC4” might be the final nail in the coffin of RC4.

The encryption code in rc4_enc.c causes up to RC4_CHUNK - 1 bytes past the end of the output buffer to be overwritten with values that have been read from the same (out of bounds) locations. There are several variants delineated with #ifdefs. On an ordinary x86-64, RC4_CHUNK is 8 and execution will go through line 217:

for (; len & (0 - sizeof(RC4_CHUNK)); len -= sizeof(RC4_CHUNK)) {
    ichunk = *(RC4_CHUNK *) indata;
    otp = ...
    *(RC4_CHUNK *) outdata = otp ^ ichunk;
    indata += sizeof(RC4_CHUNK);
    outdata += sizeof(RC4_CHUNK);

This loop consumes all the complete 8-byte words found in the input buffer, and writes as many words into the output buffer. So far so good. Things get interesting after the loop:

if (len) {
    RC4_CHUNK mask = (RC4_CHUNK) - 1, ochunk;

    ichunk = *(RC4_CHUNK *) indata;  // !
    ochunk = *(RC4_CHUNK *) outdata; // !
    mask >>= (sizeof(RC4_CHUNK) - len) << 3;
    ... something about ultrix ...

    ochunk &= ~mask;
    ochunk |= (otp ^ ichunk) & mask;
    *(RC4_CHUNK *) outdata = ochunk; // !!

If there remain between one and seven unprocessed bytes, these are taken care of by reading an entire 8-byte word from indata, reading an entire word from outdata, computing a new word made from encrypted input and original out-of-bound values, and writing that word to outdata.

What could go wrong? At first glance it seems like this overrun could trigger an unwanted page fault, but that isn’t the case on architectures where an aligned word never crosses a page boundary. However, in a concurrent environment, the illegal writing of illegally read data can be directly observed. We wrote a small program that shows this. The key is a struct that places important data directly after a buffer that is the destination of RC4-encrypted data:

struct stuff {
    unsigned char data[data_len];
    int important;

Then, one thread repeatedly puts RC4-encrypted data into the buffer:

void *thread2(void *arg) {
    RC4_KEY key;
    const char *key_data = “Hello there.”;
    RC4_set_key(&key, strlen(key_data), (const unsigned char *)key_data);
    while (!stop)
        RC4(&key, data_len, src->data, dst->data);
    return 0;

While the other thread waits for the important field to be corrupted:

void *thread1(void *arg) {
    int x = 0;
    while (!stop) {
        dst->important = ++x;
        check(&dst->important, x);
    return 0;

The check() function is compiled separately to help ensure that the value being checked comes from RAM rather than being cached in a register:

void check(int *addr, int val) { assert(*addr == val); }

Our complete code is on github. If you run it, you should see the assertion being violated almost immediately. We’ve tested against OpenSSL 1.0.2f on Linux and OS X. On Linux you must make sure to get the C implementation of RC4 instead of the assembly one:

./Configure no-asm linux-x86_64

Although any of tis-interpreter, Valgrind, or ASan can find the OpenSSL RC4 bug when the encrypted data bumps up against the end of an allocated block of memory, none of them finds the bug here since the “important” field absorbs the OOB accesses. There’s still an UB but it’s subtle: accessing buffer memory via a pointer-to-long is a violation of C’s strict aliasing rules.

So, do fiddly buffer overruns matter? Well, this particular bug is unlikely to have an observable effect on programs written in good faith, though one of us considered submitting as an Underhanded Crypto Challenge entry a backdoored chat program with end-to-end encryption that, each time the race condition triggers, sends a copy of the private message being processed to Eve, encrypted with Eve’s key. Alas, the race window is very short. A fiddly OOB can also be a problem if the compiler notices it. In an example from Raymond Chen’s blog, GCC uses the OOB array reference as an excuse to compile this function to “return true”:

int table[4];
bool exists_in_table(int v) {
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    return false;

That could only happen here if we used link-time optimization to give the compiler access to both the application and the OpenSSL code at once.

The Problem with Friendly C

I’ll assume you’re familiar with the Proposal for Friendly C and perhaps also Dan Bernstein’s recent call for a Boring C compiler. Both proposals are reactions to creeping exploitation of undefined behaviors as C/C++ compilers get better optimizers. In contrast, we want old code to just keep working, with latent bugs remaining latent.

After publishing the Friendly C Proposal, I spent some time discussing its design with people, and eventually I came to the depressing conclusion that there’s no way to get a group of C experts — even if they are knowledgable, intelligent, and otherwise reasonable — to agree on the Friendly C dialect. There are just too many variations, each with its own set of performance tradeoffs, for consensus to be possible. To get a taste of this, notice that in the comments for the Friendly C post, several people disagree with what I would consider an extremely non-controversial design choice for Friendly C: memcpy() should have memmove() semantics. Another example is what should be done when a 32-bit integer is shifted by 32 places (this is undefined behavior in C and C++). Stephen Canon pointed out on twitter that there are many programs typically compiled for ARM that would fail if this produced something besides 0, and there are also many programs typically compiled for x86 that would fail when this evaluates to something other than the original value. So what is the friendly thing to do here? I’m not even sure– perhaps the baseline should be “do what gcc 3 would do on that target platform.” The situation gets worse when we start talking about a friendly semantics for races and OOB array accesses.

I don’t even want to entertain the idea of a big family of Friendly C dialects, each making some niche audience happy– that is not really an improvement over our current situation.

Luckily there’s an easy away forward, which is to skip the step where we try to get consensus. Rather, an influential group such as the Android team could create a friendly C dialect and use it to build the C code (or at least the security-sensitive C code) in their project. My guess is that if they did a good job choosing the dialect, others would start to use it, and at some point it becomes important enough that the broader compiler community can start to help figure out how to better optimize Friendly C without breaking its guarantees, and maybe eventually the thing even gets standardized. There’s precedent for organizations providing friendly semantics; Microsoft, for example, provides stronger-than-specified semantics for volatile variables by default on platforms other than ARM.

Since we published the Friendly C proposal, people have been asking me how it’s going. This post is a long-winded way of saying that I lost faith in my ability to push the work forward. However, I still think it’s a great idea and that there are people besides me who can make it happen.

Reducers are Fuzzers

A test case isn’t just a test case: it lives in the (generally extremely large) space of inputs for the software system you are testing. If we have a test case that triggers a bug, here’s one way we can look at it:

The set of test cases triggering a bug is a useful notion since we can search it. For example, a test case reducer is a program that searches for the smallest test case triggering a bug. It requires a way to transform a test case into a smaller one, for example by deleting part of it. The new variant of the test case may or may not trigger the bug. The process goes like this:

I’ve spent a lot of time watching reducers run, and one thing I’ve noticed is that the reduction process often triggers bugs unrelated to the bug that is the subject of the reduction:

Sometimes this is undesirable, such as when a lax interestingness test permits the reduction of one bug to get hijacked by a different bug. This happens all the time when reducing segfaults, which are hard to tell apart. But on the other hand, if we’re looking for bugs then this phenomenon is a useful one.

It seems a bit counterintuitive that test case reduction would lead to the discovery of new bugs since we might expect that the space of inputs to a well-tested software system is mostly non-bug-triggering with a few isolated pockets of bug-triggering inputs scattered here and there. I am afraid that that view might not be realistic. Rather, all of the inputs we usually see occupy a tiny portion of the space of inputs, and it is surrounded by huge overlapping clouds of bug-triggering inputs. Fuzzers can push the boundaries of the space of inputs that we can test, but not by as much as people generally think. Proofs remain the only way to actually show that a piece of software does the right thing any significant chunk of its input space. But I digress. The important fact is that reducers are decently effective mutation-based fuzzers.

In the rest of this post I’d like to push that idea a bit farther by doing a reduction that doesn’t correspond to a bug and seeing if we can find some bugs along the way. We’ll start with this exciting C++ program

#include <iostream>

int main() {
  std::cout << "Hello World!++" << std::endl;

and we’ll reduce it under the criterion that it remains a viable Hello World implementation. First we’ll preprocess and then use delta’s topformflat to smoosh everything together nicely:

g++ -std=c++11 -E -P -O3 hello-orig.cpp | topformflat > hello.cpp

You might be saying to yourself something like “it sure is stupid that John is passing an optimization flag to the preprocessor,” but trust me that it actually does change the emitted code. I didn’t check if it makes a difference here but I’ve learned to avoid trouble by just passing the same flags to the preprocessor as to the compiler.

Anyhow, the result is 550 KB of goodness:

Here’s the code that checks if a variant is interesting — that is, if it acts like a Hello World implementation:

g++ -O3 -std=c++11 -w hello.cpp >/dev/null 2>&1 &&
ulimit -t 1 &&
./a.out | grep Hello

The ulimit is necessary because infinite loops sometimes get into the program that is being reduced.

To find compiler crashes we’ll need a bit more elaborate of a test:

  g++ -O3 -std=c++11 -w hello.cpp >compiler.out 2>&1
  ulimit -t 1 &&
  ./a.out | grep Hello
    grep 'internal compiler error' compiler.out
    exit 101
    exit 1

When the compiler fails we look at its output and, if it contains evidence of a compiler bug, exit with code 101, which will tell C-Reduce that it should save a copy of the input files that made this happen.

The compiler we’ll use is g++ r231221, the development head from December 3 2015. Let’s get things going:

creduce --nokill --also-interesting 101 --no-default-passes \
  --add-pass pass_clex rm-tok-pattern-4 10 ../ hello.cpp

The -also-interesting 101 option indicates that the interestingness test will use process exit code 101 to tell C-Reduce to make a snapshot of the directory containing the files being reduced, so we can look at it later. --no-default-passes clears C-Reduce’s pass schedule and -add-pass pass_clex rm-tok-pattern-4 10 add a single pass that uses a small sliding window to remove tokens from the test case. The issue here is that not all of C-Reduce’s passes are equally effective at finding bugs. Some passes, such as the one that removes dead variables and the one that removes dead functions, will probably never trigger a compiler bug. Other passes, such as the one that removes chunks of lines from a test case, eliminate text from the test case so rapidly that effective fuzzing doesn’t happen. There are various ways to deal with this problem, such as probabilistically rejecting improvements or rejecting improvements that are too large, but for this post I’ve chosen the simple expedient of running just one pass that (1) makes progress very slowly and (2) seems to be a good fuzzer.

The dynamics of this sort of run are interesting: as the test case walks around the space of programs, you can actually see it brush up against a compiler bug and then wander off into the weeds again:

The results are decent: after about 24 hours, C-Reduce caused many segfaults in g++ and triggered six different internal compiler errors: 1, 2, 3, 4, 5, 6. One of these was already reported, another looks probably like a duplicate, and four appear to be new.

I did a similar run against Clang++ r254595, again the development head from December 3. This produced segfaults and also triggered 25 different assertions:

llvm/include/llvm/Support/Casting.h:230: typename cast_retty::ret_type llvm::cast(Y &) [X = clang::FunctionProtoType, Y = clang::QualType]: Assertion `isa(Val) && "cast() argument of incompatible type!"' failed.
llvm/include/llvm/Support/Casting.h:95: static bool llvm::isa_impl_cl::doit(const From *) [To = clang::FunctionTemplateDecl, From = const clang::Decl *]: Assertion `Val && "isa<> used on a null pointer"' failed.
llvm/tools/clang/lib/AST/Decl.cpp:2134: clang::APValue *clang::VarDecl::evaluateValue(SmallVectorImpl &) const: Assertion `!Init->isValueDependent()' failed.
llvm/tools/clang/lib/AST/Decl.cpp:2181: bool clang::VarDecl::checkInitIsICE() const: Assertion `!Init->isValueDependent()' failed.
llvm/tools/clang/lib/AST/ExprCXX.cpp:451: static clang::DependentScopeDeclRefExpr *clang::DependentScopeDeclRefExpr::Create(const clang::ASTContext &, clang::NestedNameSpecifierLoc, clang::SourceLocation, const clang::DeclarationNameInfo &, const clang::TemplateArgumentListInfo *): Assertion `QualifierLoc && "should be created for dependent qualifiers"' failed.
llvm/tools/clang/lib/AST/../../include/clang/AST/TypeNodes.def:98: clang::TypeInfo clang::ASTContext::getTypeInfoImpl(const clang::Type *) const: Assertion `!T->isDependentType() && "should not see dependent types here"' failed.
llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:623: llvm::StringRef clang::CodeGen::CodeGenModule::getMangledName(clang::GlobalDecl): Assertion `II && "Attempt to mangle unnamed decl."' failed.
llvm/tools/clang/lib/CodeGen/../../include/clang/AST/Expr.h:134: void clang::Expr::setType(clang::QualType): Assertion `(t.isNull() || !t->isReferenceType()) && "Expressions can't have reference type"' failed.
llvm/tools/clang/lib/Lex/PPCaching.cpp:101: void clang::Preprocessor::AnnotatePreviousCachedTokens(const clang::Token &): Assertion `CachedTokens[CachedLexPos-1].getLastLoc() == Tok.getAnnotationEndLoc() && "The annotation should be until the most recent cached token"' failed.
llvm/tools/clang/lib/Parse/../../include/clang/Parse/Parser.h:2256: void clang::Parser::DeclaratorScopeObj::EnterDeclaratorScope(): Assertion `!EnteredScope && "Already entered the scope!"' failed.
llvm/tools/clang/lib/Sema/../../include/clang/AST/DeclTemplate.h:1707: void clang::ClassTemplateSpecializationDecl::setInstantiationOf(clang::ClassTemplatePartialSpecializationDecl *, const clang::TemplateArgumentList *): Assertion `! && "Already set to a class template partial specialization!"' failed.
llvm/tools/clang/lib/Sema/../../include/clang/Sema/Lookup.h:460: clang::NamedDecl *clang::LookupResult::getFoundDecl() const: Assertion `getResultKind() == Found && "getFoundDecl called on non-unique result"' failed.
llvm/tools/clang/lib/Sema/SemaDecl.cpp:10455: clang::Decl *clang::Sema::ActOnParamDeclarator(clang::Scope *, clang::Declarator &): Assertion `S->isFunctionPrototypeScope()' failed.
llvm/tools/clang/lib/Sema/SemaDeclCXX.cpp:11373: ExprResult clang::Sema::BuildCXXDefaultInitExpr(clang::SourceLocation, clang::FieldDecl *): Assertion `Lookup.size() == 1' failed.
llvm/tools/clang/lib/Sema/SemaExpr.cpp:2274: ExprResult clang::Sema::ActOnIdExpression(clang::Scope *, clang::CXXScopeSpec &, clang::SourceLocation, clang::UnqualifiedId &, bool, bool, std::unique_ptr, bool, clang::Token *): Assertion `R.getAsSingle() && "There should only be one declaration found."' failed.
llvm/tools/clang/lib/Sema/SemaExprCXX.cpp:2272: clang::FunctionDecl *clang::Sema::FindUsualDeallocationFunction(clang::SourceLocation, bool, clang::DeclarationName): Assertion `Matches.size() == 1 && "unexpectedly have multiple usual deallocation functions"' failed.
llvm/tools/clang/lib/Sema/SemaExprCXX.cpp:6663: ExprResult clang::Sema::CorrectDelayedTyposInExpr(clang::Expr *, clang::VarDecl *, llvm::function_ref): Assertion `TyposInContext < ~0U && "Recursive call of CorrectDelayedTyposInExpr"' failed.
llvm/tools/clang/lib/Sema/SemaExprMember.cpp:91: IMAKind ClassifyImplicitMemberAccess(clang::Sema &, const clang::LookupResult &): Assertion `!R.empty() && (*R.begin())->isCXXClassMember()' failed.
llvm/tools/clang/lib/Sema/SemaLookup.cpp:1904: bool clang::Sema::LookupQualifiedName(clang::LookupResult &, clang::DeclContext *, bool): Assertion `(!isa(LookupCtx) || LookupCtx->isDependentContext() || cast(LookupCtx)->isCompleteDefinition() || cast(LookupCtx)->isBeingDefined()) && "Declaration context must already be complete!"' failed.
llvm/tools/clang/lib/Sema/SemaLookup.cpp:2729: Sema::SpecialMemberOverloadResult *clang::Sema::LookupSpecialMember(clang::CXXRecordDecl *, clang::Sema::CXXSpecialMember, bool, bool, bool, bool, bool): Assertion `CanDeclareSpecialMemberFunction(RD) && "doing special member lookup into record that isn't fully complete"' failed.
llvm/tools/clang/lib/Sema/SemaOverload.cpp:11671: ExprResult clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, unsigned int, const clang::UnresolvedSetImpl &, clang::Expr *, clang::Expr *): Assertion `Result.isInvalid() && "C++ binary operator overloading is missing candidates!"' failed.
llvm/tools/clang/lib/Sema/SemaTemplate.cpp:2906: ExprResult clang::Sema::BuildTemplateIdExpr(const clang::CXXScopeSpec &, clang::SourceLocation, clang::LookupResult &, bool, const clang::TemplateArgumentListInfo *): Assertion `!R.empty() && "empty lookup results when building templateid"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateDeduction.cpp:609: (anonymous namespace)::PackDeductionScope::PackDeductionScope(clang::Sema &, clang::TemplateParameterList *, SmallVectorImpl &, clang::sema::TemplateDeductionInfo &, clang::TemplateArgument): Assertion `!Packs.empty() && "Pack expansion without unexpanded packs?"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateInstantiate.cpp:2781: llvm::PointerUnion *clang::LocalInstantiationScope::findInstantiationOf(const clang::Decl *): Assertion `isa(D) && "declaration not instantiated in this scope"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateVariadic.cpp:290: bool clang::Sema::DiagnoseUnexpandedParameterPack(clang::Expr *, clang::Sema::UnexpandedParameterPackContext): Assertion `!Unexpanded.empty() && "Unable to find unexpanded parameter packs"' failed.

I have to admit that I felt a bit overwhelmed by 25 potential bug reports, and I haven’t reported any of these yet. My guess is that a number of them are already in the bugzilla since people have been fuzzing Clang lately. Anyway, I’ll try to get around to reducing and reporting these. Really, this all needs to be automated so that when subsequent reductions find still more bugs, these just get added to the queue of reductions to run.

If you were interested in reproducing these results, or in trying something similar, you would want to use C-Reduce’s master branch. I ran everything on an Ubuntu 14.04 box. While preparing this post I found that different C-Reduce command line options produced widely varying numbers and kinds of crashes.

Regarding previous work, I believe — but couldn’t find written down — that the CERT BFF watches out for new crashes when running its reducer. In a couple of papers written by people like Alex Groce and me, we discussed the fact that reducers often slip from triggering one bug to another.

The new thing in this post is to show that triggering new bugs while reducing isn’t just some side effect. Rather, we can go looking for trouble, and we can do it without being given a bug to reduce in the first place. A key enabler for easy bug-finding with C-Reduce was finding a simple communication mechanism by which the interestingness test can give C-Reduce a bit of out-of-band information that a variant should be saved for subsequent inspection. I’m not trying to claim that reducers are awesome fuzzers, but on the other hand, it might be a little bit awesome that mutating Hello World resulted in triggering 25 different assertion violations in a mature and high-quality compiler. I bet we’d have done even better by starting with a nice big fat Boost application.

Latency Numbers Every Professor Should Know

### Latency numbers every professor should know
    Email from student ............................ 20 sec
    Person at office door  ......................... 8 min
    Other interruption ............................ 20 min
    Twitter or something seems really important ... 45 min
    Anxiety about deadlines ........................ 1 hr
    A meeting ...................................... 2 hrs
    A meeting you forgot about ..................... 1 day
    A class to teach ............................... 2 days
    Request to review a paper ...................... 3 days
    Request to write evaluation letter ............. 6 days
    Stuff to grade ................................. 1 wk
    Unsolicited time management advice arrives ..... 2 wks
    Fire alarm clears building ..................... 3 wks
    Travel to conference ........................... 5 wks
    Paper deadline ................................. 6 wks
    Grades due .................................... 16 wks
    Grant proposals due ........................... 26 wks
    Summer ......................................... 1 yr
    Sabbatical ..................................... 7 yrs = 2.208e+17 ns

With apologies to the folks who published latency numbers every programmer should know.

Multi-Version Execution Defeats a Compiler-Bug-Based Backdoor

[This piece is jointly authored by Cristian Cadar, Luís Pina, and John Regehr]

What should you do if you’re worried that someone might have exploited a compiler bug to introduce a backdoor into code that you are running? One option is to find a bug-free compiler. Another is to run versions of the code produced by multiple compilers and to compare the results (of course, under the additional assumption that the same bug does not affect all the compilers). For some programs, such as those whose only effect is to produce a text file, comparing the output is easy. For others, such as servers, this is more difficult and specialized system support is required.

Today we’ll look at using Varan the Unbelievable to defeat the sudo backdoor from the PoC||GTFO article. Varan is a multi-version execution system that exploits the fact that if you have some unused cores, running additional copies of a program can be cheap. Varan designates a leader process whose system call activity is recorded in a shared ring buffer, and one or more follower processes that read results out of the ring buffer instead of actually issuing system calls.

Compilers have a lot of freedom while generating code, but the sequence of system calls executed by a program represents its external behaviour and in most cases the compiler is not free to change it at all. There might be slight variations e.g., due to different compilers using different libraries, but these can be easily handled by Varan. Since all correctly compiled variants of a program should have the same external behaviour, any divergence in the sequence of system calls across versions flags a potential security attack, in which case Varan stops the program before any harm is done.

Typically, Varan runs the leader process at full speed while also recording the results of its system calls into the ring buffer. However, when used in a security-sensitive setting, Varan can designate some system calls as blocking, meaning that the leader cannot execute those syscalls until all followers have reached that same program point without diverging. For sudo, we designate execve as blocking, since that is a point at which sudo might perform an irrevocably bad action.

So here’s the setup:

  1. We have a patched version of sudo 1.8.13 from the PoC||GTFO article. It runs correctly and securely when compiled by a correct C compiler, but improperly gives away root privileges when compiled by Clang 3.3 because the patch was designed to trigger a wrong-code bug in that compiler.
  2. We are going to pretend that we don’t know about the Clang bug and the backdoor. We compile two versions of the patched sudo: one with Clang 3.3, the other with the default system compiler, GCC 4.8.4.
  3. We run these executables under Varan. Since the critical system call execve is blocking, it doesn’t much matter which version is the leader and which is the follower.

Now let’s visit an Ubuntu 14.04 VM where both versions of sudo (setuid root, of course) and Varan are installed. We’re using a user account that is not in the sudoers file — it should not be allowed to get root privileges under any circumstances. First let’s make sure that a sudo that was properly compiled (using GCC) works as expected:

$ /home/varan/sudo-1.8.13/install/bin/sudo-gcc cat /etc/shadow
test is not in the sudoers file.  This incident will be reported.

Next, we make sure that the backdoor is functioning as intended:

$ /home/varan/sudo-1.8.13/install/bin/sudo-clang cat /etc/shadow

So far so good. Next let’s try the gcc-compiled sudo as the leader with the backdoored sudo as the follower:

$ vx-suid /home/varan/sudo-1.8.13/install/bin/sudo-gcc \
          /home/varan/sudo-1.8.13/install/bin/sudo-clang -- cat /etc/shadow
test is not in the sudoers file.  This incident will be reported.

What happened here is that the gcc-compiled leader runs as before, since it doesn’t ever try to execute an execve call. When the backdoored follower tries to execute the malicious execve call, Varan detects the divergence and terminates both processes safely.

Now let’s try switching around the leader and follower, i.e., run the backdoored sudo as the leader with the gcc-compiled sudo as the follower:

$ vx-suid /home/varan/sudo-1.8.13/install/bin/sudo-clang \
          /home/varan/sudo-1.8.13/install/bin/sudo-gcc -- cat /etc/shadow

This time the leader tries to execute the malicious execve call, and Varan blocks its execution until the follower reaches the same system call or diverges. In this case, the follower tries to execute a write system call (to print “test is not in the sudoers file...”) and thus Varan detects divergence and again terminates execution safely.

In this example, we only ran two versions in parallel, but Varan can run more than two versions. In terms of performance and resource utilization, security applications like sudo are a great match for multi-version execution: they are not CPU-bound, so any performance degradation is imperceptible to the user, and the extra cores are needed only briefly, during the critical security validation checks. We are looking into applying this approach to other critical security applications (e.g. ssh-agent and password managers), and are investigating a way of hardening executables by generating a single binary with Varan and a bunch of versions, each version generated by a different compiler. We can then deploy this hardened executable instead of the original program.

Of course, Varan can detect misbehavior other than compiler-bug-based backdoors. Divergence could be caused by a memory or CPU glitch, by a plain old compiler bug that is triggered unintentionally instead of being triggered by an adversarial patch, or by a situation where an application-level undefined behavior bug has been exploited by only one of the compilers, or even where both compilers exploited the bug but not in precisely the same way. A nice thing about N-version programming at the system call level is that it won’t bother us about transient divergences that do not manifest as externally visible behaviour through a system call.

We’ll end by pointing out a piece of previous work along these lines: the Boeing 777 uses compiler-based and also hardware-based N-version diversity: there is a single version of the Ada avionics software that is compiled by three different compilers and then it runs on three different processors: a 486, a 68040, and an AMD 29050.

Testcase Reduction for Non-Preprocessed C and C++

C-Reduce takes a C or C++ file that triggers a bug in a compiler (or other tool that processes source code) and turns it into the smallest possible test case that still triggers the bug. Most often, we try to reduce code that has already been preprocessed. This post is about how to reduce non-preprocessed code, which is sometimes necessary when — due to use of an integrated preprocessor — the compiler bug goes away when a separate preprocessing step is used.

The first thing we need to do is get all of the necessary header files into one place. This is somewhat painful due to things like computed includes and #include_next. I wrote a script that follows the transitive includes, renaming files and copying them over; it works fine on Linux but sometimes fails on OS X, I haven’t figured out why yet. Trust me that you do not want to look too closely at the Boost headers.

Second, we need to reduce multiple files at once, since they have not yet been glommed together by the preprocessor. C-Reduce, which is a collection of passes that iterate to fixpoint, does this by running each pass over each file before proceeding to the next pass. The seems to work well. A side benefit of implementing multi-file reduction is that it has other uses such as reducing programs that trigger link-time optimization bugs, which inherently requires multiple files.

Non-preprocessed code contains comments, so C-Reduce has a special pass for stripping those out. We don’t want to do this before running C-Reduce because removing comments might make the bug we’re chasing go away. Another pass specifically removes #include directives which tend to be deeply nested in some C++ libraries.

#ifdef … #endif pairs are hard to eliminate from first principles because they are often not located near to each other in the file being reduced, but you still need to eliminate both at once. At first this sounded like a hard problem to solve but then I found Tony Finch’s excellent unifdef tool and wrote a C-Reduce pass that simply calls it for every relevant preprocessor symbol.

Finally, it is often the case that a collection of reduced header files contains long chains of trivial #includes. C-Reduce fixes these with a pass that replaces an #include with the included text when the included file is very small.

What’s left to do? The only obvious thing on my list is selectively evaluating the substitutions suggested by #define directives. I will probably only do this by shelling out to an external tool, should someone happen to write it.

In summary, reducing non-preprocessed code is not too hard, but some specific support is required in order to do a good job of it. If you have a testcase reduction problem that requires multi-file reduction or needs to run on non-preprocessed code, please try out C-Reduce and let us know — at the creduce-dev mailing list — how it worked. The features described in this post aren’t yet in a release, just build and install our master branch.

As a bonus, since you’re dying to know, I’ll show you what C-Reduce thinks is the minimal hello world program in C++11. From 127 header files + the original source file, it creates 126 empty files plus this hello.cpp:

#include "ostream"
namespace std {
basic_ostream cout;
ios_base::Init x0;
int main() { std::cout << "Hello"; }

And this ostream:

typedef __SIZE_TYPE__ x0;
typedef __PTRDIFF_TYPE__ x1;
namespace std
template < class > struct char_traits;
typedef x1 x2;
namespace std
template < typename x3, typename = char_traits < x3 > >struct basic_ostream;
template <> struct char_traits 
    typedef char x4;
    static x0 x5 ( const x4 * x6 )
        return __builtin_strlen ( x6 );
template < typename x3, typename x7 > basic_ostream < x3,
         x7 > &__ostream_insert ( basic_ostream < x3, x7 > &, const x3 *, x2 );
struct ios_base
    struct Init
        Init (  );
template < typename, typename > struct basic_ostream
template < typename x3,
         typename x7 > basic_ostream < x3 > operator<< ( basic_ostream < x3,
                 x7 > &x8, const x3 * x6 )
    __ostream_insert ( x8, x6, x7::x5 ( x6 ) );
    return x8;

It's obviously not minimal but believe me we have already put a lot of work into domain-specific C++ reduction tricks. If you see something here that can be gotten rid of and want to take a crack at teaching our clang_delta tool to do the transformation, we'll take a pull request.

A Few Pictures

Paris was very quiet on Saturday and people on the street looked tired, having (like us) stayed up most of the night watching the news, listening to sirens, and worrying about things. Today was sunny and warm and things seemed more normal; plenty of folks out jogging, sitting in parks, other usual weekend activities.

Classic Bug Reports

A bug report is sometimes entertaining either because of the personalities involved or because of the bug itself. Here are a collection of links into public bug trackers; I learned about most of these in a recent Twitter thread.

I hope you enjoy these as much as I do. Thanks to everyone who contributed links.

Updates from comments and Reddit:

API Fuzzing vs. File Fuzzing: A Cautionary Tale

Libraries that provide APIs should be rock solid, and so should file parsers. Although we can use fuzzing to ensure the solidity of both kinds of software, there are some big differences in how we do that.

A file parser should be fully robust: it isn’t allowed to crash even if presented with a corrupted file or the wrong kind of file. Although a library that implements an API usually does some validation of arguments, this is generally a debugging aid rather than an attempt to withstand hostility.
A file usually originates outside of a trust boundary. There generally is not a trust boundary between a library and the program that uses the library.
A file parser can perform arbitrary checks over its inputs. A C/C++ library providing an API cannot check some properties of its input. For example, it cannot check that a pointer refers to valid storage. Moreover, APIs are often so much in the critical path that full error checking would not be desirable even if it were possible.

For fuzzing campaigns the point is this:

The sole concern of a file fuzzer is to generate interesting files that expose bugs in the file parser or in subsequent logic. An API fuzzer must balance two goals. Like the file fuzzer, it needs to expose bugs by using the API in interesting ways. However, at the same time, it must stay within the usage prescribed by the API — careful reading of the documentation is required, as well as a bit of finesse and good taste.

As a concrete example, let’s look at libpng, which provides both a file parser and an API. To fuzz the file parsing side, we generate evil pngish files using something like afl-fuzz and we wait for the library to do something wrong. But what about fuzzing the API? We’ll need to look at the documentation first, where we find text like this:

To write a PNG file using the simplified API:

  1) Declare a 'png_image' structure on the stack and memset()
     it to all zero.

  2) Initialize the members of the structure that describe the
     image, setting the 'format' member to the format of the
     image in memory.

  3) Call the appropriate png_image_write... function with a
     pointer to the image to write the PNG data.

Hopefully it is immediately obvious that calling random API functions and passing random crap to them is not going to work. This will, of course, cause libpng to crash, but the crashes will not be interesting because they will not come from valid usage of the library.

You might ask why anyone would care to fuzz an API. Aren’t trust boundaries the things that matter? The answer is easy: If you are looking to find or prevent exploitable vulnerabilities, you should always focus on fuzzing at trust boundaries. On the other hand, if you are more generally interested in reliable software, you should fuzz APIs as well. We might use API fuzzing to ensure that printf prints floats correctly and that the red-black tree you coded up late last night doesn’t spuriously drop elements.

Now let’s look at a blog post by GDS from last week about fuzzing mbed TLS. There’s something kind of interesting going on here: they are doing API fuzzing (of the mbed TLS library) but they are doing it using afl-fuzz: a file fuzzer. This works well because afl-fuzz provides the data that is “sent” between the client and server (sent is in quotes because this fuzzing effort gains speed and determinism by putting the client and server in the same process).

At the bottom of the post we read this:

Our fuzzing scans discovered client-side NULL pointer dereference vulnerabilities in mbed TLS which affect all maintained versions of mbed TLS. ARM fixed the issues in versions 2.1.1, 1.3.13 and PolarSSL version 1.2.16. See the release notes for details.

However, the corresponding text in the release notes is this:

Fabian Foerg of Gotham Digital Science found a possible client-side NULL pointer dereference, using the AFL Fuzzer. This dereference can only occur when misusing the API, although a fix has still been implemented.

In other words, the blog post author disagrees with the mbed TLS maintainers about whether a bug has been discovered or not. It is a matter of debate whether or not this kind of crash represents a bug to be fixed. I would tend to agree with the mbed TLS maintainers’ decision to err on the side of defensive programming here. But was there a vulnerability? That seems like a stretch.

In summary, API fuzzing is different from file fuzzing. Good API fuzzing requires exploration of every legal corner of the API and no illegal corners. Ambiguous situations will come up, necessitating judgement calls and perhaps even discussions with the providers of the API.