Multi-Version Execution Defeats a Compiler-Bug-Based Backdoor

[This piece is jointly authored by Cristian Cadar, Luís Pina, and John Regehr]

What should you do if you’re worried that someone might have exploited a compiler bug to introduce a backdoor into code that you are running? One option is to find a bug-free compiler. Another is to run versions of the code produced by multiple compilers and to compare the results (of course, under the additional assumption that the same bug does not affect all the compilers). For some programs, such as those whose only effect is to produce a text file, comparing the output is easy. For others, such as servers, this is more difficult and specialized system support is required.

Today we’ll look at using Varan the Unbelievable to defeat the sudo backdoor from the PoC||GTFO article. Varan is a multi-version execution system that exploits the fact that if you have some unused cores, running additional copies of a program can be cheap. Varan designates a leader process whose system call activity is recorded in a shared ring buffer, and one or more follower processes that read results out of the ring buffer instead of actually issuing system calls.

Compilers have a lot of freedom while generating code, but the sequence of system calls executed by a program represents its external behaviour and in most cases the compiler is not free to change it at all. There might be slight variations e.g., due to different compilers using different libraries, but these can be easily handled by Varan. Since all correctly compiled variants of a program should have the same external behaviour, any divergence in the sequence of system calls across versions flags a potential security attack, in which case Varan stops the program before any harm is done.

Typically, Varan runs the leader process at full speed while also recording the results of its system calls into the ring buffer. However, when used in a security-sensitive setting, Varan can designate some system calls as blocking, meaning that the leader cannot execute those syscalls until all followers have reached that same program point without diverging. For sudo, we designate execve as blocking, since that is a point at which sudo might perform an irrevocably bad action.

So here’s the setup:

  1. We have a patched version of sudo 1.8.13 from the PoC||GTFO article. It runs correctly and securely when compiled by a correct C compiler, but improperly gives away root privileges when compiled by Clang 3.3 because the patch was designed to trigger a wrong-code bug in that compiler.
  2. We are going to pretend that we don’t know about the Clang bug and the backdoor. We compile two versions of the patched sudo: one with Clang 3.3, the other with the default system compiler, GCC 4.8.4.
  3. We run these executables under Varan. Since the critical system call execve is blocking, it doesn’t much matter which version is the leader and which is the follower.

Now let’s visit an Ubuntu 14.04 VM where both versions of sudo (setuid root, of course) and Varan are installed. We’re using a user account that is not in the sudoers file — it should not be allowed to get root privileges under any circumstances. First let’s make sure that a sudo that was properly compiled (using GCC) works as expected:

$ /home/varan/sudo-1.8.13/install/bin/sudo-gcc cat /etc/shadow
test is not in the sudoers file.  This incident will be reported.

Next, we make sure that the backdoor is functioning as intended:

$ /home/varan/sudo-1.8.13/install/bin/sudo-clang cat /etc/shadow

So far so good. Next let’s try the gcc-compiled sudo as the leader with the backdoored sudo as the follower:

$ vx-suid /home/varan/sudo-1.8.13/install/bin/sudo-gcc \
          /home/varan/sudo-1.8.13/install/bin/sudo-clang -- cat /etc/shadow
test is not in the sudoers file.  This incident will be reported.

What happened here is that the gcc-compiled leader runs as before, since it doesn’t ever try to execute an execve call. When the backdoored follower tries to execute the malicious execve call, Varan detects the divergence and terminates both processes safely.

Now let’s try switching around the leader and follower, i.e., run the backdoored sudo as the leader with the gcc-compiled sudo as the follower:

$ vx-suid /home/varan/sudo-1.8.13/install/bin/sudo-clang \
          /home/varan/sudo-1.8.13/install/bin/sudo-gcc -- cat /etc/shadow

This time the leader tries to execute the malicious execve call, and Varan blocks its execution until the follower reaches the same system call or diverges. In this case, the follower tries to execute a write system call (to print “test is not in the sudoers file...”) and thus Varan detects divergence and again terminates execution safely.

In this example, we only ran two versions in parallel, but Varan can run more than two versions. In terms of performance and resource utilization, security applications like sudo are a great match for multi-version execution: they are not CPU-bound, so any performance degradation is imperceptible to the user, and the extra cores are needed only briefly, during the critical security validation checks. We are looking into applying this approach to other critical security applications (e.g. ssh-agent and password managers), and are investigating a way of hardening executables by generating a single binary with Varan and a bunch of versions, each version generated by a different compiler. We can then deploy this hardened executable instead of the original program.

Of course, Varan can detect misbehavior other than compiler-bug-based backdoors. Divergence could be caused by a memory or CPU glitch, by a plain old compiler bug that is triggered unintentionally instead of being triggered by an adversarial patch, or by a situation where an application-level undefined behavior bug has been exploited by only one of the compilers, or even where both compilers exploited the bug but not in precisely the same way. A nice thing about N-version programming at the system call level is that it won’t bother us about transient divergences that do not manifest as externally visible behaviour through a system call.

We’ll end by pointing out a piece of previous work along these lines: the Boeing 777 uses compiler-based and also hardware-based N-version diversity: there is a single version of the Ada avionics software that is compiled by three different compilers and then it runs on three different processors: a 486, a 68040, and an AMD 29050.

Testcase Reduction for Non-Preprocessed C and C++

C-Reduce takes a C or C++ file that triggers a bug in a compiler (or other tool that processes source code) and turns it into the smallest possible test case that still triggers the bug. Most often, we try to reduce code that has already been preprocessed. This post is about how to reduce non-preprocessed code, which is sometimes necessary when — due to use of an integrated preprocessor — the compiler bug goes away when a separate preprocessing step is used.

The first thing we need to do is get all of the necessary header files into one place. This is somewhat painful due to things like computed includes and #include_next. I wrote a script that follows the transitive includes, renaming files and copying them over; it works fine on Linux but sometimes fails on OS X, I haven’t figured out why yet. Trust me that you do not want to look too closely at the Boost headers.

Second, we need to reduce multiple files at once, since they have not yet been glommed together by the preprocessor. C-Reduce, which is a collection of passes that iterate to fixpoint, does this by running each pass over each file before proceeding to the next pass. The seems to work well. A side benefit of implementing multi-file reduction is that it has other uses such as reducing programs that trigger link-time optimization bugs, which inherently requires multiple files.

Non-preprocessed code contains comments, so C-Reduce has a special pass for stripping those out. We don’t want to do this before running C-Reduce because removing comments might make the bug we’re chasing go away. Another pass specifically removes #include directives which tend to be deeply nested in some C++ libraries.

#ifdef … #endif pairs are hard to eliminate from first principles because they are often not located near to each other in the file being reduced, but you still need to eliminate both at once. At first this sounded like a hard problem to solve but then I found Tony Finch’s excellent unifdef tool and wrote a C-Reduce pass that simply calls it for every relevant preprocessor symbol.

Finally, it is often the case that a collection of reduced header files contains long chains of trivial #includes. C-Reduce fixes these with a pass that replaces an #include with the included text when the included file is very small.

What’s left to do? The only obvious thing on my list is selectively evaluating the substitutions suggested by #define directives. I will probably only do this by shelling out to an external tool, should someone happen to write it.

In summary, reducing non-preprocessed code is not too hard, but some specific support is required in order to do a good job of it. If you have a testcase reduction problem that requires multi-file reduction or needs to run on non-preprocessed code, please try out C-Reduce and let us know — at the creduce-dev mailing list — how it worked. The features described in this post aren’t yet in a release, just build and install our master branch.

As a bonus, since you’re dying to know, I’ll show you what C-Reduce thinks is the minimal hello world program in C++11. From 127 header files + the original source file, it creates 126 empty files plus this hello.cpp:

#include "ostream"
namespace std {
basic_ostream cout;
ios_base::Init x0;
int main() { std::cout << "Hello"; }

And this ostream:

typedef __SIZE_TYPE__ x0;
typedef __PTRDIFF_TYPE__ x1;
namespace std
template < class > struct char_traits;
typedef x1 x2;
namespace std
template < typename x3, typename = char_traits < x3 > >struct basic_ostream;
template <> struct char_traits 
    typedef char x4;
    static x0 x5 ( const x4 * x6 )
        return __builtin_strlen ( x6 );
template < typename x3, typename x7 > basic_ostream < x3,
         x7 > &__ostream_insert ( basic_ostream < x3, x7 > &, const x3 *, x2 );
struct ios_base
    struct Init
        Init (  );
template < typename, typename > struct basic_ostream
template < typename x3,
         typename x7 > basic_ostream < x3 > operator<< ( basic_ostream < x3,
                 x7 > &x8, const x3 * x6 )
    __ostream_insert ( x8, x6, x7::x5 ( x6 ) );
    return x8;

It's obviously not minimal but believe me we have already put a lot of work into domain-specific C++ reduction tricks. If you see something here that can be gotten rid of and want to take a crack at teaching our clang_delta tool to do the transformation, we'll take a pull request.