Open Proposals (or: Take My Idea — Please)

Sharing research papers on the web is not very controversial because sharing benefits everyone (other than a few increasingly irrelevant special interests). Sharing research proposals is a thornier proposition: since the work remains to be done, it exposes researchers to scooping. However, I would argue that scooping is not really very likely, and anyone whose ideas are good enough to steal is probably pretty successful anyway. In fact, if someone takes my idea and runs with it, then the work will get done without me having to lift a finger and I’ll be able to work on something else. Also, of course, a great research project (like a great product) is never really about the idea — it’s much more about great execution.

A grant proposal that I submit should be funded when the proposed work is important and when I’m the best person in the world to do it. This happens to correspond to the case where it’s unlikely for someone to successfully steal my idea. If I’m not the best person to do the work, then the proposal shouldn’t be funded and (assuming the idea is any good) someone else should work on it.

So far I’ve tried to argue that opening up research proposals has few drawbacks. But what are the advantages?

  • Proposers will get more and earlier feedback on their ideas.
  • Since researchers will be able to find out earlier what other people are working on, there will be more opportunities (and hopefully incentives) to collaborate.
  • Researchers will put stakes in the ground earlier than they otherwise would. I love this — duplicated effort is a depressing waste of time and money.
  • Everybody gets to see what researchers are asking for public funding to work on.

The other day I blogged a short proposal. Hopefully I’ll keep doing this, though it’s not clear I’ll always be able to convince my co-PIs that it’s a good idea. Personally, I’d be happy if NSF, DARPA, NIH, etc. would openly post every proposal they receive, as soon as they receive it. And maybe, just maybe, a few people will decide not to submit proposals that aren’t really ready for submission, lowering the massive workloads faced by program officers.

Isolation with a Very Small TCB

A basic service provided by most computer systems is isolation between different computations. Given reliable isolation we can safely run programs downloaded from the web, host competing companies’ sites on the same physical server, and perhaps even run your car’s navigation or entertainment software on the same processor that is used to control important system functions.

Other factors being equal, we’d like an isolation implementation to have a trusted computing base (TCB) that is as small as possible. For example, let’s look at the TCB for isolating Javascript programs from different sites when we browse the web using Firefox on a Linux box. It includes at least:

  • pretty much the entire Firefox executable including all of the libraries it statically or dynamically links against (even unrelated code is in the TCB due to the lack of internal isolation in a C/C++ program)
  • the Linux kernel including all kernel modules that are loaded
  • GCC and G++
  • some elements of the hardware platform (at least the processor, memory controller, memory modules, and any peripherals that use DMA)

A deliberate or accidental malfunction of any of these elements may subvert the isolation guarantee that Firefox would like to provide. This is a large TCB, though probably somewhat smaller than the corresponding TCB for a Windows machine. Microkernel-based systems potentially have a much smaller TCB, and a machine-verified microkernel like seL4 removes even the microkernel from the TCB. Even so, the seL4 TCB is still nontrivial, containing for example an optimizing compiler, the substantially complicated specification of the microkernel’s functionality, a model of the hardware including its MMU, and the hardware itself.

My student Lu Zhao’s PhD work is trying to answer the question: How small can we make the trusted
computing base for an isolation implementation? The target for this work is ARM-based microcontrollers that are large enough to perform multiple tasks, but way too small to run Linux (or any other OS that uses memory protection). Lu’s basic strategy is to implement software fault isolation (SFI). His tool rewrites an ARM executable in order to enforce two properties: memory safety and control flow integrity. We’re using a weak form of memory safety that forces all stores to go into a memory region dedicated to that task. Control flow integrity means that control never escapes from a pre-computed control flow graph. This boils down to rewriting the binary to put a dynamic check in front of every potentially dangerous operation. Using the excellent Diablo infrastructure, this was pretty easy to implement. I won’t go into the details but they can be found in a paper we just finished.

The trick to making Lu’s work have a small TCB was to remove the binary rewriter from the TCB by proving that its output has the desired properties. To do this, he needed to create mathematical descriptions of memory safety and control flow integrity, and also to borrow an existing mathematical description of the ARM ISA. Reusing the ARM model was ideal because it has been used by several previous projects and it is believed to be of very high quality. Automating the construction of a non-trivial, machine-checked proof is not so easy and Lu spent a lot of time engineering proof scripts. The saving grace here is that since we control where to add safety checks, the proof is in some sense an obvious one — the postcondition of the check provides exactly the property needed to establish the safety of the potentially-unsafe operation the comes after the check.

The TCB for Lu’s isolation solution contains the HOL4 theorem prover, his definitions of memory safety and control flow integrity (less than 60 lines of HOL), the model of the ARM ISA, and of course the ARM processor itself. All of these elements besides the processor are of quite manageable size. This work currently has two major drawbacks. First, the proofs are slow to construct and the automated proving part only succeeds for small executables. Second, overhead is far higher than it is for a state of the art sandboxing system such as Native Client. In principle the theorem proving infrastructure will be a powerful tool for optimizing sandboxed executables — but we haven’t gotten to that part of the work yet.

As far as we can tell, there’s not much more that can be removed from the TCB. Lu’s SFI implementation is the first one to do full formal verification of its guarantees for real executables. An important thing I learned from working with Lu on this project is that dealing with a mechanical theorem prover is a bit like having kids in the sense that you can’t approach it casually: you either go whole-hog or avoid it entirely. Perhaps in the future the theorem proving people will figure out how to make their tools accessible to dabblers like me.

Proposal for Automated Compiler Bug Reports

[Yesterday I submitted a proposal to Google for a modest amount of money to work on turning large random programs that expose compiler flaws into concise bug reports. Below is a transcription that is mostly faithful (citations are omitted and I changed the example bug report from a floating figure into inline text). Feedback is appreciated.]

Abstract

Csmith, our random testing tool for C compilers, has helped us discover and report nearly 400 previously unknown compiler bugs, mostly in GCC and LLVM. More than 90% of these bugs have now been fixed by compiler developers. Moreover, most of these bugs were in the “middle ends” of the compilers, meaning that they impact multiple programming languages (Go, C++, Ada, etc.) as well as many target architectures.

By proactively finding compiler bugs missed by existing test suites, we help compiler teams eliminate flaws that would otherwise make it into deployed tools. Though it is uncommon to encounter a serious compiler bug, when one does bite a development effort a lot of time is typically wasted. We help by performing nearly continuous testing of the GCC and LLVM development versions. After reporting hundreds of bugs, we appear to have reached a steady state where both compilers are substantially correct but every week or two a new bug is found.

The process of turning a bug-triggering compiler input into a good bug report is tedious and time consuming, while also requiring technical skill and good taste. We propose removing humans from this part of the picture by automating the construction of compiler bug reports.

Research Goals

The goal of the proposed work is to enable a workflow where someone installs our software on a fast machine and tells it how to build the latest compiler version and where to send bug reports.

The machine is now totally unattended. Each day it will build the latest compiler version and spend the rest of the day testing it. When a new bug is discovered, it will be reported. The bug reports will look like this:

Bug-triggering program:

extern int printf (const char *, ...);

static int *g_135 = &g_16[0];
static int *l_15 = &g_16[0];

int main (void) {
  g_16[0] = 1;
  *g_135 = 0;
  *g_135 = *l_15;
  printf("%d\n", g_16[0]);
}
  • Expected output — produced by GCC 3.4, GCC 4.1, Intel CC 11.0, Clang 2.8, and Microsoft VC9 — is “0”.
  • Using r156486 for i686-pc-linux-gnu, “gcc -O” produces “1”. The same version at -O0 produces the expected output.
  • First version of GCC containing this bug is r147613.
  • Probable fault location: the dead store elimination pass.

The manually generated report for this bug is here.

The problem we need to solve is taking a large chunk of random C code (empirically, the sweet spot for randomized compiler bug finding is around 80KB of code) and turning it into an actionable bug report. Our three years’ of experience reporting compiler bugs has given us a good working understanding of what compiler developers want and need in a bug report. The rest of this proposal explains how the different parts of this report will be produced.

The Most Important Problem: Test Case Minimization

The most important problem is test case minimization: turning a large pile of random C code into a small failure-triggering input like the one above. The ddmin Delta debugging algorithm is the best-known solution for test case minimization. In a nutshell, a Delta debugger is just a greedy search that works by iteratively removing a chunk of the input and checking if the new input still triggers the bug. Delta can be significantly optimized by progressively removing smaller and smaller chunks of the input and also by coordinating its choice of chunks with the structure of the test input. The best-known Delta implementation, by Wilkerson, incorporates both of these optimizations.

It works but has two serious problems when used to reduce the size of C programs. First, it gets stuck at local minima that are far from the (apparent) true minimum sizes for failure-inducing compiler inputs due to being line-based. Many interesting reduction operations for C programs need to be aware of its token-level and AST-level structure. Second, Wilkerson’s Delta usually creates reduced test cases that invoke undefined behavior. The GCC developers — to put it mildly — do not appreciate bug reports containing test cases that rely on undefined behavior.

We propose investigating two different approaches to fixing these problems. The first approach has two parts:

  • A C-aware Delta debugger that will significantly improve on the Wilkerson Delta by first performing local transformations such as reducing x+y to x or x to 0, and second by performing global simplifications such as removing an argument not only from a function definition, but also from all uses of the function, removing a level of indirection from a pointer, and performing inline substitution of simple function bodies.
  • We are working with Chucky Ellison at UIUC who is developing an executable semantics for C that robustly detects most kinds of undefined behavior (including several, such as unsequenced updates to an lvalue, that are largely undetected by other tools). His tool is a research prototype; we are helping refine it into a useful part of a Delta debugging loop.

Our second new approach to testcase reduction takes an entirely different approach: using Csmith itself to perform test case minimization. Our insight is that this will work because Csmith understands not only the structure of C, but also how to avoid introducing undefined behavior. Drawbacks of the Delta-in-Csmith approach are that it increases the complexity of an already complex tool and that it cannot be used to reduce failure-inducing C programs that did not come from Csmith in the first place. We believe that it is worth exploring both approaches.

Secondary Research Problems

Although our major focus will be on creating reportable test cases, we also plan to attack the following problems. First, making a robust guess at the first revision that contains a compiler bug. We have already tried using a naive binary search to find the first broken version and too often, the result of the search is a version that contains an incidental change that merely exposes the bug. By randomly exploring a variety of similar test cases and compiler options, we hope to robustly locate the first version that contains the bug.

Second, we will perform fault localization—try to guess where in the compiler the problem lies. When the problem lies in an optional part of the compiler (i.e., an optimization pass) this is fairly easy using Whalley’s work. When the bug lies in the frontend or backend, the situation is more difficult and is beyond the scope of this proposal.

Outcome

Evaluating this work is easy. The only important question is: Are our bug reports good enough that they motivate compiler developers to fix the bugs? We will answer this question by continuing to report bugs to major open source compiler projects like GCC and LLVM. Furthermore, our test case reduction and related tools will be released as open source as part of the already open-sourced Csmith.

Relation to Previous Work

We are already taking advantage of as much previous work as seems to be available. Mainly, this consists of Zeller’s Delta debugging papers and a handful of follow-up papers that have appeared over the last ten years, in addition to Wilkerson’s Delta implementation and Ellison’s executable C semantics. No previous compiler bug-finding effort (that we are aware of) has resulted in anything close to the nearly 400 bugs we have reported, and our guess is that for this reason nobody has yet had to develop good tool support for automated bug reports.