Hello Android

Some semesters I teach courses that just need to be taught. On the other hand, this Fall I get to teach a class that should be purely fun — an Android projects course for upper-level undergrads. I already promised an “A” to anyone who (legitimately) makes at least $100 using an application developed for the class. I plan to nudge students away from web applications and single-node games, which seem boring. Rather, I hope they’ll interface with the awesome peripherals that make phones and tablets special. Today in class we looked through the source code for a pedometer app — the most interesting part being the step detection logic. It turned out to be pretty hacky, but it more or less works.

Here’s a gratuitous hardware shot — my pile of Galaxy 10.1 tabs, which the students got to take home today. Half of the tabs were gifts, half were purchased using university funds acquired by Thomas Schmid, who was awesome enough to loan them to me (or rather, to my students) for the Fall. The most interesting part of the class will be when I get some Tegra 3 development boards — these should be seriously awesome.

A Fire Upon The Deep — Retrospective and E-book

Over the last few weeks I re-read A Fire Upon The Deep, surely one of the top five works of computer science fiction. The proximate reason for the re-read was the upcoming release of a sequel, Children of the Sky, which I am impatiently awaiting.

I read the “special edition,” which contains about 1500 of the author’s annotations. This was a bit of a mixed bag. First, there are so many notes that it became tiresome to flip (electronically) back and forth between them and the main text. Second, a large fraction of the annotations are irrelevant because they apply to obsolete drafts, they contain formatting information for the publisher, or they are just too terse or arcane to figure out. Since Vinge went to the trouble of coding his annotations so they would be greppable, it would have been great if the special edition had exploited these tags — for example, by giving me the option of ignoring notes about typesetting. Around 10% of the annotations contain interesting material such as background information or deleted text; these show that Vinge worked very hard to make the story consistent and physically plausible, and that he was already thinking about the sequel 20 years ago.

One of the fun conceits in A Fire is that galaxy-scale bandwidth and latency constraints force most communication to look and feel just like Usenet did around 1990. While some of Vinge’s Usenet-centric humor (in particular the netnews headers) has become stale, much of it works if translated into terms of today’s message boards. In particular, the “net of a million lies” aspect is a lot more relevant today than it was a few decades ago.

Vinge is probably best known for coining the term technological singularity based on the idea that as a society’s technical prowess increases, progress becomes ever-faster until so much change occurs in such a short time that meaningful predictions about the end state are impossible. This notion does not play a role in A Fire. I’d argue that Vinge’s vision of the future of computer security would be a more appropriate lasting legacy than his thoughts about the singularity. This thread is present in most of his work, but in A Fire it is played up at a very large scale, with entire civilizations being wiped out or worse due to inappropriate network security measures. I shouldn’t need to add that we’re a lot closer to this situation now than we were when the book was written. This sequence stuck in my head even the first time I read the book:

The new Power had no weapons on the ground, nothing but a comm laser. That could not even melt steel at the frigate’s range. No matter, the laser was aimed, tuned civilly on the retreating warship’s receiver. No acknowledgment. The humans knew what communication would bring. The laser light flickered here and there across the hull, lighting smoothness and inactive sensors, sliding across the ship’s ultradrive spines. Searching, probing. The Power had never bothered to sabotage the external hull, but that was no problem. Even this crude machine had thousands of robot sensors scattered across its surface, reporting status and danger, driving utility programs. Most were shut down now, the ship fleeing nearly blind. They thought by not looking that they could be safe.

One more second and the frigate would attain interstellar safety.

The laser flickered on a failure sensor, a sensor that reported critical changes in one of the ultradrive spines. Its interrupts could not be ignored if the star jump were to succeed. Interrupt honored. Interrupt handler running, looking out, receiving more light from the laser far below…. a backdoor into the ship’s code, installed when the newborn had subverted the humans’ groundside equipment….

…. and the Power was aboard, with milliseconds to spare. Its agents — not even human equivalent on this primitive hardware — raced through the ship’s automation, shutting down, aborting. There would be no jump. Cameras in the ship’s bridge showed widening of eyes, the beginning of a scream. The humans knew, to the extent that horror can live in a fraction of a second.

There would be no jump. Yet the ultradrive was already committed. There would be a jump attempt, without automatic control a doomed one. Less than five milliseconds till the jump discharge, a mechanical cascade that no software could finesse. The newborn’s agents flitted everywhere across the ship’s computers, futilely attempting a shutdown. Nearly a light-second away, under the gray rubble at the High Lab, the Power could only watch. So. The frigate would be destroyed.

Apparently all bets are off when Satan is putting back doors in your critical control code.

Aside from the self-imposed problem of looking at every annotation, reading A Fire Upon the Deep on an iPad was a fairly friendly experience. Resolution and contrast were adequate. I liked how easy it was to listen to music while reading. I probably wouldn’t have been able to read in full sunlight, but then again I dislike reading regular books in full sunlight. The iPad is quite a bit heavier than a paperback and it has an annoying way of deciding to change the page orientation when I didn’t want that to happen. I’d consider reading another e-book but am not in a hurry.

One reason I read SF is that it helps me learn to think about alternate and future scenarios. This requires the author not only to have good ideas, but to be able to follow through with a world built on the consequences of those ideas. In terms of his ability to do these things, Vinge is one of the best modern SF authors. Only 50 days until Children of the Sky comes out…

When to Teach C++?

Some friends and I were recently debating when CS undergrads should be taught C++. People have various opinions, but it’s clear that C++ no longer enjoys the somewhat dominant position that it did a number of years ago. My own opinions are rooted in an experience I had around 1995 as a TA for a course that taught C++ to first-year CS students. I didn’t think it worked very well.

We don’t need to teach students every industrially relevant language because any competent computer scientist can rapidly pick up a new language. Rather, a language that is used for coursework should have some pedagogical value. For example, Java and C# are reasonable languages for teaching OO design. Using C is a good way to teach systems-level concepts. Using Scheme / Perl / Python / etc. is a good way to teach algorithms, program design, and the scripting style of development.

Does C++ have pedagogical value? Certainly it can be used to teach OO and systems, but should it be? My general opinion is that it should not be — the overhead is too high. First, the language is just very large. This can be somewhat finessed by teaching a subset, but even so there are difficult feature interactions that can trip up students. Second, the near-complete lack of language-level runtime error checking makes C++ a pedagogical disaster (of course the same is true for C, but C is a far more manageable language). Finally, in my experience the quality of compile-time diagnostics is not that great. A programmer I know has a roughly 10-page printout on his cubicle wall containing a single compiler error message from a heavily templated program. Presumably modern implementations like Clang++ are better, but still — ugh.
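
To make the runtime-checking point concrete, here is a minimal made-up example. It is written in C, since as noted the same problem exists there, and raw arrays in C++ behave identically:

#include <stdio.h>

int main(void)
{
  int counts[4] = { 1, 2, 3, 4 };
  int sum = 0;
  for (int i = 0; i <= 4; i++)    /* off-by-one: should be i < 4 */
    sum += counts[i];             /* counts[4] is read with no runtime check */
  printf("sum = %d\n", sum);      /* silently wrong, or worse */
  return 0;
}

A student who makes this mistake in Java gets an exception pointing at the bad index; here the program just prints garbage, which is a much harder lesson to learn from.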

Anyway, my answer to the original question is that C++ should be taught late, for example in a junior or senior level projects course, after the students have learned a bit about disciplined programming and about the design of programming languages. Furthermore, it should be optional. I’d be interested to hear others’ thoughts.

Box Elder Peak

I have a hiking book that refers to Box Elder Peak as the red-headed stepchild of the central Wasatch Range. This is true: Box Elder is a lonely 11,000′ mountain stuck right in between the big and impressive Timpanogos and Lone Peak massifs. The other day Dave Hanscom and I decided to climb it. Neither of us was at full power: Dave had done a long trail run two days earlier and wasn’t totally recovered; I had spent much of the previous month either traveling at sea level or fighting a mild illness — weakening either way.

We started at the trailhead near the Granite Flat campground in American Fork Canyon and hiked to a saddle on Box Elder’s north ridge that overlooks Dry Fork. From there, we followed a faint trail along the ridge to the summit, for a total elevation gain of about 4300′. The wildflowers on the summit ridge were great — they showed up very late this year but are making up for lost time. We had planned to hike over the sub-peak just south of Box Elder, meeting a trail on the other side. However, when we got to the saddle between the two peaks, the gully heading east looked inviting. If it had been filled with snow we’d have needed axes, but the top was melted out and lower down we walked on low-angled avalanche remnants until our trail crossed the gully. Total time was 6 hours — I guess the nice thing about being slow is that we got to spend longer on a nice mountain.

C No Evil

Your mortal enemy would like to ship a really great product in a week or two. Towards the goal of maximally delaying this product, you may inject a single C preprocessor definition into one of their header files. What is it? Keep in mind that anything which stops the project from compiling will be rapidly detected and therefore does not meet the goal.

Here are a few examples to get started with (some my own, some from a Reddit thread today).

#define FALSE 1
#define FALSE exit()
#define while if
#define goto
#define struct union
#define assert(x)
#define volatile
#define continue break
#define double int
#define long short
#define unsigned signed
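
To see why a definition like #define struct union is so hard to track down, consider a minimal made-up example. The sabotaged program still compiles cleanly, but the two fields now share storage, so the second assignment silently clobbers the first:

#include <stdio.h>

struct point {          /* with the evil define in scope, this is really a union */
  int x;
  int y;
};

int main(void)
{
  struct point p;
  p.x = 3;
  p.y = 4;              /* in a union, this overwrites p.x as well */
  printf("%d %d\n", p.x, p.y);   /* "3 4" normally, "4 4" when sabotaged */
  return 0;
}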

Have fun.

Update: Someone on Hacker News posted a link to The Disgruntled Bomb, which is based on the same idea. Nice!

Testing Commercial Compilers

A few weeks ago a reader left this comment:

Just out of curiosity John, have you approached any of the big commercial compiler companies about getting free licenses for their products? I don’t work in the compiler business but if a university research team offered to rigorously test my software, for free, I’d say yes. You can always ignore bug reports after all.

I wanted to devote a post to the answer since it’s something I’ve spent a fair amount of time thinking about. Let’s look at the tradeoffs.

Pros of Testing Commercial Compilers

The whole reason I started working on this was the fact that bugs in commercial compilers for embedded systems were irritating me and the students in my embedded systems courses. So of course I want these compilers (and “regular compilers” too) to be more correct.

Finding bugs in commercial compilers benefits my project by showing that our work is generally useful. If we only found bugs in open source compilers, people could just explain that these tend to be much buggier than commercial tools. This is false, though it doesn’t stop people from occasionally saying it anyway; for example, see Chris Hills’ posts here. In any case, as a proof of concept it is important for Csmith to find bugs in the highest-quality compilers that are available, regardless of who produces them.

Cons of Testing Commercial Compilers

First, reporting compiler bugs is hard work. It’s not my job to improve a product that someone else is making money by selling. If someone wants me to do this, they can hire me as a consultant. If they want to do the work on their own, Csmith is open source under a corporation-friendly license.

Second, the people producing these compilers may not have any interest in having me test them. There is, in fact, a potentially serious downside: if I publicize the fact that I found a lot of bugs in safety-critical compiler A, then the people producing compiler B can use this as a selling point. I’ve noticed that sometimes people steeped in the open source world have a difficult time believing this is a serious concern, but it is.

Third, random testing works best when the bug reporting / fixing feedback cycle is rapid. Speed is important because high-probability bugs tend to mask lower-probability bugs. As a tester operating outside of the company, I’m tied to a months-long release cycle as opposed to just updating to the latest revision every morning.

Fourth, commercial tools have a closed process. I can’t see the full set of bug reports, I can’t access the source code or commit log, and perhaps I cannot even mail the developers directly.

Fifth, commercial compilers are often a pain. They probably use the despicable FlexLM. They often run only on Windows. They probably have obscure command-line options. They may target platforms for which I don’t have good simulators.

Finally, the purpose of a commercial compiler is simple: it is to make money for the company selling it. Given finite engineering resources, a company may well be more interested in supporting more platforms and improving benchmark scores than it is in fixing bugs. If they do fix bugs, they’ll begin with the ones that are affecting the most valuable customers.

Conclusion

In contrast with the points above, acting as a test volunteer for open source compilers is a pleasure. I can see the code and directly interact with developers. Bugs are often fixed rapidly. More important, when these compilers are improved it benefits everyone, not just some company’s customers. I feel a substantial sense of obligation to the open source community that produced the systems I use every day. It seems crazy but I’ve had a Linux machine on my desk continuously for a bit more than 19 years.

I’ve tested a number (>10) of commercial C compilers. Each of them has been found to crash and to generate incorrect output when given conforming code. This is not a surprise — compilers are complicated and all complex software contains bugs. But typically, after this initial proof of concept, I stop testing and go back to working on GCC and LLVM. I’ve heard through the grapevine that Csmith is in use at several compiler companies. The engineers doing this have not shared any details with me and that is fine — as long as the bugs are getting fixed, my project’s goals are being met.

Fall on Snow

Yesterday the local news had this story about a guy who took an uncontrolled slide down a chute in Maybird Gulch in Little Cottonwood Canyon outside of Salt Lake City. The slide took him over some rocks and he was lucky to survive — the video accompanying the story is terrifying.

Video Courtesy of KSL.com

One has to wonder:

  • If he had made it across the snow safely, would the boy scouts in his group have followed? If several of them had gotten out on the snow and fallen together, there could easily have been multiple deaths.
  • Why did he go out onto the snow almost lying down? This position made sliding a certainty.
  • Why did he go out onto steep snow, with an obviously bad runout, without an axe?

Whenever this kind of thing happens the news likes to warn people to carry an ice axe — but without substantial training and practice this is useless and may actually increase the risk since an axe has lots of sharp parts that you don’t want sticking into your face or gut.

Here’s a quick summary of what he should have done differently:

  1. A quick look at the fall line — the line an object falling from a particular starting point will take — makes it clear that this is a very dangerous place to fall. So it would have been a great idea to simply turn back before stepping onto the snow, particularly for someone leading a youth group.
  2. If he had to cross the snow, he should have stayed on his feet and kicked steps. The snow was soft enough that he could have done this in sneakers, though stiff-soled hiking or mountain boots would have worked better. Just doing this, he would have had a good chance of making it across the snow.
  3. Every time he took a step, his axe should have been planted firmly in the snow. Since the slope doesn’t look extremely steep (40 degrees maybe?), he should have used the “cane position” which is pretty much like it sounds — you hold the head of the axe and use its shaft and spike like a cane, only moving the axe when both feet are solidly planted. In this position a fall can usually be stopped before it turns into a slide.
  4. If he somehow managed to start sliding, he should have performed a self arrest. Although it’s not 100% clear that this would have succeeded in the soft snow, it would at least have slowed him down and ensured that his feet pointed down-slope (as it was, not hitting his head on the rocks was pure luck).

My judgement is that roping up would have been overkill — a competent group could have safely crossed this chute unroped. At most a piece of webbing for a hand line would have sufficed, with a belayer sitting on the rocks. Crampons would only have added to the danger in the soft snow.

I’ve climbed in this area; it’s spectacular but definitely not a good place to fall. (In fact, the fall in the video happened in one of the “nice looking couloirs towards the East end of the headwall” that I mentioned in a trip report on Summitpost five years ago.) Every year Accidents in North American Mountaineering contains a lot of entries under “fall on snow, failure to self-arrest.” Most years I practice self-arrests on a safe slope, and I’ve been lucky enough to never have to do this for real.

Open Proposals (or: Take My Idea — Please)

Sharing research papers on the web is not very controversial because sharing benefits everyone (other than a few increasingly irrelevant special interests). Sharing research proposals is a thornier proposition: since the work remains to be done, it exposes researchers to scooping. However, I would argue that scooping is not really very likely, and anyone whose ideas are good enough to steal is probably pretty successful anyway. In fact, if someone takes my idea and runs with it, then the work will get done without me having to lift a finger and I’ll be able to work on something else. Also, of course, a great research project (like a great product) is never really about the idea — it’s much more about great execution.

A grant proposal that I submit should be funded when the proposed work is important and when I’m the best person in the world to do it. This happens to correspond to the case where it’s unlikely for someone to successfully steal my idea. If I’m not the best person to do the work, then the proposal shouldn’t be funded and (assuming the idea is any good) someone else should work on it.

So far I’ve tried to argue that opening up research proposals has few drawbacks. But what are the advantages?

  • Proposers will get more and earlier feedback on their ideas.
  • Since researchers will be able to find out earlier what other people are working on, there will be more opportunities (and hopefully incentives) to collaborate.
  • Researchers will put stakes in the ground earlier than they otherwise would. I love this — duplicated effort is a depressing waste of time and money.
  • Everybody gets to see what researchers are asking for public funding to work on.

The other day I blogged a short proposal. Hopefully I’ll keep doing this, though it’s not clear I’ll always be able to convince my co-PIs that it’s a good idea. Personally, I’d be happy if NSF, DARPA, NIH, etc. would openly post every proposal they receive, as soon as they receive it. And maybe, just maybe, a few people will decide not to submit proposals that aren’t really ready for submission, lowering the massive workloads faced by program officers.

Isolation with a Very Small TCB

A basic service provided by most computer systems is isolation between different computations. Given reliable isolation we can safely run programs downloaded from the web, host competing companies’ sites on the same physical server, and perhaps even run your car’s navigation or entertainment software on the same processor that is used to control important system functions.

Other factors being equal, we’d like an isolation implementation to have a trusted computing base (TCB) that is as small as possible. For example, let’s look at the TCB for isolating Javascript programs from different sites when we browse the web using Firefox on a Linux box. It includes at least:

  • pretty much the entire Firefox executable including all of the libraries it statically or dynamically links against (even unrelated code is in the TCB due to the lack of internal isolation in a C/C++ program)
  • the Linux kernel including all kernel modules that are loaded
  • GCC and G++
  • some elements of the hardware platform (at least the processor, memory controller, memory modules, and any peripherals that use DMA)

A deliberate or accidental malfunction of any of these elements may subvert the isolation guarantee that Firefox would like to provide. This is a large TCB, though probably somewhat smaller than the corresponding TCB for a Windows machine. Microkernel-based systems potentially have a much smaller TCB, and a machine-verified microkernel like seL4 removes even the microkernel from the TCB. Even so, the seL4 TCB is still nontrivial, containing for example an optimizing compiler, the substantially complicated specification of the microkernel’s functionality, a model of the hardware including its MMU, and the hardware itself.

My student Lu Zhao’s PhD work is trying to answer the question: How small can we make the trusted computing base for an isolation implementation? The target for this work is ARM-based microcontrollers that are large enough to perform multiple tasks, but way too small to run Linux (or any other OS that uses memory protection). Lu’s basic strategy is to implement software fault isolation (SFI). His tool rewrites an ARM executable in order to enforce two properties: memory safety and control flow integrity. We’re using a weak form of memory safety that forces all stores to go into a memory region dedicated to that task. Control flow integrity means that control never escapes from a pre-computed control flow graph. This boils down to rewriting the binary to put a dynamic check in front of every potentially dangerous operation. Using the excellent Diablo infrastructure, this was pretty easy to implement. I won’t go into the details but they can be found in a paper we just finished.
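
To give a flavor of what the inserted checks do, here is a rough C-level sketch of one classic SFI technique: masking a store address so that it must land in the task’s dedicated data region. The region base and size are made up, the real tool rewrites ARM instructions rather than C source, and whether an implementation masks the address or branches to an error handler is a design choice; this just shows the general shape of the idea.

#include <stdint.h>

/* Hypothetical region: 64 KB of data whose base address is aligned to
   the (power-of-two) region size. */
#define REGION_BASE ((uintptr_t)0x20000000u)
#define REGION_SIZE ((uintptr_t)0x10000u)

/* Before any store whose target cannot be proven safe, force the address
   into the task's region; after the mask-and-or, the store cannot touch
   memory belonging to another task, whatever value addr originally held. */
static inline void sandboxed_store(uintptr_t addr, uint32_t value)
{
  addr = (addr & (REGION_SIZE - 1)) | REGION_BASE;
  *(volatile uint32_t *)addr = value;
}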

The trick to making Lu’s work have a small TCB was to remove the binary rewriter from the TCB by proving that its output has the desired properties. To do this, he needed to create mathematical descriptions of memory safety and control flow integrity, and also to borrow an existing mathematical description of the ARM ISA. Reusing the ARM model was ideal because it has been used by several previous projects and is believed to be of very high quality. Automating the construction of a non-trivial, machine-checked proof is not so easy, and Lu spent a lot of time engineering proof scripts. The saving grace here is that since we control where to add safety checks, the proof is in some sense an obvious one — the postcondition of the check provides exactly the property needed to establish the safety of the potentially unsafe operation that comes after the check.

The TCB for Lu’s isolation solution contains the HOL4 theorem prover, his definitions of memory safety and control flow integrity (less than 60 lines of HOL), the model of the ARM ISA, and of course the ARM processor itself. All of these elements besides the processor are of quite manageable size. This work currently has two major drawbacks. First, the proofs are slow to construct and the automated proving part only succeeds for small executables. Second, overhead is far higher than it is for a state of the art sandboxing system such as Native Client. In principle the theorem proving infrastructure will be a powerful tool for optimizing sandboxed executables — but we haven’t gotten to that part of the work yet.

As far as we can tell, there’s not much more that can be removed from the TCB. Lu’s SFI implementation is the first one to do full formal verification of its guarantees for real executables. An important thing I learned from working with Lu on this project is that dealing with a mechanical theorem prover is a bit like having kids in the sense that you can’t approach it casually: you either go whole-hog or avoid it entirely. Perhaps in the future the theorem proving people will figure out how to make their tools accessible to dabblers like me.

Proposal for Automated Compiler Bug Reports

[Yesterday I submitted a proposal to Google for a modest amount of money to work on turning large random programs that expose compiler flaws into concise bug reports. Below is a transcription that is mostly faithful (citations are omitted and I changed the example bug report from a floating figure into inline text). Feedback is appreciated.]

Abstract

Csmith, our random testing tool for C compilers, has helped us discover and report nearly 400 previously unknown compiler bugs, mostly in GCC and LLVM. More than 90% of these bugs have now been fixed by compiler developers. Moreover, most of these bugs were in the “middle ends” of the compilers, meaning that they impact multiple programming languages (Go, C++, Ada, etc.) as well as many target architectures.

By proactively finding compiler bugs missed by existing test suites, we help compiler teams eliminate flaws that would otherwise make it into deployed tools. Though it is uncommon to encounter a serious compiler bug, when one does bite a development effort, a lot of time is typically wasted. We help by performing nearly continuous testing of the GCC and LLVM development versions. After reporting hundreds of bugs, we appear to have reached a steady state where both compilers are substantially correct but every week or two a new bug is found.

The process of turning a bug-triggering compiler input into a good bug report is tedious and time consuming, while also requiring technical skill and good taste. We propose removing humans from this part of the picture by automating the construction of compiler bug reports.

Research Goals

The goal of the proposed work is to enable a workflow where someone installs our software on a fast machine and tells it how to build the latest compiler version and where to send bug reports.

The machine is now totally unattended. Each day it will build the latest compiler version and spend the rest of the day testing it. When a new bug is discovered, it will be reported. The bug reports will look like this:

Bug-triggering program:

extern int printf (const char *, ...);

static int g_16[1];
static int *g_135 = &g_16[0];
static int *l_15 = &g_16[0];

int main (void) {
  g_16[0] = 1;
  *g_135 = 0;
  *g_135 = *l_15;
  printf("%d\n", g_16[0]);
}
  • Expected output — produced by GCC 3.4, GCC 4.1, Intel CC 11.0, Clang 2.8, and Microsoft VC9 — is “0”.
  • Using r156486 for i686-pc-linux-gnu, “gcc -O” produces “1”. The same version at -O0 produces the expected output.
  • First version of GCC containing this bug is r147613.
  • Probable fault location: the dead store elimination pass.

The manually generated report for this bug is here.

The problem we need to solve is taking a large chunk of random C code (empirically, the sweet spot for randomized compiler bug finding is around 80KB of code) and turning it into an actionable bug report. Our three years of experience reporting compiler bugs has given us a good working understanding of what compiler developers want and need in a bug report. The rest of this proposal explains how the different parts of this report will be produced.

The Most Important Problem: Test Case Minimization

The most important problem is test case minimization: turning a large pile of random C code into a small failure-triggering input like the one above. The ddmin Delta debugging algorithm is the best-known solution for test case minimization. In a nutshell, a Delta debugger is just a greedy search that works by iteratively removing a chunk of the input and checking if the new input still triggers the bug. Delta can be significantly optimized by progressively removing smaller and smaller chunks of the input and also by coordinating its choice of chunks with the structure of the test input. The best-known Delta implementation, by Wilkerson, incorporates both of these optimizations.
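
For concreteness, here is a rough sketch of the greedy core of such a reducer. The file name and the external “does the bug still reproduce?” script are made up, real ddmin is more careful about granularity and also tries complements, and error handling is omitted:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 65536

static char *lines[MAX_LINES];
static int nlines;

/* Write the current program to input.c, skipping lines [start, start+len). */
static void write_candidate(int start, int len)
{
  FILE *f = fopen("input.c", "w");
  for (int i = 0; i < nlines; i++)
    if (i < start || i >= start + len)
      fputs(lines[i], f);
  fclose(f);
}

/* The hypothetical oracle: exits 0 iff input.c still triggers the bug. */
static int still_buggy(void)
{
  return system("./still-buggy.sh >/dev/null 2>&1") == 0;
}

int main(void)
{
  FILE *f = fopen("input.c", "r");
  char buf[8192];
  while (f && nlines < MAX_LINES && fgets(buf, sizeof buf, f))
    lines[nlines++] = strdup(buf);
  if (f)
    fclose(f);

  /* Repeatedly try to delete chunks of lines, keeping any deletion after
     which the bug still reproduces; shrink the chunk size over time. */
  for (int chunk = nlines / 2; chunk >= 1; chunk /= 2) {
    int removed = 1;
    while (removed) {
      removed = 0;
      for (int start = 0; start + chunk <= nlines; start += chunk) {
        write_candidate(start, chunk);
        if (still_buggy()) {
          for (int i = start + chunk; i < nlines; i++)  /* commit the deletion */
            lines[i - chunk] = lines[i];
          nlines -= chunk;
          removed = 1;
        }
      }
    }
  }
  write_candidate(0, 0);   /* leave the reduced program in input.c */
  return 0;
}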

Wilkerson’s Delta works, but it has two serious problems when used to reduce C programs. First, because it is line-based, it gets stuck at local minima that are far from the (apparent) true minimum size for a failure-inducing compiler input; many interesting reduction operations for C programs need to be aware of their token-level and AST-level structure. Second, Wilkerson’s Delta usually creates reduced test cases that invoke undefined behavior. The GCC developers — to put it mildly — do not appreciate bug reports containing test cases that rely on undefined behavior.

We propose investigating two different approaches to fixing these problems. The first approach has two parts:

  • A C-aware Delta debugger that will significantly improve on the Wilkerson Delta, first by performing local transformations such as reducing x+y to x or x to 0, and second by performing global simplifications such as removing an argument from both a function’s definition and all of its call sites, removing a level of indirection from a pointer, and inlining simple function bodies (a small illustration follows this list).
  • We are working with Chucky Ellison at UIUC, who is developing an executable semantics for C that robustly detects most kinds of undefined behavior (including several, such as unsequenced updates to an lvalue, that are largely undetected by other tools). His tool is a research prototype; we are helping refine it into a useful part of a Delta debugging loop.
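
As a small made-up illustration of the global “remove an argument everywhere” simplification mentioned above, which a purely line-based reducer cannot perform, a C-aware tool could rewrite

static int some_global;
static int foo(int x, int unused) { return x + 1; }
int main(void) { return foo(3, some_global++); }

into

static int foo(int x) { return x + 1; }
int main(void) { return foo(3); }

keeping the change only if the compiler bug still reproduces.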

Our second approach to test case reduction is entirely different: using Csmith itself to perform the minimization. Our insight is that this will work because Csmith understands not only the structure of C, but also how to avoid introducing undefined behavior. Drawbacks of the Delta-in-Csmith approach are that it increases the complexity of an already complex tool and that it cannot be used to reduce failure-inducing C programs that did not come from Csmith in the first place. We believe that it is worth exploring both approaches.

Secondary Research Problems

Although our major focus will be on creating reportable test cases, we also plan to attack the following problems. First, making a robust guess at the first compiler revision that contains a bug. We have already tried using a naive binary search to find the first broken version; too often, the result of the search is a version containing an incidental change that merely exposes the bug. By randomly exploring a variety of similar test cases and compiler options, we hope to robustly locate the first version that actually contains the bug.

Second, we will perform fault localization: trying to guess where in the compiler the problem lies. When the problem lies in an optional part of the compiler (i.e., an optimization pass) this is fairly easy using Whalley’s work. When the bug lies in the frontend or backend, the situation is more difficult and is beyond the scope of this proposal.

Outcome

Evaluating this work is easy. The only important question is: Are our bug reports good enough that they motivate compiler developers to fix the bugs? We will answer this question by continuing to report bugs to major open source compiler projects like GCC and LLVM. Furthermore, our test case reduction and related tools will be released as open source as part of the already open-sourced Csmith.

Relation to Previous Work

We are already taking advantage of as much previous work as seems to be available. Mainly, this consists of Zeller’s Delta debugging papers and a handful of follow-up papers that have appeared over the last ten years, in addition to Wilkerson’s Delta implementation and Ellison’s executable C semantics. No previous compiler bug-finding effort (that we are aware of) has resulted in anything close to the nearly 400 bugs we have reported, and our guess is that for this reason nobody has yet had to develop good tool support for automated bug reports.