The Art and Science of Testcase Reduction for Compiler Bugs

Test case reduction is a common problem faced by programmers where some large input (or more generally, some complicated set of circumstances) causes a program to fail, but we wish to know the smallest input (or the simplest circumstances) that causes the same failure. Reduced test cases are important because:

  • Since the bulk of the reduced test case is usually relevant to the problem at hand, a good guess about the underlying problem can often by made simply by inspecting the test case.
  • Usually, the system under test doesn’t spend very long executing the reduced test case, making debugging much easier.
  • A reduced test case makes a good regression test.
  • Even if the original failure-inducing test case is unreportable because it is proprietary, the reduced test case may be simple enough that it is reportable.

Test case reduction is particularly difficult when parts of the input are implicit — time, context switches, cache misses, etc. Reducing inputs for compiler bugs — the topic of this post — is both easier and harder than other common test case reduction problems. They are easier because compiler behavior (usually) depends only on the contents of a collection of source code files comprising its input (this may not be the case when the compiler executes non-deterministically or when it reads hidden state like precompiled header files). Test case reduction for compilers is harder because the inputs are highly structured. Thus, it may be impossible to get rid of some piece of the input (e.g. a struct field) because it is referenced by other parts of the input.

The GCC developers have written about testcase reduction here and here. LLVM comes with its own test case reducer called Bugpoint, and reduction is also discussed here. The most commonly used automated test case reduction tool for compiler inputs is delta, an implementation of the ddmin algorithm. However, delta improves upon the original by exploiting the natural hierarchy found in block-structured programming languages.

A basic strategy for reducing a test case has been known for a long time. It is:

  1. Remove part of the input.
  2. See if it still causes the failure.
  3. If yes, go to step 1. If no, backtrack to the last version that caused the failure and then go to step 1.

This is boring and easy. The tricky thing is that when a fixpoint is reached (no part of the input can be removed without breaking the test case), the resulting test case is probably not the smallest one that triggers the bug. There are several reasons for this:

  • The definition of “part of the input” may not capture necessary transformations. For example, the delta tool does not understand how to remove an argument from a function and also from all calls to the function (no language-independent tool could be expected to do this).
  • The search is greedy; it may have missed a branch much earlier in the search tree that lead to a smaller input.
  • The actual smallest input triggering the bug may bear no resemblance to the testcase that we have.

Escaping local minima requires at least some domain specificity, and at most a deep understanding of the system under test. Let’s look at an example. A few weeks ago I reported a bug where this program causes GCC to crash:

struct S0
  int f1:1;
int g_155;
int g_229;
int g_360;
struct S0
func_93 (int p_94, int p_95, int p_96)
  struct S0 l_272 = {
  for (; g_155;)
    if (0)
      l_272.f1 = 1;
  for (;;)
      unsigned l_663 = 94967295;
      if (g_229++, g_360 = p_94 || (l_272.f1 &= l_663))

As reduced testcases go, this is perfectly fine. However, Jakub Jelinek, a GCC developer, posted a better version:

struct S { int s : 1; };
int a;

foo (int x, int y)
  struct S s;
  s.s = !!y;
  while (1)
      unsigned l = 94967295;
      a = x || (s.s &= l);

While this is clearly derived from the testcase that I reported, it was not produced simply by deleting stuff. Jakub made changes such as:

  • Simplifying identifiers
  • Changing a function to return void and to eliminate an argument
  • Factoring code out of a conditional
  • An interesting refactoring of some loop code into s.s = !!y;

Why does this matter?  Because it would be awesome if we could automate these transformations and save Jakub some time. The first three are clearly possible, but I’m not sure there’s any general version of the fourth transformation.

It turns out that Jakub is kind of an unsung testcase reduction hero. Consider this awesome example, where his reduced test case is not even in the same programming language as the original. Other good ones are PR 48305, PR 47139, PR 47141, and PR 45059. All of these make my reducers look kind of bad. Many more excellent reduced GCC testcases have been created by people like Richard Guenther, Volker Reichelt, and Andrew Pinski.

What other tricks do people use to create the smallest possible test cases?

  • Inlining a function body
  • Declaring multiple variables in a single line
  • Removing a level of indirection from a pointer or a dimension from an array
  • Replacing a struct or union with one or more scalars

A final reduction technique is to deeply understand what is going wrong in the compiler and trigger that behavior in some different way. For example, see the introduction of __UINTPTR_TYPE__ in PR 47141.

In summary, testcase reduction is both an art and a science. The science part is about 98% of the problem and we should be able to automate all of it. Creating a test case from scratch that triggers a given compiler bug is, for now at least, not only an art, but an art that has only a handful of practitioners.

Csmith 2.1 Released

We’ve released version 2.1 of Csmith, our random C program generator that is useful for finding bugs in compilers and other tools that process C code. The total number of compiler bugs found and reported due to Csmith is now more than 400. All Csmith users should strongly consider upgrading.

New features in this release — all implemented by Xuejun Yang — include:

  • By default, functions and global variables are marked as “static,” permitting compilers to optimize more aggressively in some cases.
  • We now try harder to get auto-vectorizers and other loop optimizers in trouble by generating code that is more idiomatic and therefore more likely to be optimized. In particular, array indices are in-bounds by construction instead of by using % operators.
  • Unions are supported. Generating interesting but conforming uses of unions was not as easy as we’d have hoped.
  • The comma operator is supported, as in x = (y, 1, z, 3).
  • Embedded assignments are supported, as in x = 1 + (y = z).
  • The pre/post increment/decrement operators are supported.
  • A --no-safe-math mode was added, which avoids calling the safe math wrappers. This is useful when trying to crash compilers but the resulting executables should not be run since they are very likely to have undefined behavior.

These features, other than --no-safe-math, are turned on by default. They have found quite a few new compiler bugs; the most interesting is perhaps this one.

An excellent recent development is that Pascal Cuoq, one of the main Frama-C developers, has done a lot of cross-testing of Frama-C and Csmith. I’m not sure what the final score was, in terms of bugs found in Csmith vs. bugs found in Frama-C, but the virtuous cycle has increased the quality of both tools. Frama-C is totally great and I recommend it to people who are serious about writing bulletproof C code.

Perverse Incentives in Academia

A perverse incentive is one that has unintended consequences. The world is full of these and the Wikipedia article has some great examples. Academia seems particularly prone to perverse incentives.

Incentive Intended Effect Actual Effect
Researchers rewarded for increased number of publications. Improve research productivity. Avalanche of crappy, incremental papers.
Researchers rewarded for increased number of citations. Researchers do work that is relevant and influential. H-index obsession; list of references no longer included in page limit at many conferences.
Researchers rewarded for increased grant funding. Ensure that research programs are funded, promote growth, generate overhead $$. Time wasted writing proposals, inefficient use of public $$.
Maximum of two proposals submitted to an NSF program. Discourage over-submission. You’d have to be crazy to not meet your quota these days.
Teachers rewarded for increased student evaluation scores. Improved accountability; ensure customer satisfaction. Easy courses, inflated grades.
Teachers rewarded for increased student test scores. Improve teacher effectiveness. Teaching to the tests; emphasis on short-term learning.
Departments rewarded for increasing US News ranking. Stronger departments. Resources squandered trying to influence rankings.
Departments rewarded for increasing numbers of BS, MS, and PhD degrees granted. Promote efficiency; stop students from being trapped in over-long degree programs; impress the state legislature. Class sizes increase; entrance requirements watered down; graduation requirements watered down.
Departments rewarded for increasing student credit/contact hours (SCH). The university’s teaching mission is fulfilled. SCH-maximization games are played: classes are duplicated, turf wars occur over service courses.

Strong academics, it should go without saying, are highly self-motivated — this makes the academic system somewhat resistant to perverse incentive problems. Even so, it’s not realistic to expect everyone to take the high road all the time, particularly since our salaries and jobs are on the line.

What is going on? Fundamentally, the purpose of a university, while being pretty obvious, is tough to quantify. Of course this does not deter administrators, who go ahead and come up with lots of metrics that seem to be — but are not — useful proxies for the proper goals of a university. Then, these metrics are used to determine raises, promotions, and such. The results are depressing but predictable. And of course it’s not fair to simply blame the administrators — at faculty hiring time a thick CV is a lot more reassuring than a thin one.

Towards Tinkers

The heroes of Vernor Vinge’s The Peace War are members of a scattered society of tinkers who — without any real industrial base — manage to develop and produce very high-tech devices including fast, small computers. I’m trying to figure out how realistic this is.

The software side seems entirely feasible. Today’s open source community has shown that small groups of talented, motivated people can create enormously sophisticated artifacts.

On the other hand, hardware is a problem. Vinge’s tinkers cannot exist in today’s world where fast computers are made only in billion-dollar fabs. Perhaps it will become possible to grow sophisticated chips using hacked bacteria, or maybe new fabrication technologies such as 3D printing will evolve to the point where they can produce realistic large-scale integrated circuits.

An end point for 3D printing technology is the Diamond Age‘s matter compiler, which can produce almost anything based on a plan and a “feed” of energy and atoms. But of course in that book, the feed is a centrally controlled resource. The “seed” technology from the end of The Diamond Age is tinker-friendly, but very far-fetched for now.

The recent near-doubling of hard disk prices due to flooding in Thailand shows how fragile our supply chains are. It’s nice to think that alternate versions of some high-tech products could be produced by small, autonomous groups of people.

Black Friday on Wednesday

Until now, my department hasn’t done any kind of formal, department-wide evaluation of our graduate students and their progress. A number of people, including me, have argued for some time that we should be doing something like CMU’s Black Friday. This semester Suresh, our current DGS, has made this happen; the meeting was today.

Overall I think it was really positive. First, it helps give faculty a global perspective on our grad program. I would hope that this is particularly useful for new faculty who don’t have the context that the rest of us have been building up for years. Second, this kind of meeting generates peer pressure. This is not a bad thing for hands-off advisers like me. Some of this peer pressure escapes from the meeting and reaches students; for example, at the meeting we heard about a few long-term PhD students who have been working full time for the past few years, who handed their advisers thesis drafts or other long-awaited material when they learned about this meeting.

One might ask: Does this stress out the students? At Virginia, where I did my PhD, an annual black Friday occurred and while we were aware of it, I don’t remember it as being particularly stress-inducing. Rather, a week or two afterwards a letter would materialize in each of our mailboxes saying perhaps “we want to see a proposal like right now” or “the faculty expects you to defend before Spring.” The hypothesis is that when these letters come from the faculty as a whole, they perhaps carry more weight than an adviser’s nagging.

Putting Oneself Through College

A lot has been written lately about the rising costs of higher education. Is it still possible to put oneself through college without working full time? It’s certainly not easy. For example, the Utah minimum wage is $7.25/hour. If a student works 20 hours per week for 50 weeks, the resulting $7,250 doesn’t even cover in-state tuition ($6,763) after taxes. To make things work, a student needs to work more hours, get paid more per hour, or offset costs using scholarships.

When I was in college in 1990 through 1995, it was not so difficult to put oneself through school and many of my friends did it. I supported myself through a combination of scholarships, part-time jobs (minimum wage at first; more once I got a programming job), and some support from my parents — a few hundred dollars here and there, if I recall correctly (but they also bought me a car and paid its insurance and registration). My total income was never larger than $7,000 per year and I never paid more than $150/month for rent — we’re talking basement apartments and three or four roommates. I didn’t take out any loans, though I did have a bit of credit card debt from time to time. When I started grad school, the $17,000/year stipend felt like a lot of money. But this was all some time ago. Since then, minimum wage has doubled but in-state tuition at Kansas State University — where I studied — has increased five-fold. Additionally, textbook costs have gotten out of control, the job market has taken a bad turn, and scholarship opportunities have been disappearing due to budget cuts.

Recently I asked about a dozen juniors and seniors in my department how they are paying for college. Nine people responded:

  • 6 students receive scholarships
  • 5 work part time
  • 4 get some support from parents
  • 3 get tuition reimbursement from an employer
  • 2 have used personal savings
  • 2 work full time
  • 2 are taking out loans

The interesting bits here are that most of these students work and surprisingly few are taking out loans. A bit of searching revealed that college students in Utah graduate with the lowest average debt ($15,500 in 2010) out of all 50 states. In contrast, the average graduate in New Hampshire has $31,000 in debt. This article explains:

States in the Northeast and Midwest are disproportionately represented among the “high debt” states, while those in the West are disproportionately represented among the “low debt” states. This may be related to the fact that a larger than average share of students in the Northeast and Midwest attend private nonprofit four-year colleges. In comparison, Western states have a larger share of students attending public four-year colleges.

I suppose this makes sense. Overall, it seems to still be possible to put oneself through college, but just barely — and it would be difficult or impossible to do so for students paying out-of-state (or private school) tuition or for students whose skills are less marketable than are those of the upper-level computer science students I surveyed. It is distressing that tuition costs have gone up so much during the recession. I’m sure the university administrators could tell me why it made sense to do this, but it’s not at all clear that it’s a long-term (or even medium-term) win for the university to price itself out of a significant chunk of the potential students — those who need to pay for school themselves, but who do not wish to graduate with a lot of debt.

Open Access Fees

The “open access fee” is a charming little aspect of academic publishing where I have the option to pay, for example, $3000 to the IEEE, and then they’ll poke a hole in their paywall so that anyone can download a paper without paying. The fee is per paper. Here’s a list of some publishers’ fees. I find the specific numbers to be interesting. Why is Cell Press $5000, IEEE $3000, and AIP $1350? A few guesses:

  • One publisher set the fee to an arbitrary value and then the rest of them chose fees in the same ballpark, but a bit higher or lower depending on whether they want to appear to be more or less prestigious.
  • The fee is comfortably larger than the total expected future value of the closed access version of the publication.
  • The fee is set to maximize short-term profits.

Of course, the vast majority of academic publications have a total future value that is near zero. That is because nobody is going to read them, open access or not. Thus, my guess is that if very many people decide to pay the open access fee, it’s a major windfall for the publisher (not that they really need it — Michael Nielsen reports that in 2009 Elsevier made a profit of 1.1 billion dollars on a revenue of 3.2 billion dollars). On the other hand, there will be some publications that are downloaded a lot of times, and the publisher stands to lose out if these are open access. But my guess is that the percentage of publications whose future value exceeds $3000 is negligible.

A perhaps hidden advantage of the open access fees is that if people buy into the idea, it’ll tend to discourage some of the massive over-publication that is currently making the CS publishing system a not-very-fun place to be. But I’m guessing that most people will recognize the inherently bad value provided by these fees, and will not buy in. In CS, at least, the culture is to make papers available on personal and/or institutional web pages — in many cases this is permitted by publishers, though there are a lot of tricky details.