Reducers are Fuzzers


A test case isn’t just a test case: it lives in the (generally extremely large) space of inputs for the software system you are testing. If we have a test case that triggers a bug, here’s one way we can look at it:

The set of test cases triggering a bug is a useful notion since we can search it. For example, a test case reducer is a program that searches for the smallest test case triggering a bug. It requires a way to transform a test case into a smaller one, for example by deleting part of it. The new variant of the test case may or may not trigger the bug. The process goes like this:

I’ve spent a lot of time watching reducers run, and one thing I’ve noticed is that the reduction process often triggers bugs unrelated to the bug that is the subject of the reduction:

Sometimes this is undesirable, such as when a lax interestingness test permits the reduction of one bug to get hijacked by a different bug. This happens all the time when reducing segfaults, which are hard to tell apart. But on the other hand, if we’re looking for bugs then this phenomenon is a useful one.

It seems a bit counterintuitive that test case reduction would lead to the discovery of new bugs since we might expect that the space of inputs to a well-tested software system is mostly non-bug-triggering with a few isolated pockets of bug-triggering inputs scattered here and there. I am afraid that that view might not be realistic. Rather, all of the inputs we usually see occupy a tiny portion of the space of inputs, and it is surrounded by huge overlapping clouds of bug-triggering inputs. Fuzzers can push the boundaries of the space of inputs that we can test, but not by as much as people generally think. Proofs remain the only way to actually show that a piece of software does the right thing any significant chunk of its input space. But I digress. The important fact is that reducers are decently effective mutation-based fuzzers.

In the rest of this post I’d like to push that idea a bit farther by doing a reduction that doesn’t correspond to a bug and seeing if we can find some bugs along the way. We’ll start with this exciting C++ program

#include <iostream>

int main() {
  std::cout << "Hello World!++" << std::endl;
}

and we’ll reduce it under the criterion that it remains a viable Hello World implementation. First we’ll preprocess and then use delta’s topformflat to smoosh everything together nicely:

g++ -std=c++11 -E -P -O3 hello-orig.cpp | topformflat > hello.cpp

You might be saying to yourself something like “it sure is stupid that John is passing an optimization flag to the preprocessor,” but trust me that it actually does change the emitted code. I didn’t check if it makes a difference here but I’ve learned to avoid trouble by just passing the same flags to the preprocessor as to the compiler.

Anyhow, the result is 550 KB of goodness:

Here’s the code that checks if a variant is interesting — that is, if it acts like a Hello World implementation:

g++ -O3 -std=c++11 -w hello.cpp >/dev/null 2>&1 &&
ulimit -t 1 &&
./a.out | grep Hello

The ulimit is necessary because infinite loops sometimes get into the program that is being reduced.

To find compiler crashes we’ll need a bit more elaborate of a test:

if
  g++ -O3 -std=c++11 -w hello.cpp >compiler.out 2>&1
then
  ulimit -t 1 &&
  ./a.out | grep Hello
else
  if
    grep 'internal compiler error' compiler.out
  then
    exit 101
  else
    exit 1
  fi
fi

When the compiler fails we look at its output and, if it contains evidence of a compiler bug, exit with code 101, which will tell C-Reduce that it should save a copy of the input files that made this happen.

The compiler we’ll use is g++ r231221, the development head from December 3 2015. Let’s get things going:

creduce --nokill --also-interesting 101 --no-default-passes \
  --add-pass pass_clex rm-tok-pattern-4 10 ../test.sh hello.cpp

The -also-interesting 101 option indicates that the interestingness test will use process exit code 101 to tell C-Reduce to make a snapshot of the directory containing the files being reduced, so we can look at it later. --no-default-passes clears C-Reduce’s pass schedule and -add-pass pass_clex rm-tok-pattern-4 10 add a single pass that uses a small sliding window to remove tokens from the test case. The issue here is that not all of C-Reduce’s passes are equally effective at finding bugs. Some passes, such as the one that removes dead variables and the one that removes dead functions, will probably never trigger a compiler bug. Other passes, such as the one that removes chunks of lines from a test case, eliminate text from the test case so rapidly that effective fuzzing doesn’t happen. There are various ways to deal with this problem, such as probabilistically rejecting improvements or rejecting improvements that are too large, but for this post I’ve chosen the simple expedient of running just one pass that (1) makes progress very slowly and (2) seems to be a good fuzzer.

The dynamics of this sort of run are interesting: as the test case walks around the space of programs, you can actually see it brush up against a compiler bug and then wander off into the weeds again:

The results are decent: after about 24 hours, C-Reduce caused many segfaults in g++ and triggered six different internal compiler errors: 1, 2, 3, 4, 5, 6. One of these was already reported, another looks probably like a duplicate, and four appear to be new.

I did a similar run against Clang++ r254595, again the development head from December 3. This produced segfaults and also triggered 25 different assertions:

llvm/include/llvm/Support/Casting.h:230: typename cast_retty::ret_type llvm::cast(Y &) [X = clang::FunctionProtoType, Y = clang::QualType]: Assertion `isa(Val) && "cast() argument of incompatible type!"' failed.
llvm/include/llvm/Support/Casting.h:95: static bool llvm::isa_impl_cl::doit(const From *) [To = clang::FunctionTemplateDecl, From = const clang::Decl *]: Assertion `Val && "isa<> used on a null pointer"' failed.
llvm/tools/clang/lib/AST/Decl.cpp:2134: clang::APValue *clang::VarDecl::evaluateValue(SmallVectorImpl &) const: Assertion `!Init->isValueDependent()' failed.
llvm/tools/clang/lib/AST/Decl.cpp:2181: bool clang::VarDecl::checkInitIsICE() const: Assertion `!Init->isValueDependent()' failed.
llvm/tools/clang/lib/AST/ExprCXX.cpp:451: static clang::DependentScopeDeclRefExpr *clang::DependentScopeDeclRefExpr::Create(const clang::ASTContext &, clang::NestedNameSpecifierLoc, clang::SourceLocation, const clang::DeclarationNameInfo &, const clang::TemplateArgumentListInfo *): Assertion `QualifierLoc && "should be created for dependent qualifiers"' failed.
llvm/tools/clang/lib/AST/../../include/clang/AST/TypeNodes.def:98: clang::TypeInfo clang::ASTContext::getTypeInfoImpl(const clang::Type *) const: Assertion `!T->isDependentType() && "should not see dependent types here"' failed.
llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:623: llvm::StringRef clang::CodeGen::CodeGenModule::getMangledName(clang::GlobalDecl): Assertion `II && "Attempt to mangle unnamed decl."' failed.
llvm/tools/clang/lib/CodeGen/../../include/clang/AST/Expr.h:134: void clang::Expr::setType(clang::QualType): Assertion `(t.isNull() || !t->isReferenceType()) && "Expressions can't have reference type"' failed.
llvm/tools/clang/lib/Lex/PPCaching.cpp:101: void clang::Preprocessor::AnnotatePreviousCachedTokens(const clang::Token &): Assertion `CachedTokens[CachedLexPos-1].getLastLoc() == Tok.getAnnotationEndLoc() && "The annotation should be until the most recent cached token"' failed.
llvm/tools/clang/lib/Parse/../../include/clang/Parse/Parser.h:2256: void clang::Parser::DeclaratorScopeObj::EnterDeclaratorScope(): Assertion `!EnteredScope && "Already entered the scope!"' failed.
llvm/tools/clang/lib/Sema/../../include/clang/AST/DeclTemplate.h:1707: void clang::ClassTemplateSpecializationDecl::setInstantiationOf(clang::ClassTemplatePartialSpecializationDecl *, const clang::TemplateArgumentList *): Assertion `!SpecializedTemplate.is() && "Already set to a class template partial specialization!"' failed.
llvm/tools/clang/lib/Sema/../../include/clang/Sema/Lookup.h:460: clang::NamedDecl *clang::LookupResult::getFoundDecl() const: Assertion `getResultKind() == Found && "getFoundDecl called on non-unique result"' failed.
llvm/tools/clang/lib/Sema/SemaDecl.cpp:10455: clang::Decl *clang::Sema::ActOnParamDeclarator(clang::Scope *, clang::Declarator &): Assertion `S->isFunctionPrototypeScope()' failed.
llvm/tools/clang/lib/Sema/SemaDeclCXX.cpp:11373: ExprResult clang::Sema::BuildCXXDefaultInitExpr(clang::SourceLocation, clang::FieldDecl *): Assertion `Lookup.size() == 1' failed.
llvm/tools/clang/lib/Sema/SemaExpr.cpp:2274: ExprResult clang::Sema::ActOnIdExpression(clang::Scope *, clang::CXXScopeSpec &, clang::SourceLocation, clang::UnqualifiedId &, bool, bool, std::unique_ptr, bool, clang::Token *): Assertion `R.getAsSingle() && "There should only be one declaration found."' failed.
llvm/tools/clang/lib/Sema/SemaExprCXX.cpp:2272: clang::FunctionDecl *clang::Sema::FindUsualDeallocationFunction(clang::SourceLocation, bool, clang::DeclarationName): Assertion `Matches.size() == 1 && "unexpectedly have multiple usual deallocation functions"' failed.
llvm/tools/clang/lib/Sema/SemaExprCXX.cpp:6663: ExprResult clang::Sema::CorrectDelayedTyposInExpr(clang::Expr *, clang::VarDecl *, llvm::function_ref): Assertion `TyposInContext < ~0U && "Recursive call of CorrectDelayedTyposInExpr"' failed.
llvm/tools/clang/lib/Sema/SemaExprMember.cpp:91: IMAKind ClassifyImplicitMemberAccess(clang::Sema &, const clang::LookupResult &): Assertion `!R.empty() && (*R.begin())->isCXXClassMember()' failed.
llvm/tools/clang/lib/Sema/SemaLookup.cpp:1904: bool clang::Sema::LookupQualifiedName(clang::LookupResult &, clang::DeclContext *, bool): Assertion `(!isa(LookupCtx) || LookupCtx->isDependentContext() || cast(LookupCtx)->isCompleteDefinition() || cast(LookupCtx)->isBeingDefined()) && "Declaration context must already be complete!"' failed.
llvm/tools/clang/lib/Sema/SemaLookup.cpp:2729: Sema::SpecialMemberOverloadResult *clang::Sema::LookupSpecialMember(clang::CXXRecordDecl *, clang::Sema::CXXSpecialMember, bool, bool, bool, bool, bool): Assertion `CanDeclareSpecialMemberFunction(RD) && "doing special member lookup into record that isn't fully complete"' failed.
llvm/tools/clang/lib/Sema/SemaOverload.cpp:11671: ExprResult clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, unsigned int, const clang::UnresolvedSetImpl &, clang::Expr *, clang::Expr *): Assertion `Result.isInvalid() && "C++ binary operator overloading is missing candidates!"' failed.
llvm/tools/clang/lib/Sema/SemaTemplate.cpp:2906: ExprResult clang::Sema::BuildTemplateIdExpr(const clang::CXXScopeSpec &, clang::SourceLocation, clang::LookupResult &, bool, const clang::TemplateArgumentListInfo *): Assertion `!R.empty() && "empty lookup results when building templateid"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateDeduction.cpp:609: (anonymous namespace)::PackDeductionScope::PackDeductionScope(clang::Sema &, clang::TemplateParameterList *, SmallVectorImpl &, clang::sema::TemplateDeductionInfo &, clang::TemplateArgument): Assertion `!Packs.empty() && "Pack expansion without unexpanded packs?"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateInstantiate.cpp:2781: llvm::PointerUnion *clang::LocalInstantiationScope::findInstantiationOf(const clang::Decl *): Assertion `isa(D) && "declaration not instantiated in this scope"' failed.
llvm/tools/clang/lib/Sema/SemaTemplateVariadic.cpp:290: bool clang::Sema::DiagnoseUnexpandedParameterPack(clang::Expr *, clang::Sema::UnexpandedParameterPackContext): Assertion `!Unexpanded.empty() && "Unable to find unexpanded parameter packs"' failed.

I have to admit that I felt a bit overwhelmed by 25 potential bug reports, and I haven’t reported any of these yet. My guess is that a number of them are already in the bugzilla since people have been fuzzing Clang lately. Anyway, I’ll try to get around to reducing and reporting these. Really, this all needs to be automated so that when subsequent reductions find still more bugs, these just get added to the queue of reductions to run.

If you were interested in reproducing these results, or in trying something similar, you would want to use C-Reduce’s master branch. I ran everything on an Ubuntu 14.04 box. While preparing this post I found that different C-Reduce command line options produced widely varying numbers and kinds of crashes.

Regarding previous work, I believe — but couldn’t find written down — that the CERT BFF watches out for new crashes when running its reducer. In a couple of papers written by people like Alex Groce and me, we discussed the fact that reducers often slip from triggering one bug to another.

The new thing in this post is to show that triggering new bugs while reducing isn’t just some side effect. Rather, we can go looking for trouble, and we can do it without being given a bug to reduce in the first place. A key enabler for easy bug-finding with C-Reduce was finding a simple communication mechanism by which the interestingness test can give C-Reduce a bit of out-of-band information that a variant should be saved for subsequent inspection. I’m not trying to claim that reducers are awesome fuzzers, but on the other hand, it might be a little bit awesome that mutating Hello World resulted in triggering 25 different assertion violations in a mature and high-quality compiler. I bet we’d have done even better by starting with a nice big fat Boost application.


28 responses to “Reducers are Fuzzers”

  1. How well tested are compilers for robustness in the face of invalid inputs, though? These wouldn’t seem to be high priority bugs.

  2. As I already told John in emails about this, what really interests me is:

    How does a reducer fare against a C++ mutation tool? That is, if you had a competitor that just generates similar valid C++ programs to your starting point, does it behave much like this, or is there something special about reduction? As far as I know, there aren’t any free, useful, working C++ mutation tools. I tried using Jamie Andrews’ nice C tool, but of course templates + operator replacement ends in tears. The tiny set of actual syntactically valid things generated didn’t do anything useful.

    Most mutation-based fuzzers for anything with a real grammar need to at least try to stay close to that grammar, so a pure file-fuzzer will also be useless. So is C-Reduce 1) just nice to have around for this in absence of a better mutator or 2) actually doing something special? I would place blind money on 2, but I’m not sure why. I feel like reduction does push you somewhere interesting in the space.

  3. Should the first line of second paragraph say “set of test cases” instead of “set of programs”?

  4. Paul, as a compiler implementor, the reason you look at these bugs is that they cause the compiler the become internally inconsistent. Just because I’ve triggered these inconsistencies with invalid inputs doesn’t mean they cannot also be triggered by valid ones.

  5. Magnus, thanks, fixed, I thought I had gotten all of those already, as you might guess I started out writing this post one way and then switched around the discussion to be more general in the beginning.

  6. ICEs on invalid input are only interesting if they
    happen _before_ an error was reported by the compiler.
    All your gcc bugs don’t fall in this category, so they
    have very low priority.

  7. I cannot speak for the LLVM developers and I haven’t seen the full compiler output for these assertion hits.
    But one thing to remember is that at a certain point you will have to trade compilation speed for fuzzer robustness.
    And because a compiler isn’t some daemon, that runs 24/7 and is vulnerable to security attacks, compilation speed should win.

  8. octoploid, of course. I just throw bugs over the fence and they can do whatever they like with them, it’s their tool and their time.

  9. John,

    I suspect that many of the crashes occur because of syntax/semantic errors in the reduced code that get the compiler into the unexpected state that causes the segfault/assertion failure (I have not checked). These are very low priority problems and I imagine are not worth fixing (the user fixes the syntax/semantic issue and they go away).

    Some thoughts on fuzzing in general:
    http://shape-of-code.coding-guidelines.com/2015/12/07/so-you-found-a-bug-in-my-compiler-whoopee-do/

  10. Isn’t it amazing that such simple fuzzing still finds piles of bugs? I would have thought that those bugs would long have been found because this is such a simple program that was reduced. I guess that means that nobody was looking.

  11. tobi, I think the reducer is exploring corners of the input space that just don’t get looked at very often.

    Derek, see #12.

  12. How hard would it be to detect octoploid’s “ICE before error” criteria? I assume it wouldn’t be too hard, given a predicate for that, to set up a search for case that satisfy it. If the ICE is after many errors, you might make the reduction target be how short an *error output* you can generate that still provokes the ICE.

    As an aside: why doesn’t c-reduce keep ALL inputs? (How fast would that fill an xTB local drive? If you tgz’ed them?) If you wanted to get fancy, you could keep track of results and provenance via a rats nest of symlinks.

  13. @bcs:
    It is pretty easy: just add -Wfatal-errors to the compiler invocation. The compiler then stops at the first error it encounters. And if it isn’t an ICE, just move on.

  14. An argument for fixing “low priority” bugs found by the fuzzer is that unless you do this, the fuzzer quickly becomes useless, since it keep spitting out the same unfixed bugs, which may conceal any higher priority bugs it finds.

  15. @octoploid, that sort of defeats the point. The point is to attempt to mutate the ICE-after-error into an ICE-before-error.

    This is based on the assumption that there are real relevant (by whatever definition) bugs where the first time it’s encountered, it will come after an error and that it will be less costly to reduce that case than find it again as a case were it comes first.

  16. Just run octoploid’s method as a second pass, internal to the script you pass C-Reduce. Detect ICE-after-error just as John is now, then throw it at C-Reduce set up to preserve ICE, and store/terminate if it ever hits an ICE-before-error. Since C-Reduce isn’t starting in a good state, it may not hit one, but by preserving same-ICE you can at least explore the space and see if you can get a before-error ICE.

  17. My general approach with C-Reduce has been to try to keep it deterministic, so that files do not need to be saved, but rather can be regenerated easily.

    But there are two threats to determinism. First, timeouts in the interestingness test admit a bit of nondeterminism, not sure how often that comes up in practice. Second, I’m not 100% certain that multicore C-Reduce is deterministic, although the algorithm was designed to be.

  18. @Alex, that’s more or less what I was proposing.

    @regehr
    C-Reduce is deterministic? That seems…. risky to me. It runs the risk that that determinism could inadvertently close off whole sections of the search space. A less risky way to get that might be to use a randomly seeded PRNG and log the seed.

    In addition to that risk, if it takes 24 hours to do a reduction to get to a given point in a reduction, should I assume it would take hours to regenerate any random midpoints? If so, I’d think being able to skip that even once or twice a semester would be worth the cost of storage (~$0.03/GiB for HDDs last I checked).

    Also, I’m still wondering what a backtracking solution would find. E.g. pick a starting point based on it’s size and how many times you’ve tried it before and churn on it for a set time or number of mutations, then start again with another.

  19. Yes, BFF (And FOE) will watch for new crashes seen during minimization. This was introduced with BFF 2.5 and FOE 2.0: http://www.cert.org/blogs/certcc/fuzzing-pipeline-2.png

    In particular, look at the bottom arrow labeled “minimizer finds other crashers”

    For the full blog posts see https://insights.sei.cmu.edu/cert/2012/07/cert-failure-observation-engine-20-released.html and https://insights.sei.cmu.edu/cert/2013/09/attaching-the-rocket-to-the-chainsaw—behind-the-scenes-of-bff-and-foes-crash-recycler.html

  20. bcs, I hear you, but the space of programs is so unimaginably vast that I really don’t think probabilistic searching is necessary. Also, the shape of the space tends to ensure that there are many ways to get to the same place, so backtracking isn’t necessarily profitable.

    Regarding saving variants generated along the way, as you suggest this is generally totally feasible.

  21. This is a side point, but:

    > You might be saying to yourself something like “it sure is stupid that John is passing an optimization flag to the preprocessor,” but trust me that it actually does change the emitted code. I didn’t check if it makes a difference here but I’ve learned to avoid trouble by just passing the same flags to the preprocessor as to the compiler.

    Well, “of course”: those trigger the definition of some macros (https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html), so that standard library headers can provide more or less optimizing definitions ^_^. Of course those definitions are supposed to be equivalent (that is, valid optimizations), but not necessarily for fuzzing.

  22. This is really intriguing. Now I’m pondering how to hook something like this up to llvm-stress and bugpoint for backend stuff.

    And please do file those clang/llvm bug reports (feel free to CC me on them if you like). I’d love to get all of them cleaned up.