Category: Software Correctness

  • A Few Thoughts About Path Coverage

    Klee isn’t the only tool in its class, nor was it the first, but it’s open source and well-engineered and it works. Klee’s goal is to generate a set of test inputs that collectively induce path coverage in a system under test. One of the scenarios I was curious about is the one where Klee…

  • Undefined Integer Behaviors in Student Code, Part 1

    [This post is based on material from Chad Brubaker, a really smart CS undergrad at Utah who did all the work getting these data. The integer undefined behavior checker was created by my student Peng Li.] Integer undefined behaviors in C/C++, such as INT_MAX+1 or 1<<-1, create interesting opportunities for compiler optimizations and they also…

  • Should Software Evolve?

    Weimer, Nguyen, Le Goues, and Forrest wrote a paper Automatically Finding Patches Using Genetic Programming that proposes and evaluates this approach for fixing a buggy software system: A failure-inducing test case is found The system code is mutated randomly, but using smart heuristics to increase the probability that the part of the program that is…

  • Random Testing Gets No Respect

    A funny thing about random testing is that it gets little respect from most computer scientists. Here’s why: Random testing is simple, and peer reviewers hate simple solutions. Random testing does not lend itself to formalization and proofs, which peer reviewers love. Random testing provides no guarantees. Peer reviewers love guarantees — regardless of whether…

  • Two Rules for Random Testing

    When you run a test case on a piece of software, you’re conducting an experiment where the null hypothesis is “the system under test correctly executes this test.” The problem is that the value of each such experiment is small because there are so many possible inputs. Random testing has a stronger null hypothesis: “the…

  • Probabilities in Random Testing

    A typical real computer system has an extremely large input space and no testing method can cover more than an infinitesimal part of it. On the other hand, broad regions of the input space will trigger bugs. The problem is that we do not know the shapes or locations of the buggy parts of the…

  • Csmith @ PLDI

    My group’s paper Finding and Understanding Bugs in C Compilers was accepted to PLDI 2011 (subject to shepherding, which is generally a pretty calm process). I’m excited about this since we’ve put a lot of effort into this project. Thanks to all the people who helped us make this work better!

  • Who Verifies the Verifiers?

    Verification tools, like all complex software systems, contain design flaws and implementation errors. Moreover, they are not necessarily very easy to test. The other day I started to address the question: If someone claims that a piece of software has been formally verified, what should you believe about the software? The previous piece was non-technical;…

  • The Piano Test for Program Verification

    [Update from Feb 1 2011: I’ve written a new post that adds some technical details.] Here’s a little thought experiment: You’re spending the day visiting the Utah computer science department. We’re chatting and I describe a software verification project where my group has proved that some particular version of OpenSSH is memory safe: it will…

  • The Synergy Between Delta Debugging and Compiler Optimization

    Before reporting a compiler bug, it’s best to reduce the size of the failure-inducing input. For example, this morning I reported an LLVM bug where the compiler enters an infinite loop when compiling this C code: static int foo (int si1, int si2) { return si1 – si2; } void bar (void) { unsigned char…