A Few Thoughts About Path Coverage

Klee isn’t the only tool in its class, nor was it the first, but it’s open source and well-engineered and it works. Klee’s goal is to generate a set of test inputs that collectively induce path coverage in a system under test. One of the scenarios I was curious about is the one where Klee achieves full path coverage but still fails to expose a bug, for example because a stronger kind of coverage such as value coverage is required. Coming up with these examples is easy. Consider the following function for computing the number of bits set in a 64-bit word:

const int pop8[256] = {
  0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
  1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
  1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
  1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
  3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
  1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 4, 5, 6,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
  3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
  2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
  3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
  3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
  4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

int popcount (unsigned long long x) {
  int c=0;
  int i;
  for (i=0; i<8; i++) {
    c += pop8[x&0xff];
    x >>= 8;
  }
  return c;
}

This function admits only a single path of execution, and therefore Klee comes up with only a single test case: the 64-bit word containing all zero bits. This popcount function is, in fact, wrong (one of the values in the table is off by one) but Klee’s input does not trigger the bug. At this point we’re more or less stuck — Klee cannot find the bug and we’re back to code inspection or random testing or something. But what if we have a second popcount implementation?

int popcount_slow (unsigned long long x) {
  int c=0;
  int i;
  for (i=0; i<64; i++) {
    c += x&1;
    x >>= 1;
  }
  return c;
}

Unlike the previous implementation, this one is correct. Like the previous implementation, it has only a single path — so again Klee will give us just one test case. The way to find an input where the two functions return different answers is to just ask Klee directly:

assert (popcount(x) == popcount_slow(x));

Now there are two paths, a succeeding one and a failing one, and it takes Klee just a few seconds to find inputs that execute both paths. This is a pretty impressive result given the large input space. Of course, in this particular case random testing using uniformly-distributed 64-bit inputs finds this bug very quickly since approximately one in eight inputs triggers the bug. However, we could write code where the probability of a random input triggering the bug is arbitrarily small (note however that fooling a whitebox test case generator is not totally trivial: many of the tempting ways to add a bug would also add a new code path, which Klee would duly explore).

The interesting thing about path coverage is that it’s massively insufficient for finding all bugs in real code, and yet it is also impossible to achieve 100% feasible path coverage on real software systems due to the path explosion problem. The conclusion I draw is that future Klee-like tools should provide the option of targeting even stronger kinds of coverage (path + boundary value coverage, for example) and that these tools will be most useful in unit-testing of smallish codes.

The problem in testing my code above stems from its use of a lookup table. Perhaps people who test software for a living have come up with reasonably generic ways to detect problems in lookup tables, but I’m unaware of them. Notice that MC/DC coverage — used in validating Level A software in DO-178B, and one of the stronger coverage metrics that is actually used in practice — is not any better than path coverage for finding the popcount bug.

February 19, 2011

regehr

Computer Science, Education, Software Correctness

8 responses to “A Few Thoughts About Path Coverage”

Asankhaya Sharma says:

February 19, 2011 at 2:15 pm

This is interesting, the idea of of using path conditions from 2 programs such that one is a success and other failure has also been used to debug evolving programs in Darwin [FSE 09]
http://portal.acm.org/citation.cfm?id=1595704
Daniel Dunbar says:

February 19, 2011 at 8:36 pm

Nice post John!

The problem is that by itself symbolic execution is very limited in what it can determine to be a bug. Clearly memory errors and so on are bugs, but logical errors (such as what is in your popcount function) aren’t possible to detect without some kind of auxiliary specification (such as an alternative implementation).

You are certainly correct that this is one of the major limitations of symbolic execution. I think researchers tend not to focus on that particular area, because the core problem is already so hard.

In practice, it is hard to know how big of a problem this is. For example, if your popcount implementation was actually embedded in a program that used the result and needed it to be correct, then I think the hope would be that the fact that popcount fails would ultimately result in a test case showing coverage of that path.

I definitely think what you are asking for is an interesting area for future research though (and it’s one I’ve considered in the past). The idea I had was: when we go to generate a test case for any particular path, look for any places along that path where we could cause the path to diverge by some particular input, and generate test cases for all of those situations. This isn’t exactly what you are asking for, but it is something like ensuring good “fault injection” coverage.

One nice thing is that this is actually probably pretty tractable…

– Daniel
regehr says:

February 20, 2011 at 5:43 pm

Hi Daniel, I really like your idea. Vague ideas along these line are what got me started looking at Klee in the first place.

Overall I feel like path coverage is nice, but highly arbitrary. Ideally, we’d have a Klee that supports a variety of more and less aggressive policies that can be tailored to situations of interest.

Lately I’ve been playing with Astree, a sophisticated abstract interpreter. It is highly configurable based on the idea that you can tailor the analysis so that it succeeds in showing safety properties but doesn’t take too much time or memory to do so. Probably something similar is wanted in test case generators. Once the tool becomes good enough people become willing to tailor the system to the tool. Eliminating compiler warnings, even when they’re stupid, is a good example.
regehr says:

February 20, 2011 at 5:44 pm

Asankhaya, thanks for the reference. I think that equivalence checking has a bright future in software engineering.
Raimondas Sasnauskas says:

February 21, 2011 at 12:54 am

Hi Daniel, John,

The idea of “fault injection” is definitely worth pursuing. The really nice thing about it is that one has control and insight where the paths might diverge.

I tried the “blind” fault injection on the node/network level (packet drops, node reboots) which is quite inefficient leading to state explosion and only small path coverage gain. Consequently, I ended up in spending time on how to merge equivalent states and not on fault injection policies which is more interesting and useful.
Tony Garnock-Jones says:

February 21, 2011 at 3:30 pm

I was unaware of the existence of tools like Klee. It’s interesting to learn of its existence especially in the light of results like [1], where they achieve a completeness result for inferring types from Ruby code plus test cases so long as the cases cover all paths through each method tested. I wonder if a Klee-like program could be used to generate such test cases, or whether the combination of the test generator and the type inferencer would end up being circular 🙂

[1] An, D., Chaudhuri, A. Foster, J., and Hicks, M. 2011. Dynamic Inference of Static Types for Ruby. Proc. 38th ACM Symposium on Principles of Programming Languages, Austin, USA (POPL 2011).
Chan Li says:

February 23, 2011 at 3:40 pm

Daniel,

Could you please elaborate on “look for any places along that path where we could cause the path to diverge by some particular input”? I am not sure I get it.

Thanks,
Chan Li
Tao Xie says:

March 2, 2011 at 10:47 pm

Thanks for the interesting post!

One interpretation is that white-box testing techniques/tools like Klee or other code-based testing tools typically couldn’t detect faults related to missing functionalities. However, if general-enough assertions are used, 100% confidence on correctness with respect to the specified assertions could be set up, assuming that all paths ending with the assertion checking branch are explored. Please see our discussion in Section 3 of the following position paper. Comments are welcome!

Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Future of Developer Testing: Building Quality in Code.
In Proceedings of FSE/SDP Workshop on the Future of Software Engineering Research (FoSER 2010), Santa Fe, NM, pages 415-420, November 2010.
http://people.engr.ncsu.edu/txie/publications.htm