The goal of testing is to expose bugs. To find a bug we must drive the software under test into a part of its state space where the bug becomes evident, for example by causing a crash. Many bugs lurk in dark corners of the state space that are ordinarily difficult to reach. This piece is about a nice technique for making those dark corners easier to reach using either random or non-random testing. Although the method is one that I have used before, I hadn’t really thought about it explicitly until a conversation with Colin Runciman the other day brought it out into the open.
The backdrop here is the small scope hypothesis, which states that most bugs can be triggered using small test cases. Here we’re talking about a class of bugs that falls outside of this hypothesis: bugs that are triggered only when some implementation limit is exceeded. For example, if we have a bounded buffer with length 10,000, and this buffer contains a bug in its handling of the buffer-full case, then it is clear that no test case can trigger the bug unless it results in more than 10,000 insertions into the buffer. Similarly, a bug in an operating system’s swapping code will not be hit until we put memory pressure on the machine — this will be difficult if we’re testing on a machine with lots of RAM. A bug in a compiler’s register spill logic will not be hit until we run out of registers, which may happen only rarely on a RISC with a large register file.
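To make the bounded-buffer example concrete, here is a minimal sketch (every name and the compile-time override are invented for illustration) of a buffer whose capacity is a build-time parameter. Compiled with the production value, a test needs more than 10,000 insertions before the buffer-full path can execute; compiled with -DCAPACITY=4, a tiny test reaches it immediately.

#include <assert.h>
#include <stdio.h>

#ifndef CAPACITY
#define CAPACITY 10000   /* production value; override with -DCAPACITY=4 for testing */
#endif

static int buf[CAPACITY];
static int count = 0;

/* returns 0 on success, -1 if the buffer is full */
static int buf_insert(int x) {
  if (count == CAPACITY)   /* a bug here (say, '>' instead of '==') stays hidden
                              until the buffer actually fills */
    return -1;
  buf[count++] = x;
  return 0;
}

int main(void) {
  /* with CAPACITY=4 this reaches the buffer-full path after 4 insertions;
     with the production value it takes more than 10,000 */
  for (int i = 0; i < CAPACITY + 2; i++)
    if (buf_insert(i) != 0)
      printf("insertion %d rejected: buffer full\n", i);
  assert(count == CAPACITY);
  return 0;
}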
The testing technique, then, is simply to recognize capacities in the system under test and to artificially reduce them in order to make capacity bugs much easier to trigger. I think that probably every accomplished programmer has used this technique, perhaps without even thinking about it: a simple example is testing the code for an auto-resizing hash table by making its initial capacity very small. This is just good sense.
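For instance, here is a toy open-addressing hash set (a sketch; all names and numbers are made up) whose initial capacity is a parameter of its init function. A unit test that creates it with capacity 2 forces the grow/rehash logic to run within the first few insertions, rather than only after thousands of elements have been added.

#include <assert.h>
#include <stdlib.h>

typedef struct {
  unsigned *slots;   /* 0 means empty; only nonzero keys are stored */
  size_t cap, used;
} set_t;

static void set_insert(set_t *s, unsigned key);

static void set_init(set_t *s, size_t initial_cap) {
  s->cap = initial_cap;
  s->used = 0;
  s->slots = calloc(s->cap, sizeof *s->slots);
}

/* double the table and rehash; runs rarely with a large initial capacity */
static void set_grow(set_t *s) {
  set_t bigger;
  set_init(&bigger, s->cap * 2);
  for (size_t i = 0; i < s->cap; i++)
    if (s->slots[i])
      set_insert(&bigger, s->slots[i]);
  free(s->slots);
  *s = bigger;
}

static void set_insert(set_t *s, unsigned key) {
  if (2 * (s->used + 1) > s->cap)   /* keep load factor at or below 50% */
    set_grow(s);
  size_t i = key % s->cap;
  while (s->slots[i] && s->slots[i] != key)
    i = (i + 1) % s->cap;
  if (!s->slots[i]) {
    s->slots[i] = key;
    s->used++;
  }
}

int main(void) {
  set_t s;
  set_init(&s, 2);   /* tiny initial capacity: the resize path runs on the
                        very first insertions instead of after thousands */
  for (unsigned k = 1; k <= 1000; k++)
    set_insert(&s, k);
  assert(s.used == 1000);
  free(s.slots);
  return 0;
}

Production code would presumably pass something like 1024 here; the test simply passes the smallest value the design can tolerate.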
Of course, while some capacities appear as explicit constants in the code that can be easily changed, others, such as the number of registers available to a compiler, will tend to be encoded in a more implicit fashion, and will therefore be more difficult to find and change. Also, the fact that a constant is hard-coded into a program doesn’t mean this constant can be freely changed: in some cases the constant is fake (how many programs that reference the CHAR_BIT constant can really tolerate values other than 8?) and in others the program is only designed to operate for a certain range of choices. To continue with the compiler example, the calling convention (and other factors) will almost certainly cause failures when the number of registers falls below some threshold. Similarly, a pipe or a thread pool can easily cause the system to deadlock when it is instantiated with a capacity that is too small. These things are design constraints, not the implementation errors that we are looking for.
In summary, if you have a system that offers a choice of capacity for an internal resource, it is often very useful to reduce this capacity for testing purposes, even if this results in poor throughput. The payoff is particularly large during fuzzing: the random walks through the state space induced by random test cases tend to hover around mean values with high probability, so they rarely reach a generously sized capacity limit on their own.
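One low-effort way to make this possible is to let a test or fuzzing harness override the capacity at run time. The sketch below (the environment variable name, default, and function are all invented) reads an optional override from the environment; a fuzzer can set the variable to a small fixed value, or randomize it per run as a commenter below describes, so that executions spend most of their time near the limit instead of far below it.

#include <stdio.h>
#include <stdlib.h>

/* capacity of some internal resource, e.g. a thread pool or buffer */
static size_t pool_capacity(void) {
  const char *s = getenv("POOL_CAPACITY");   /* hypothetical test-only override */
  if (s) {
    long v = strtol(s, NULL, 10);
    if (v > 0)
      return (size_t)v;
  }
  return 4096;   /* production default */
}

int main(void) {
  printf("using capacity %zu\n", pool_capacity());
  return 0;
}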
6 responses to “Testing with Small Capacities”
It’s absolutely essential to do this for serious file system testing. See 5.2 in our old ICSE 07 paper (http://www.cs.cmu.edu/~agroce/icse07.pdf)
Unfortunately, in my experience code built without explicitly thinking about doing this almost never lets you get away with it on anything non-trivial. I wonder if one way to identify where you can apply this would be using Daikon or something like it to find values that don’t ever get covered in a fairly weak set of test cases, then shrink the bounds to just over that and “push” on the edges. That might avoid some of the “code doesn’t really work at all if smaller than X” problem, possibly?
Alex, I agree with all this. It’s part of the overall “design for testability” that everyone should be doing anyhow.
clang -mllvm -stress-regalloc=8
This reduces each register class to at most 8 registers. And yes, weird things happen when the value is too small.
IIRC, I haven’t found any bugs in LLVM’s register allocator with this option. Probably because compiling for x86 has a similar effect, and most of the RA code is target independent.
I think some bugs would show up when targeting a 3-register machine, but there are too many expected failures due to ABI issues and inline assembly.
Thanks Jakob, I hadn’t seen that! Yes, x86 does make a good stress test for any kind of register allocator.
I have been using this technique for a while. I have my cut-offs and limits all parametric and I can adjust them from the command line. I then set these limits randomly (i.e. fuzz the limits themselves, too) from a fuzzer and let it run with some fuzz-examples. This way I can hit the limits much faster and test otherwise hard-to-test code paths.
Mate, nice!