Software Testing Using Function/Inverse Pairs


One of the hardest problems in software testing is finding convenient, strong oracles—programs that can tell us whether the software under test worked correctly. Weak oracles, such as checking to make sure a program doesn’t crash, are always available, but not always of much use. An example of a strong oracle would be a reference implementation.

Sometimes software comes in function/inverse pairs, meaning that we get a program implementing some function f and another that implements f-1. When this happens we can (ideally) make up an input x and then check to make sure that f-1(f(x))=x. Often this works only in one direction. For example, let’s consider an assembler/disassembler pair. If I take some arbitrary assembly language, assemble it, and then disassemble the resulting object code, I’m unlikely to get exactly what I started with. On the other hand, if I take some arbitrary object code, disassemble it, and then assemble the resulting assembly code, I should get back the same object code. The question is whether or not f or f-1 is prepared to accept inputs in non-canonical forms. This problem can usually be solved by running a second cycle through the system. For example, Jesse Ruderman of jsfunfuzz fame reports taking JavaScript code in non-canonical form, compiling and decompiling it, and then doing the same thing again. If the second and third versions of the program differ, a bug has been found. I don’t know how many of the impressive 1500+ bugs found by jsfunfuzz come from this method.

Here’s a list of common function/inverse pairs that I came up with:

  • pickle/unpickle, or save/load, of in-memory data
  • checkpoint/restore of process or virtual machine state
  • assemble/disassemble and compile/decompile
  • encode/decode of data for transmission
  • encrypt/decrypt
  • compress/decompress, either lossless or lossy

The thing I would like to learn from you readers is what other function/inverse pairs you can think of. I feel like I must be missing many.

This topic was suggested by Alastair Reid who additionally suggested that an assembler/disassembler pair for an architecture such as ARM could be exhaustively tested in this fashion since there are only 232 possible instruction words. Very cool!


12 responses to “Software Testing Using Function/Inverse Pairs”

  1. Suppose that you’re testing a static analysis tool. For some abstract domains, you might be able to test that the concretization function inverts the abstraction function.

    Similarly, for some semantics, you might be able to test implementations of predicate transformers, in particular weakest precondition and strongest postcondition.

  2. The following come to mind:

    invert/invert: matrices and other numerical structures

    send/receive: testing client-server systems

    translate/translate back: natural languages (lossy!)

    wrap/unwrap: packets, furniture, eBay merchandise, etc.

    build/clean up: making sure that all the products of some operation are under control (make/make clean)

    shuffle/sort

    distribute/merge

    capture/model: build 3D models from photos and such

  3. The sounds-like tool works by converting letter sequences to phonemes and back to letter sequences again. The phoneme to letter sequence conversion is often not unique, so there is no fixed point.

    phonetisaurus does text to phoneme conversion using a Markov model and I had an email discussion with the author who automatically inverted the model to produce a phoneme to text conversion model.

  4. Any signal processing or packet processing is likely to have an inverse.

    These are bit tricky because forward error correction has a tendency to hide bugs!

  5. There’s a general mathematical framework for this called a (Galois) reflection which describes two languages S,T with:
    – An ordering on equivalent terms such that a<= b if a is more canonical than b
    – Transformations f from S to T and g from T to S such that f(g(t)) = t and g(f(s)) <= s

    I've used this in papers on visual tracking which involves functions from images to models (i.e., take a camera image and locate a target image within it) and from models to images (i.e., scale, rotate and translate the target image to generate a prediction of what the camera image should look like). We mostly used the reflection model for guidance when designing new trackers because it turns out that the moment you put a video camera or a robot in the loop, you get more fuzz testing than you can deal with!

    The best description I've seen for reflections is by Phil Wadler (section 3 of http://homepages.inf.ed.ac.uk/wadler/topics/call-by-need.html#reflection) who related it to decompilation.

  6. John,

    The idea of exhaustively testing all 2^32 opcodes is more than just a theoretical – it is part of overnight testing of our assembler/disassembler here at ARM.

    We also exhaustively test LLVM’s MC layer against our disassembler. See the MC Hammer slides (http://llvm.org/devmtg/2012-04-12/Slides/Richard_Barton.pdf) from the European LLVM conference.
    Slide 27 is especially interesting: 18% of ARM instructions were being incorrectly assembled by LLVM’s MC layer.

  7. Financial institutions often calculate rates or volatilities of assets from the prices paid in the market for derivatives based on the assets.

    E.g. price of an option = X:
    X –> volatility calculator –> volatility = V
    Then with any luck
    volatility = V –> pricing model –> price = X

  8. Inverses are useful, of course, for testing the math part of a tool chain, as are more general identities, though you may need a fudge factor. I found a bunch of bugs via random identity-based testing in the PalmSource ARM compiler, e.g., in our variant of the FDLIBM hypotf. I mention this briefly in my _Software: Practice And Experience_ article, though without any of the useful details. (Which were pretty straightforward, as I recall—just set up the obvious identities, find an idle machine(s), and turn it loose.)

  9. View update in OLAP applications

    The user interface displays a value (e.g. total sales for all stores in 2011) that is calculated out of base data (sales for each individual store). The user can specify a new target value, and the system is expected to update the base data so that recomputing the formula will yield the value the user specified.