When Will Software Verification Matter?


When will average people start to use formally verified software? Let me be clear right from the start that I’m not talking about software that has been subjected to static analysis or about software that implements proved-correct algorithms, but rather about software whose functional correctness has been verified at the implementation level in a way that might pass the piano test. As far as I know, such software is not yet in widespread use — if anyone knows differently, please let me know.

The problem isn’t that we cannot formally verify software; the problem is that at present the cost/benefit ratio is too high. Let’s look at cost and benefit in turn. First, how much does full formal verification (i.e., a machine-checked correctness proof of an actual implementation, using believable specifications) increase the cost of development? For many projects, the multiplier might as well be infinity, since we don’t know how to verify something like MS Word. For other projects (most notably L4.verified and CompCert, but there are plenty of other good examples) the cost multiplier is probably well in excess of 10x. Next, how much benefit is gained by formal verification? This is harder to estimate, but let me argue by analogy that the value implicitly assigned to software correctness is very low in most circumstances. For example, compilers (empirically) become more likely to miscompile code as the optimization level is increased. But how much of the software that we use in practice was compiled without optimization? Almost none of it. The 2x or 3x performance gain far outweighs the perceived risk of incorrectness due to compiler defects. The only domain I am aware of where optimization is routinely disabled for correctness reasons is avionics.
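
To make the optimization trade-off concrete, here is a rough, purely illustrative sketch; the program, loop bound, and compiler flags below are arbitrary choices of mine, and the actual speedup will vary with compiler and machine:

    /* sum.c -- an illustrative micro-benchmark, nothing more.
     * Build and time it at both optimization levels, for example:
     *   cc -O0 -o sum_O0 sum.c && time ./sum_O0
     *   cc -O2 -o sum_O2 sum.c && time ./sum_O2
     * The -O2 build is typically a few times faster, which is exactly the
     * gain people are unwilling to give up in exchange for a smaller risk
     * of miscompilation. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      volatile uint64_t acc = 0; /* volatile keeps -O2 from deleting the loop */
      for (uint64_t i = 0; i < 500000000ULL; ++i)
        acc += i & 0xff;
      printf("%llu\n", (unsigned long long)acc);
      return 0;
    }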

The cost/benefit argument tells us, with a good degree of accuracy, where we will first see mass use of formally verified software. It will be codes that:

  • have very stable interfaces, permitting specification and verification effort to be amortized over time
  • are amenable to formal specification in the first place
  • are relatively small

The codes that come immediately to mind are things like crypto libraries, compression and decompression utilities, and virtual machine monitors. I mean, why hasn’t anyone verified gzip yet? Is it really that hard? What about xmonad? Can that be verified? A verified virtual machine monitor is clearly more difficult, but this is a very high-value domain, and there are several such projects underway.
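
To be concrete about what verifying something like gzip would even mean: the central specification is simply that decompression inverts compression. Here is a minimal sketch that merely tests that property using zlib (the input string and buffer sizes are arbitrary); a verified implementation would instead carry a machine-checked proof that the round trip succeeds for every input:

    /* roundtrip.c -- testing (not proving!) decompress(compress(x)) == x
     * using zlib.  Build with:  cc roundtrip.c -lz */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void) {
      const Bytef input[] = "the quick brown fox jumps over the lazy dog";
      const uLong in_len = (uLong)sizeof(input);

      /* Compress into a buffer that is comfortably large enough here. */
      Bytef packed[256];
      uLongf packed_len = sizeof(packed);
      int rc = compress(packed, &packed_len, input, in_len);
      assert(rc == Z_OK);

      /* Decompress and check that we get the original bytes back. */
      Bytef unpacked[256];
      uLongf unpacked_len = sizeof(unpacked);
      rc = uncompress(unpacked, &unpacked_len, packed, packed_len);
      assert(rc == Z_OK);
      assert(unpacked_len == in_len);
      assert(memcmp(unpacked, input, in_len) == 0);

      printf("round trip ok: %lu -> %lu -> %lu bytes\n",
             (unsigned long)in_len, (unsigned long)packed_len,
             (unsigned long)unpacked_len);
      return 0;
    }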

In all cases above — hypervisor, gzip, OS kernel, compiler — the question I’m trying to get at isn’t whether the artifacts can be created, because they can. The question is: Will they see widespread use? For this to happen, the verified codes are going to have to be relatively fully-featured (right now, most projects would never choose CompCert over GCC due to missing features) and are going to have to perform on a par with the best non-verified implementations.

What I suspect will happen is that we’ll start to see little islands of verified programs floating in the larger ocean of non-verified code. For example, I’d be happy to drop a proved-correct gzip into my Linux machine as long as it was respectably fast. Ideally, these islands will spread out to encompass more and more security-critical code. I think we are on the cusp of this starting to happen, but progress is going to be slow. I worry that creating these fully-featured, high-performance verified programs is going to require a huge amount of work, but that it will not be particularly novel (so the research community won’t care about it) or commercially viable. These problems go away only if we can dramatically reduce the cost of implementing verified programs, or (less likely!) of verifying implemented programs.


13 responses to “When Will Software Verification Matter?”

  1. In the open source world, software gets “verified” by the feedback (bug reports) of its users over the years. This feedback is accumulated into huge test suites; compression utilities, for example, can have test suites several terabytes in size.

    I would argue that at the moment formal verification (Coq proof, etc.) only makes sense for projects with a very small number of users (space-probes, etc.) that cannot send feedback easily, or where every bug is potentially fatal.

  2. This is a very smart thought.

    I can see especially high value in verifying file format parsers. Parsers are known to be responsible for a significant percentage of all real-world security bugs.

    Sanitizing untrusted input is important. It’s like making sure the front door is 100% secure so you don’t need full security in the whole building.

  3. Something like libgzip would be a great warmup; there is enormous value in tackling the common JPEG, PNG, etc. libraries, which sporadically require updates in thousands of otherwise unrelated applications.

  4. This and the piano testing post got me thinking, specifically about adversarial verification. Biology is perhaps the oldest information and mechanical system we have, so it is likely to have overcome some of these issues. Most important would be the correspondence between the genetic encoding of an object (text files) and its physical manifestation (execution environment).

    Adversarial verification seems much like viral action, and the life/death cycle would be equivalent to testing what you fly and flying what you test. After billions of years, the best nature has come up with is two of each vital organ, plus enough diversity that any pathology is restricted to an acceptable fraction of the population at any one time. There can never be one viral construct to rule them all; that said, I would like to see a significant diversification of implementations.

  5. Hi Frank, I think there are definitely strong analogies to be made between software ecosystems and real ecosystems.

    The problem with diversity is, where does it come from? Various efforts to incorporate diversity automatically exist, and some of the results are nice. However, there are strong limits on what can be accomplished if you want to avoid breaking the correctness of your software while adding diversity.

    A non-automated approach to diversity, n-version programming, was famously shown not to provide as strong an argument as its proponents had hoped:

    http://sunnyday.mit.edu/papers/nver-tse.pdf

  6. Verifying code after it has been written is far more expensive than verifying the specification, and then the code, at the time of writing. Carefully designed specifications, and code that is verified as it is written, will have fewer defects, and the cost of verification will be much smaller since you’re building on top of the existing verification.

    In all of your cases you wish to verify the code/spec after it has been written. That’s an exercise only academics and researchers can justify the time for, especially if the code is written in an unsafe language such as C or C++: huge amounts of time will be wasted verifying that buffer overflows don’t happen.

    Software verification will matter only when we stop striving to write code as quickly as possible, and instead choose to write better specifications and to set a minimum standard of quality for *all* programs.

  7. Almost all software depends on external libraries, and it’s almost impossible to write specs that completely describe all possible cases for all combinations of libraries and versions. That’s a really boring academic activity: writing down thousands of possible scenarios, decoupling logic, etc.

    You may ask: why libraries? Well, take a look at a typical “Hello, world”: it consists of just a few lines of source code and is definitely “correct”, but it is still not reliable because of the libraries it uses. So to gain practical impact you must dive deeply into the libraries→compilation tools→OS→CPU stack. So I don’t think it’s worth it.

  8. I have a feeling that formal verification is a lot like AI: if a problem is hard (requires a PhD in AI to solve) then it is AI; as soon as that problem becomes easy/automated, it ceases to be an AI problem. This is just sound marketing: like renaming NMR scans as MRI scans because patients don’t like the N-word.

    Watching formal verification tools being rolled out to hardware engineers, I get the feeling that the process is 50% marketing. We try to explain things in terms of coverage of equivalent tests. We relabel formal verification as ‘static testing’. We don’t mention the F-word.

  9. Software verification has always mattered. However, the rigor, scope, and intensity of software verification depends on the nature and usage of the software. A software library used in a nuclear facility should be subjected to a different verification process than if the same library were to be used in a laundry facility. One cannot simply say the software library is verified. One has to state the limits on the usage for which the software is verified.

  10. Maybe a corollary question: say you have a program that’s written against some number of libraries, all of which are verified by the standards you describe. What would the additional cost be to verify your program? My guess, or hunch, is that the individual proofs of each library’s correctness become axioms for the proof of your program’s correctness, such that the cost relates only to the size of your own code… But I really have no idea. Any thoughts on that?
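
To illustrate the compositional picture this last question is getting at, here is a tiny Lean sketch; all of the names are invented for the example. The library ships a function together with a proved theorem about it, and the client proof simply cites that theorem as a lemma, without re-verifying the library’s internals. In practice the client still has to specify and prove its own glue logic, so the cost isn’t purely a function of the client’s code size, but the library proofs are reused rather than redone.

    -- A made-up "library" function plus the correctness theorem it ships with.
    def libReverse (xs : List Nat) : List Nat :=
      xs.reverse

    theorem libReverse_involutive (xs : List Nat) :
        libReverse (libReverse xs) = xs := by
      simp [libReverse, List.reverse_reverse]

    -- "Client" code built on top of the library ...
    def roundTrip (xs : List Nat) : List Nat :=
      libReverse (libReverse xs)

    -- ... whose correctness proof just cites the library theorem,
    -- paying nothing extra for how libReverse is implemented.
    theorem roundTrip_id (xs : List Nat) : roundTrip xs = xs :=
      libReverse_involutive xs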