Comments on a Formal Verification of PolarSSL


The C language has given the world many enduring gifts such as buffer overflows, uninitialized variables, and use-after-free errors. Since rewriting a code base in some other language is not easy, we’re often stuck trying to eliminate bugs in legacy C before they bite us, and of course bugs in network-facing code sometimes bite us very hard. In this post I’ll be discussing TrustInSoft’s Verification Kit for PolarSSL 1.1.8, which I’ll call “the report” throughout this post. Just to be clear, I’m not unbiased — I’m working for these folks this year — so I’ll try to keep the cheerleading to a minimum.

The crux is:

This report states the total immunity of the SSL server implemented by the PolarSSL 1.1.8 library to [CWE-119, CWE-120, CWE-121, CWE-122, CWE-123, CWE-124, CWE-125, CWE-126, CWE-127, CWE-369, CWE-415, CWE-416, CWE-457, CWE-476, CWE-562, and CWE-690] if it is deployed according to the Secure Deployment Guide detailed in section 4.4.

In other words, a collection of nasty undefined behaviors that are ordinarily very hard to get rid of has been gotten rid of — if you are careful to follow the instructions.

The thing that struck me while reading the report is that I’ve never seen a document quite like it. I’ll next try to nail down that idea by explaining what the report is not.

It’s not a piece of formal methods research, which would need to demonstrate the novelty and utility of a verification technique. Rather, the verification techniques are completely deemphasized in the report in order to concentrate on the code that is being verified. In fact, one of the things that bothers me about formal verification research is that so much of it is concerned with new notations and clever algorithms instead of focusing on the thing I’m interested in: the interplay between the verification techniques and the code being verified.

It’s neither a formal proof nor a code review. Rather, nearly all of the reasoning about code has been delegated to the analysis tool, which must be trusted to come to the right conclusions. Why would we trust a formal verification tool? For one thing, the analyzer has been validated on many other codes including programs generated by Csmith (we wrote a short paper about that work). For another, the analyzer identified some legitimate shortcomings in PolarSSL that needed to be fixed before the verification would go through (these are in Section 4.4.3). Additionally, a much more serious PolarSSL bug was found earlier in the verification process. It is not mentioned in this report since it had been fixed before version 1.1.8 was released.

Because it makes a guarantee, it is fundamentally different from a report based on unsound static analysis. By early 2014, OpenSSL had been examined by various unsound analyzers and yet none of them found Heartbleed. In contrast, if someone had used sound static analysis to verify OpenSSL 1.0.1f, it is a certainty that Heartbleed would have been among the defects that were discovered. (We have to be a little bit careful when making this kind of statement. For example, the bug won’t be discovered if the OpenSSL being verified was compiled using -DOPENSSL_NO_HEARTBEATS. The buggy code must lie within the verification perimeter that I’m about to discuss.)
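
To make the class of bug concrete, here is a simplified, hypothetical sketch of the Heartbleed pattern; the function and its name are made up and this is not the actual OpenSSL code. A length field taken from the attacker-controlled message is used as the size of a copy without being checked against the number of bytes actually received, so the copy can read far past the end of the buffer. A sound analyzer cannot prove that the access stays in bounds and must therefore report it.

```c
/* Simplified, hypothetical sketch of the Heartbleed pattern; not the
   actual OpenSSL code. The attacker controls both the message bytes
   and the claimed payload length embedded in them. */
#include <stdlib.h>
#include <string.h>

unsigned char *build_heartbeat_response(const unsigned char *msg, size_t msg_len)
{
    /* Claimed payload length, read from the message itself. */
    size_t claimed_len = ((size_t)msg[0] << 8) | msg[1];

    unsigned char *resp = malloc(claimed_len);
    if (resp == NULL)
        return NULL;

    (void)msg_len; /* the check against msg_len is exactly what is missing */

    /* BUG: claimed_len is never compared with msg_len, so this copy can
       read far beyond the end of msg (a buffer over-read). A sound
       analyzer cannot prove the read is in bounds and raises an alarm. */
    memcpy(resp, msg + 2, claimed_len);
    return resp;
}
```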

So what is this report? It is, mainly, three things:

The definition of a perimeter within which its claims apply. This perimeter has three aspects. First, it excludes some parts of PolarSSL, such as X.509, that were not verified. Second, it excludes some ways in which the code could be used by defining drivers that exercise API calls in certain ways. For example, the server driver that spans pages 16 and 17 initializes the library, makes an arbitrary number of read and write calls, and then shuts it down. The guarantees in the report would not apply to a different usage, such as one that started writing data without properly initializing the library. Nor would they apply to usage of the library by a buggy program that corrupted PolarSSL’s state using a stray indirect write. Third, some configurations of the code are placed outside of the perimeter. The issue is that C code is implicitly parameterized by a collection of implementation defined behaviors, such as the representation of integers, and it is also typically explicitly parameterized by preprocessor directives. A real C program is actually a shorthand for a huge family of different programs; any formal verification effort either has to pin down which member of the family is being verified or else reason about the entire family. Reasoning about the entire family in a sound and precise way is an open problem.
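
As a rough illustration of what such a driver looks like, here is a hypothetical sketch, not the driver from the report: the function names follow the un-prefixed PolarSSL 1.x API, the exact signatures are assumptions on my part, and the certificate, RNG, and network-callback setup that a real server needs is elided. The point is only that the driver pins down which API calls, in which orders and with which argument ranges, the verification covers.

```c
/* Hypothetical sketch of an analysis driver, not the one from the report.
   PolarSSL 1.x API names are assumed; setup details are elided. */
#include "polarssl/ssl.h"

/* Stand-in for the analyzer's "any value" primitive. */
extern unsigned int any_uint(void);

int main(void)
{
    ssl_context ssl;
    unsigned char buf[1024] = { 0 };

    if (ssl_init(&ssl) != 0)
        return 1;
    ssl_set_endpoint(&ssl, SSL_IS_SERVER);
    /* ...certificate, RNG, and network-callback setup elided... */

    /* An arbitrary number of reads and writes, in an arbitrary order,
       with arbitrary lengths up to sizeof buf. */
    while (any_uint()) {
        size_t len = any_uint() % (sizeof buf + 1);
        if (any_uint() % 2)
            ssl_read(&ssl, buf, len);
        else
            ssl_write(&ssl, buf, len);
    }

    ssl_free(&ssl);
    return 0;
}
```

A usage such as writing data before initializing the library simply is not generated by a driver of this shape, which is why the report's guarantees do not extend to it.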

A sketch of the verification strategy, which is divide and conquer. Each of Chapters 5-10 describes the verification of one component. The interfaces between components are specified in ACSL, which is designed so that when two components are verified against opposite sides of an interface, the verification of their composition follows.
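
For readers who have not seen ACSL, here is an illustrative contract of the kind that sits at such an interface; it is made up for this post, not taken from the report. Callers are verified against the requires clauses, and the implementation is verified to respect the assigns and ensures clauses, so neither side needs to look at the other's code.

```c
/* Illustrative ACSL contract, made up for this post. A component that
   fills a caller-supplied buffer might be specified like this: */
/*@ requires len > 0;
  @ requires \valid(buf + (0 .. len - 1));
  @ assigns  buf[0 .. len - 1];
  @ ensures  -1 <= \result <= len;
  @*/
int component_read(unsigned char *buf, size_t len);
```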

A justification of alarms. The analysis tool signals an alarm any time it fails to prove that an operation (for example, incrementing a signed integer) is safe. Each alarm must be investigated to determine if it is a true alarm (exposing a defect in the code) or a false alarm (exposing an imprecision in the analysis). Appendix A of the report contains this analysis. It turns out to be easy, in theory, to avoid the imprecisions that caused these alarms: the necessary techniques have been known for decades. The issue is that there are engineering reasons for avoiding these static analysis techniques: their results can be very difficult for people to understand and also they can be computationally expensive. An entertaining exercise would be to pick a few parts of Appendix A, download PolarSSL 1.1.8, and make sure you agree with the reasoning from the report.
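
As a made-up illustration of the kind of reasoning involved (this fragment is not from PolarSSL), consider a signed addition inside a loop:

```c
#include <stddef.h>

/* Made-up example of a (likely false) alarm; not code from PolarSSL. */
int checksum(const unsigned char *buf, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        /* A simple interval analysis loses the relationship between total
           and i, so after widening it only knows that total is some int
           and raises an alarm: "total + buf[i] may overflow". A human can
           argue the alarm is false whenever callers guarantee, say,
           n <= 1000, since then total <= 255 * 1000 < INT_MAX. The
           justifications in Appendix A are of this flavor. */
        total += buf[i];
    }
    return total;
}
```

A more precise (for example, relational) analysis could discharge this automatically, but, as noted above, such techniques can be expensive and their results harder for people to review.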

That’s all there is to it, though if you read the report you’ll see that the reasoning sometimes becomes quite detailed and technical.

Finally, I’ll try to anticipate a few questions you might have at this point.

How much effort would be required to track the upstream PolarSSL (now mbed TLS)? A majority of local changes will just go through the verification automatically. Larger-scale changes might raise one or more new alarms that need to be investigated.

Under what conditions does the claim of “total immunity to UBs” apply? In addition to the conditions described in the report, the intellectual analyses in Appendix A must be correct, the verification tool must not have made a mistake, and the C compiler and the platform (OS + standard libs + hardware) must not make mistakes.

If we don’t want UBs, why not just use a TLS written in Java or C# or some other safe language? Of course that’s a fine idea! But we’ll have to somehow shoehorn a Java runtime onto those embedded processors with ~64 KB of RAM where PolarSSL / mbed TLS run. Also, I’m not sure that it is clear how to implement crypto in higher level languages in such a way that it resists timing attacks (this is hard enough in C). Finally, safe languages are only dynamically safe: a buffer overrun bug in a Java program will lead to an exception, and perhaps to a DoS attack. The report makes a guarantee of static safety, meaning that buffer overrun bugs do not exist.
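
As an aside on the timing-attack point: the classic example is comparing a secret value, such as a MAC, against an attacker-supplied one. An early-exit memcmp leaks through its running time how many leading bytes matched, so crypto code typically uses a comparison whose duration depends only on the length. The sketch below shows the standard pattern; it is not code taken from PolarSSL.

```c
#include <stddef.h>

/* Standard constant-time comparison pattern (not taken from PolarSSL).
   The running time depends only on len, not on where the first
   difference occurs, unlike an early-exit memcmp. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

Even this relies on the compiler not reintroducing a data-dependent branch, which is part of why getting it right is hard in any language.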

Where is the corresponding document for OpenSSL? It doesn’t exist. Based on lines of code, it would probably be about four times longer than the PolarSSL report, and would require at least four times the effort to create.

What is the future of this kind of report? It’s all about economics. Given a piece of C code, we can choose to inspect it, to test it by hand, to fuzz it, to subject it to unsound static analysis, and to subject it to sound static analysis. Obviously, all of these validation techniques have merit — the question is how many resources to allocate to each of them. Larger, more rapidly evolving code bases tend to favor cheaper validation methods whereas smaller, security-critical codes are amenable to formal verification. The reach of formal verification techniques is increasing as techniques improve, but also software keeps getting bloatier.

How should I think or act differently based on the information in this report (and others like it)? Well, this is the central question. Formal verification has made major progress in the last 10-15 years and we now have extremely impressive technical results including CompCert and seL4. However, I think it’s fair to say that almost everyone is having trouble figuring out how to fit these results into the big picture. From the client’s point of view, formal verification tends to be somewhat mysterious, as if a mathematically adept wizard had sprinkled some magic verification dust over a source code repository, which now glows with correctness. Let’s take a quick look at this article about seL4 that appeared in New Scientist a few days ago. On Twitter, Gerwin Klein — the leader of the L4.verified project — said:

The sad truth is that this article is not that bad compared to others. I’ve seen way worse. Nuance is hard..

What he says is true: the article, while sort of decent, is very far from conveying accurate technical information. The nuance has to come from somewhere else.

The PolarSSL report is best understood as providing the details that technically sophisticated readers need in order to crack open the black box surrounding this particular formal verification effort. We didn’t need another article about abstract interpretation; we needed specific technical claims about a particular piece of source code, followed by the details that back those claims up. Much more work on communicating verification results is needed, but this is a good piece of progress.


15 responses to “Comments on a Formal Verification of PolarSSL”

  1. > a buffer overrun bug in a Java program will lead to an exception, and perhaps to a DoS attack

    If there is no state shared between connections, only one connection will be affected. Minimizing shared state should reduce this risk a lot.

    Exception handling in safe languages is … safe. There is no risk of corrupting anything.

    If someone writes a TLS implementation in a safe language the coding standards probably should include minimizing shared state. So no buffer pools, caches etc. that are not obviously correct.

  2. There are, in fact, TLS implementations in safe languages. See for example OCaml-TLS, a TLS implementation in OCaml by David Kaloper-Meršinjak and Hannes Mehnert: https://github.com/mirleft/ocaml-tls. (It uses C implementations of the cryptographic primitives, in part because of the side-channel issues mentioned in this blog post.)

    One bittersweet result of the laudable effort to develop sound code analysis for unsafe languages such as C is that the effort concentrated there (understandably: it can reach most critical software written in the past or even today), and that we have few advanced static analysis tools for the arguably safer functional languages (Java/C# would be better equipped, but still only marginally so). We don’t have the tooling available today to do the same sort of analysis on OCaml-TLS as was done on PolarSSL using Frama-C.

    (There is interesting work to be done to improve the safety of the runtime of those languages, which is part of the trusted base; John was kind enough to use some of his instrumentation work on the C runtime for OCaml ( http://blog.regehr.org/archives/903 ), and I think there is interesting work to be done in this area, for example proving the memory-manipulation code of the C runtime safe with respect to the OCaml memory model.)

  3. There is a very obvious question that you didn’t answer: if there has been a formal verification that “proves” that PolarSSL is free of typical C bugs, how is it possible that in January a potential remote code execution was found in it? [1]

    I think I can guess the answer to that question. The RCE was in the ASN1 parser code, which is part of the X.509 parser and therefore was out of scope for this analysis.

    However, I might ask the next question: If one of the most vulnerable parts of the code was excluded – what’s the point of the report in the first place?

    I must say that my personal background is that I mostly do things that formal verification people would probably consider “not that interesting” (in other words: fuzzing). I applaud efforts to improve the state of formal verification, but at least from my outsider’s view it looks like these methods do what they are said to do, yet on closer inspection the claims that are proven are so limited that they hardly matter in practice. The PolarSSL report matches that pattern.

    Michal Zalewski wrote a very interesting comment on this a while ago [2].

    I think this is really the challenge for the field of formal verification: Can you make your methods so practical that they really matter?

    Right now it looks to me that using much simpler methods – fuzzing, gcc/clang’s -fsanitize features etc. – is vastly more effective in finding real and important bugs. I would like to be convinced that formal verification can play a bigger role in the future of IT security research.

    [1] https://tls.mbed.org/tech-updates/security-advisories/polarssl-security-advisory-2014-04
    [2] http://lcamtuf.blogspot.de/2015/02/symbolic-execution-in-vuln-research.html

  4. > If one of the most vulnerable parts of the code was excluded – what’s the point of the report in the first place?

    Hanno, if the bugs that were identified and reported during this study [1] were not worth finding, then none of what you do as part of the Fuzzing Project has any value either. The bugs you report yourself? The tutorials you write so that people can report even more of them? All these direct and indirect bug reports seem to have the same distribution of some exceptional, obviously serious stuff (CVE-2013-5914, [2]) and quantities of mundane defects that the maintainers may or may not find worth fixing at all, and that might yet have consequences.

    Neither John nor I ever said that fuzzing wasn’t a valuable technique, quite the contrary (since you like answering questions, perhaps you can identify in the blog post the hint as to why this would be out of character for us?).

    > Right now it looks to me that using much simpler methods…

    The difference between formal methods and fuzzing is that with formal methods, you know when you are done. Fuzzing is fine when you are finding bugs as a hobby, or when your employer largely leaves you free to find the vulnerabilities you want as long as they are in competitors’ code. The problem that some people face is that they, more constructively, want to ship embedded code that will have to turn out to be free of buffer overflows for the year to come, and for various technical reasons, updating the software as problems get reported will not be an option. Formal methods can solve these people’s problem.

    Meanwhile, if you are happy with opportunistic bug-finding, by all means keep fuzzing. Actual bugs (true positives) are really annoying when trying to formally verify existing software: one first has to convince oneself, for deep reasons, that the warning is not a false positive, and then build an argument for the software’s maintainer that it is a true positive. So every opportunistic bug you identify and report is time saved later, whenever the inclination appears to make sure that the software you already fuzzed is in fact bug-free.

    Oh, and the X.509 part of PolarSSL that is so crucial to security? A couple of hundred thousand euros and it’s a done thing (TrustInSoft will throw in a few additional ciphersuites for that price, too). Yes, it’s more expensive than downloading afl for free and writing a couple of test drivers, but as another comparison point, a typical EU-funded collaborative research project costs ten times as much, and for that price you do not always get the equivalent of a buffer-overflow-free, C, lightweight, already widely used TLS implementation at the end of it.

    Formal methods as applied in this study are more expensive than some things and less expensive than some other things. And they answer some needs that aren’t addressed otherwise.

    [1] https://github.com/ARMmbed/mbedtls/blob/development/ChangeLog
    [2] http://trust-in-soft.com/the-sociology-of-open-source-security-fixes-continued/

  5. Hi Hanno! You don’t need to guess the answer to your question — both my post and the report state that X.509 wasn’t verified.

    Also, as I imagine you are well aware (though your comment doesn’t demonstrate this) the point of formal verification isn’t finding bugs. As you say, if finding bugs is the goal then we would approach the problem differently.

  6. Hi Pascal,

    Reading your answer makes me think my comment may have been too harsh in tone. That wasn’t my intention, but I wanted to give a perspective that I think many in the security community share.
    My strong impression is that currently the output of relevant bug reports from formal verification seems very low. I would love to see that change and I would welcome the perspective of being able to declare certain code free of certain bug classes. If the answer right now is that this is only possible with a six-figure price tag then it’s probably something only feasible for a very small minority of projects.

    (By the way, I’d be much more interested in seeing that applied to OpenSSL than to PolarSSL, but I know you’re already working on that in parts.)

  7. Hi Hanno, thanks for clarifying. We just get frustrated by reactions like yours because we spend a lot of effort trying to be clear that we understand the tradeoffs between fuzzing and verification and even so, choose to work on verification (sometimes, at least) because it fills a different niche where maximizing bugs-per-$$ isn’t the goal.

  8. Whenever I hear formal verification I remember the presentation of a professor who applied at our university. He showed that, with his tools, his modulo 3000 operation would always return numbers <=3000. He didn't get the position.
    Verification is just as prone to errors as the code it checks. It only slightly reduces the chance of making the same mistake twice, because you have to re-state the problem in a different way.

  9. Hi Karsten, of course verification tools can contain bugs, but “slightly reduces the chance of doing the same mistake twice” is not a good summary of the issues that are in play here. Did you take a look at the linked paper where we used random testing to look for bugs in the verification tool?