Non-Transparent Memory Safety


[This paper contains more detail about the work described in this post.]

Instrumenting C/C++ programs to trap memory safety bugs is a popular and important research topic. In general, a memory safety solution has three goals:

  • efficiency,
  • transparency, and
  • compatibility.

Efficiency is obvious. Transparency means that we can turn on memory safety with a switch; we don’t have to do anything at the program level. Compatibility means that safe and unsafe code can freely interact, especially when linking against libraries. Compatibility is tricky because it severely limits the ways in which we can change the layout of memory objects, as we might hope to do in order to store the length of an array along with its data.

One of my favorite memory safety solutions for C — the Deputy project from Berkeley — is distinct from most other work in this space because it does not have transparency as a goal. While this initially seems like a bad idea, and it will obviously limit the amount of legacy code that we can run under Deputy, I eventually came to realize that non-transparency can be a good thing. The goal of this piece is to explain why.

When you write a C or C++ program, you usually intend it to be memory safe. And in fact, a large proportion of C/C++ code in the wild is memory safe, meaning that for all valid inputs it never accesses out-of-bounds or unallocated storage (or it might mean something else, but let’s not worry about that). The problem, of course, is that a small fraction of C/C++ code is not memory safe and some of these errors have serious consequences.

For the sake of argument, let’s say that you have written a piece of C code that is memory safe. With some effort you can do this for a small and perhaps for a medium-sized program. Now we might ask: Why is the program memory safe? Where does the memory safety live? Well, the memory safety resides in the logic of the program and perhaps also in the input domain. Unless we’ve used some sort of formal methods tool, the reasoning behind memory safety isn’t written down anywhere, so it’s impossible to verify.

Let’s take your memory safe C program and run it under a transparent memory safety solution like perhaps SoftBound + CETS. What we have now are two totally separate implementations of memory safety: one of them implicit and hard to get right, the other explicitly enforced by the compiler and runtime system.

Deputy is based on the premise that we don’t need two separate implementations of memory safety. Rather, Deputy is designed in such a way that the C programmer can tell the system just enough about her memory safety implementation that it can be checked. Let’s look at an example:

int lookup (int *array, int index) {
  return array[index];
}

If we don’t trust the developer to get memory safety right, we need to change the code to something like this:

int lookup (int *array, int index) {
  /* pseudocode: C has no array.length; this is the fact a checker must recover */
  assert(index >= 0 && index < array.length);
  return array[index];
}

In the C programmer’s implementation of memory safety, the assertion is guaranteed not to fire by the surrounding program logic and by restrictions on the input domain. In a compatible memory safe C, the assertion must be statically or dynamically checked, meaning that we need to know how many int-typed variables are stored in the memory region starting at array. This is not so easy because C has no runtime representation for array lengths. The typical solution is to maintain some sort of fast lookup structure that maps pointers to lengths. A significant complication is that array might point into the middle of some other array. The code that actually executes would look something like this:

int lookup (int *array, int index) {
  /* inserted by the tool: consults the runtime's pointer-to-length structure */
  check_read_ok(array + index, sizeof (int));
  return array[index];
}
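To make the "fast lookup structure" a little more concrete, here is a minimal sketch of the kind of table a transparent checker might consult. Everything in it is hypothetical: `register_region`, `access_ok`, and the fixed-size table are illustrative names, not the design of SoftBound + CETS or any other real tool, and a linear scan stands in for the much faster schemes real systems use. The key detail is that it searches for the object that contains a pointer, which is what lets interior pointers work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical region table: each entry records one allocated object. */
#define MAX_REGIONS 64

struct region {
  uintptr_t base;
  size_t size; /* in bytes */
};

static struct region regions[MAX_REGIONS];
static size_t num_regions;

/* Record an object; an instrumented allocator would call this. */
int register_region(const void *base, size_t size) {
  if (num_regions == MAX_REGIONS)
    return -1;
  regions[num_regions].base = (uintptr_t)base;
  regions[num_regions].size = size;
  num_regions++;
  return 0;
}

/* Returns 1 if [p, p + n) lies entirely inside one registered object.
   Searching for the object that contains p is what makes interior
   pointers (array + 3, say) work. */
int access_ok(const void *p, size_t n) {
  uintptr_t a = (uintptr_t)p;
  for (size_t i = 0; i < num_regions; i++) {
    uintptr_t lo = regions[i].base;
    size_t size = regions[i].size;
    if (a >= lo && a - lo <= size && n <= size - (a - lo))
      return 1;
  }
  return 0;
}

/* Abort on a bad access, like the inserted call in the snippet above. */
void check_read_ok(const void *p, size_t n) {
  if (!access_ok(p, n)) {
    fprintf(stderr, "out-of-bounds read of %zu bytes at %p\n", n, (void *)p);
    abort();
  }
}

int lookup(int *array, int index) {
  /* Caveat: forming array + index outside the object is already undefined
     behavior in ISO C, so treat this as an illustration of the check,
     not a portable implementation. */
  check_read_ok(array + index, sizeof(int));
  return array[index];
}
```

Every allocation has to be registered and every access has to go through a lookup like this, which is where much of the runtime cost of a transparent scheme comes from.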

Getting back to Deputy, the question is: How can the programmer communicate her memory safety argument to the system? It is done like this:

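int lookup (int *COUNT(array.length) array, int index) {
  /* array.length is shorthand for wherever the length lives; real Deputy
     code names an in-scope expression here, such as a length parameter */
  return array[index];
}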

COUNT() is an annotation that tells Deputy what it needs to know in order to do a fast bounds check — no global lookup structure is necessary.
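In actual Deputy code, the argument to `COUNT` is an expression over names in scope, usually a separate length parameter, rather than a property of the pointer itself. The sketch below makes that concrete under two assumptions: `len` is a hypothetical parameter added for illustration, and `COUNT` is defined as an empty macro so that an ordinary C compiler accepts the file. The hand-written test approximates the kind of check Deputy inserts; it is not Deputy's actual generated code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: outside the Deputy compiler, COUNT expands to nothing so
   this file builds with any C compiler. Under Deputy it is an annotation. */
#ifndef COUNT
#define COUNT(n)
#endif

/* array points to at least len ints; len is a hypothetical parameter. */
int lookup(int *COUNT(len) array, int len, int index) {
  /* Roughly the check Deputy would insert, written out by hand. It needs
     only index and len: no global pointer-to-length table is consulted. */
  if (index < 0 || index >= len) {
    fprintf(stderr, "lookup: index %d out of bounds [0, %d)\n", index, len);
    abort();
  }
  return array[index];
}
```

The caller now passes the length along with the pointer. In practice that length is usually already sitting in a variable, because the programmer's own memory safety argument depended on it.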

When I first saw the example above, I was not very impressed: it looks like Deputy is just being lazy and punting the problem back to me. But after using Deputy for a while, its genius became apparent. First, whenever I needed to tell Deputy something, the information was always available either in my head or in a convenient program variable. This is not a coincidence: if the information that Deputy requires is not available, then the code is probably not memory safe. Second, the annotations become incredibly useful documentation: they take memory safety information that is normally implicit and put it out in the open in a nice readable format. In contrast, a transparent memory safety solution is highly valuable at runtime but does not contribute to the understandability and maintainability of our code.

There are a number of other Deputy annotations, most notably NTS, which tells the system about a null-terminated string, and NONNULL, which of course indicates a non-null pointer. The Deputy Quick Reference shows the complete set of annotations and the Deputy Manual explains everything in more detail and has code examples. The Deputy paper focuses on more academic concerns and unfortunately contains only a single short example of Deputized C code.
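As a rough illustration of how these read in a prototype, here is a small hypothetical function, `count_char`, annotated with `NTS` and `NONNULL`. The placement (annotations written after the `*`, as in the `COUNT` example above) is an assumption about Deputy's syntax, and both names are defined as empty macros, another assumption, so that the file builds without Deputy.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: empty stand-ins so this builds with any C compiler. */
#ifndef NTS
#define NTS
#endif
#ifndef NONNULL
#define NONNULL
#endif

/* s is non-null and null-terminated; both facts are stated in the type,
   where a reader (and Deputy) can see them. */
size_t count_char(const char * NTS NONNULL s, char c) {
  size_t n = 0;
  /* Reading up to, but not past, the terminator is what NTS permits. */
  for (; *s != '\0'; s++)
    if (*s == c)
      n++;
  return n;
}
```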

Although the preceding example didn’t make this clear, applying Deputy to C code is pretty easy because the Deputy compiler uses type inference to figure out annotations within each function. Thus, many simple functions need annotations only at the prototype, and the compiler takes care of the rest. In more involved situations, annotations are also necessary inside functions. To apply Deputy to legacy C code, you compile the code, Deputy reports where annotations are missing, you add them, and you repeat. It’s a nice process where you end up learning a lot about the code that you are annotating. In general, an incorrect annotation cannot lead to memory-unsafe behavior, but it can cause a memory safety violation to be incorrectly reported. (You can write truly unsafe code in Deputy using its UNSAFE annotation, but at least the unsafe code is obvious, as it is in Rust.) My guess is that people who enjoy using assertions would also enjoy Deputy; people who hate assertions may well have a different opinion.

Is Deputy perfect? Certainly not. Most seriously, it is only a partial memory safety solution and does not address use-after-free errors. Its memory safety guarantee does not hold if there are data races. One time I ran into a case where Deputy wouldn’t let me tell it the information that it needed to know; I believe it was when the size of an array was stored in a struct field. Finally, since it is based on CIL, Deputy supports C but not C++.

My group used Deputy as the basis for our Safe TinyOS project. TinyOS was a nice match for Deputy: the extremely lightweight runtime was suitable for embedded chips with 4 KB of RAM and the lack of use-after-free checking wasn’t a problem since TinyOS doesn’t have malloc/free. We found that in many cases it was sufficient to annotate the TinyOS interface files — which serve much the same role as C header files — and then Deputy didn’t need additional annotations. Here’s an example of an annotated interface:

/**
 * @param  'message_t* ONE msg'        the received packet
 * @param  'void* COUNT(len) payload'  a pointer to the packet's payload  
 * @param  len                         the length of the data region pointed to by payload
 * @return 'message_t* ONE'            a packet buffer for the stack to use for the next
 *                                     received packet.
 */  
event message_t* receive(message_t* msg, void* payload, uint8_t len);

There are minor differences from standard Deputy, such as ONE pointers (they “point to one object”) instead of SAFE NONNULL. Rather than putting the annotations directly into the function prototypes, we put them into the comments so that they automatically get added to the interface documentation. There were also some changes under the hood. We found that Deputy was generally a pleasure to use, and it caught some nasty bugs in various TinyOS programs.
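For comparison, here is the same receive event sketched as a plain C function with the annotations moved into the prototype, in the standard-Deputy style (SAFE NONNULL rather than ONE). This sketch rests on several assumptions: `message_t` is a hypothetical stand-in for the TinyOS type, the handler body is invented for illustration, the annotation macros are empty so the file builds without Deputy, and `COUNT(len)` on a `void *` is taken to count bytes, matching the doc comment's description of `len`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: empty stand-ins so this builds with any C compiler. */
#ifndef SAFE
#define SAFE
#endif
#ifndef NONNULL
#define NONNULL
#endif
#ifndef COUNT
#define COUNT(n)
#endif

/* Hypothetical stand-in for the TinyOS message buffer type. */
typedef struct { uint8_t data[28]; } message_t;

static uint8_t last_first_byte;

/* Annotations live in the prototype instead of the doc comment. */
message_t * SAFE NONNULL receive(message_t * SAFE NONNULL msg,
                                 void * COUNT(len) payload, uint8_t len) {
  const uint8_t *bytes = payload;
  if (len > 0)
    last_first_byte = bytes[0]; /* in bounds: payload holds len bytes */
  return msg;                   /* hand the same buffer back to the stack */
}
```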

The current status is that Deputy has not been supported for some time, so it would not be a good choice for a new project. The Deputy ESOP paper has been well cited (114 times according to Google Scholar) but the basic idea of memory safe C/C++ via annotations and type inference has not caught on, which is kind of a shame since I thought it was a nice design point. On the other hand, even if an updated implementation were available, in 2014 I would perhaps not use Deputy for a new safe low-level project, but would give Rust a try instead, since it has a good story not only for out-of-bounds pointers but also for use-after-free errors.


14 responses to “Non-Transparent Memory Safety”

  1. Shameless plug, but you might find the following interesting: http://dl.acm.org/citation.cfm?id=2524234

    It uses dynamic binary translation (similar to Valgrind) to check properties (e.g. bounds) on all accesses through tainted pointers. Whereas Valgrind maintains bit- or byte-granularity shadow memory, the watchpoints framework modifies / taints the addresses returned by memory allocators so that they contain identifying information. When memory is accessed, the address is checked to see if it is tainted, and if so, the framework performs some kind of action (e.g. bounds checking).

    The nice thing with this solution is that:
    i) It works for arbitrary pointers, even if those pointers point inside of some larger object. This is because the identifying information is stored in the high-order bits, and unlikely to be modified by typical address displacements.
    ii) It doesn’t require the source code to be annotated or compiled in any specific way.
    iii) It doesn’t actually require comprehensive instrumentation. That is, not all code needs to be controlled by the DBT. This is because a fault will be raised when uninstrumented code attempts to access a tainted pointer. Such faults are easy to recover from.

    In terms of efficiency, it’s not amazing, as the instrumentation is heavyweight. I’m eagerly looking forward to getting my hands on a machine with Intel MPX. The reported performance numbers were also for an older version of the framework.

  2. Is there an interop story for Deputy? Can you apply it piecemeal to a large codebase, so you don’t have to invest everything up front?

  3. Robby, yes, I should have pointed that out explicitly. Deputy has no representation changes at all since bounds are supplied by users. So incremental application of it to a large codebase comes for free. I believe this was one of the main motivations for Deputy since their previous system, CCured, had a whole-program analysis in the compiler.

  4. maybe the reason it didn’t catch on is the most depressing one possible: people are still not motivated (enough) to care about security.

  5. I started to comment that your experience sounds like what happens when people discover that static typing isn’t more restrictive than dynamic typing in practice. Then I saw that the paper was introducing a dependent type system to C.

  6. Robby, I guess the way to think about it is this: Even transparent memory safety has not taken off, so why would non-transparent memory safety take off?

    But the point about the annotations being awesome is not a widely appreciated one…

  7. Well, the transparent one has other costs besides programmer time, right? My feeling is that if we’re already serious about programming in C, we either have a serious interop problem to overcome, or we need the low-level performance/raw memory access. In those cases (IIUC?), Deputy is awesome. It adds security without getting in the way of those problems. Other options don’t seem to have that same property?

  8. “The current status is that Deputy has not been supported for some time…”

    I believe this is due to a phenomenon you’ve also blogged about. The last update of Deputy was in 2007, and guess what else the main developer completed right around then?

  9. I think one reason annotations have not caught on is that they have been tried in C++ and Java. C++ introduced attributes partially with the intention of serving as annotations, but then halted progress on them, only allowing a limited set of attributes into the language. Java introduced annotations in 1.5, and they have only seen limited use. Java 8 has several significant new annotations (such as @nonnull) which may boost their popularity.

  10. Should `array.length` in COUNT annotation be instead a variable like `arrayLength` ?

  11. @Mark N: but isn’t the research prototype good enough for someone to give it a spin and decide if they wanted to put more resources into it?

  12. Related is the SAL (static annotation language?) used at Microsoft to make Prefast checking more effective. MS made a big push to annotate all of their header files so that various checks could be done more effectively in a modular fashion. AFAIK, this is quite heavily used within the company…