Memory Safe C/C++: Time to Flip the Switch


For a number of years I’ve been asking:

If the cost of memory safety bugs in C/C++ codes is significant, and if solutions are available, why aren’t we using them in production systems?

Here’s a previous blog post on the subject and a quick summary of the possible answers to my question:

  • The cost of enforcement-related slowdowns is greater than the cost of vulnerabilities.
  • The cost due to slowdown is not greater than the cost of vulnerabilities, but people act like it is because the performance costs are up-front whereas security costs are down the road.
  • Memory safety tools are not ready for prime time for other reasons, like maybe they crash a lot or raise false alarms.
  • Plain old inertia: unsafety was good enough 40 years ago and it’s good enough now.

I’m returning to this topic for two reasons. First, there’s a new paper SoK: Eternal War in Memory that provides a useful survey and analysis of current methods for avoiding memory safety bugs in legacy C/C++ code. (I’m probably being dense but can someone explain what “SoK” in the title refers to? In any case I like the core war allusion.)

When I say “memory safety” I’m referring to relatively comprehensive strategies for trapping the subset of undefined behaviors in C/C++ that are violations of the memory model and that frequently lead to RAM corruption (I say “relatively comprehensive” since even the strongest enforcement has holes, for example due to inline assembly or libraries that can’t be recompiled). The paper, on the other hand, is about a broader collection of solutions to memory safety problems including weak ones like ASLR, stack canaries, and NX bits that catch small but useful subsets of memory safety errors with very low overhead.

The SoK paper does two things. First, it analyzes the different pathways that begin with an untrapped undefined behavior and end with an exploit. This analysis is useful because it helps us understand the situations in which each kind of protection is helpful. Second, the paper evaluates a collection of modern protection schemes along the following axes:

  • protection: what policy is enforced, and how effective is it at stopping memory-based attacks?
  • cost: what is the resource cost in terms of slowdown and memory usage?
  • compatibility: does the source code need to be changed? does it need to be recompiled? can protected and unprotected code interact freely?

As we might expect, stronger protection generally entails higher overhead and more severe compatibility problems.

The second reason for this post is that I’ve reached the conclusion that 30 years of research on memory safe C/C++ should be enough. It’s time to suck it up, take the best available memory safety solution, and just turn it on by default for a major open-source OS distribution such as Ubuntu. For those of us whose single-user machines are quad-core with 16 GB of RAM, the added resource usage is not going to make a difference. I promise to be an early adopter. People running servers might want to turn off safety for the more performance-critical parts of their workloads (though of course these might be where safety is most important). Netbook and Raspberry Pi users probably need to opt out of safety for now.

If the safe-by-default experiment succeeded, we would have (for the first time) a substantial user base for memory-safe C/C++. There would then be an excellent secondary payoff in research aimed at reducing the cost of safety, increasing the strength of the safety guarantees, and dealing with safety exceptions in interesting ways. My guess is that progress would be rapid. If the experiment failed, the new OS would fail to gain users and the vendor would have to back off to the unsafe baseline.

Please nobody leave a comment suggesting that it would be better to just stop using C/C++ instead of making them safe.


80 responses to “Memory Safe C/C++: Time to Flip the Switch”

  1. The “Eternal War in Memory” paper says “pointers can legitimately go out of bounds as long as they are not dereferenced”. I thought that was undefined behavior.

  2. The word “unsafe” covers different situations. I work with large images. I need pointer arithmetic. Once, one wise guy tried to modify my code a little. He removed three lines of inline assembler. As a result he slowed execution of the corresponding part of the code from 20 sec to 12 minutes! Suppose I trace a contour line in an image. Doing so, I can leave the image and start to walk over memory that contains different data. That is one sort of “safety” issue. Here is another one: a programmer allocated a 16-byte buffer on the stack for a protocol type (http, https, ftp, etc.). A bad guy typed a string like “ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ”. This is another sort of safety issue. But there is a simple way to prevent this sort of attack: a simple class with safe buffers. As Kris Kaspersky explained more than 10 years ago, this problem can be fixed using SEH. Three (or more) sequential pages with different access. The first and the last must have the attribute PAGE_GUARD or PAGE_NOACCESS, all the others PAGE_READWRITE. The inner pages form the buffer that accepts the user’s input. This sort of control gives an absolute guarantee that the input buffer won’t be overrun.
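To make the “simple class with safe buffers” idea concrete, here is a minimal C++17 sketch. The class name FixedBuffer and its interface are made up for illustration; it shows the software-only variant (a length check before copying), not the guard-page scheme, and it is one possible shape rather than the commenter's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity text buffer: an assignment that would
// overflow is rejected instead of writing past the end of the storage.
template <std::size_t N>
class FixedBuffer {
    static_assert(N > 0, "need room for the terminating '\\0'");

public:
    // Stores the text and returns true if it fits (including the
    // terminating '\0'); otherwise returns false and changes nothing.
    bool assign(std::string_view text) {
        if (text.size() >= N) {
            return false;
        }
        for (std::size_t i = 0; i < text.size(); ++i) {
            data_[i] = text[i];
        }
        data_[text.size()] = '\0';
        size_ = text.size();
        return true;
    }

    // Read-only view of the stored text.
    std::string_view view() const {
        assert(size_ < N);  // invariant maintained by assign()
        return std::string_view(data_.data(), size_);
    }

private:
    std::array<char, N> data_{};
    std::size_t size_ = 0;
};
```

With a FixedBuffer<16>, assigning "https" succeeds, while the long run of Z characters is refused and the previous contents stay intact.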

  3. Hi Jesse, here’s the actual text:

    “One problem with this approach, however, is that pointers
    can legitimately go out of bounds as long as they are not
    dereferenced. For instance, during the last iteration of a loop
    over an array, a pointer typically goes off the array by one,
    but it is not dereferenced”

    As I’m sure you know, it is OK to create (but not dereference) a pointer to the location one past the end of an array, but it is illegal to create a pointer before the start of an array or more than one element past its end.

    So the first sentence of this quote makes it look like the authors don’t quite understand the situation, but then the second one makes it seem like perhaps they do?
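The one-past-the-end rule can be illustrated with a short C++ sketch. The function name sum is an illustrative choice, not something from the paper.

```cpp
#include <cassert>
#include <cstddef>

// Sums n ints starting at first. The pointer `end` is one past the last
// element: forming and comparing it is allowed, dereferencing it is not.
int sum(const int* first, std::size_t n) {
    assert(first != nullptr || n == 0);
    const int* end = first + n;  // one past the end: valid to form
    int total = 0;
    for (const int* p = first; p != end; ++p) {
        total += *p;  // p is strictly inside the array whenever we get here
    }
    // Not allowed: forming `first - 1` or `first + n + 1` (undefined even
    // without a dereference), or evaluating `*end` (out-of-bounds read).
    return total;
}
```

When the loop exits, p equals end, which is exactly the "off the array by one, but not dereferenced" situation the quoted passage describes.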

  4. Hi Andrew, right — there are many kinds of safety that can be enforced. The “Eternal War” paper contains a good discussion of the tradeoffs between them.

  5. If Intel really does produce hardware support (thanks for the hope of some good news there, Milo!) I think that would “flip the switch” at least eventually. Security gains would be huge, but I think an under-appreciated gain would be the huge amount of time programmers and testers (including, yes, “good programmers”) would save because bugs that are currently insidious and intermittent, and very hard to find in testing, and often hard to debug, would become much easier to find and fix. Programmer productivity and testing effectiveness are not penny-ante things. Being able to run automated tests “under Valgrind” but without the huge decrease in throughput would be a big win for people like me, but even normal testing and development would benefit hugely, just in terms of the bugs we _already find_.

  6. Just learn to do a little decent design and it’s really not that difficult to know who owns the memory and when it can/should be deallocated.

  7. The great thing about the comments on the post is that they are a microcosm of all the sorts of comments I’ve received while working on this problem over the years. In addition to supportive comments, I’ve heard everything from “bah, give up on C/C++, they are too broken” to “bah, it isn’t that hard to write correct C/C++” to “bah, anything more than 1% overhead would be totally unacceptable” to “bah, mitigations other than memory safety are good enough”.

    Working on making C/C++ safe and efficient is in many ways a pragmatic middle ground—yet also pie in the sky, as everyone knows it isn’t possible—but as we know from today’s polarized political environment, such pragmatic centrists often find themselves in some uncomfortable spots.

  8. instead of changing the language why not choose a different tool from your tool box. mechanics choose the appropriate tools to do their work. if the tool you are using has some kind of downside — then find another tool to use. the concern you are trying to address may not be a concern to others, but the solution you are proposing will affect those who are not as concerned as you.

  9. Hi Milo, all true. But you haven’t even seen the handful of comments that I’ve had to delete, ugh.

    An odd fact about blogging is that I could write the most ridiculous crap and not hear a peep about it, but a post like this which is basically just common sense gets a bunch of flak.

  10. it was a very interesting discussion!
    so long and thanks for all the fish :0)

  11. Agreed that address randomization is a good trick.
    But just try to use the bounds-checked STL from Microsoft once and you will never do it again if what you are doing requires any kind of performance. Just not realistic.
    Same with checked STL in general. Not even feasible as a debug method, just impossible to ever get anything done.
    Pointers: I have heard the same arguments against them for 20 years, but they stay.

  12. One very simple example: I have two images, A and B. Assume both are the same size and 32 bpp, and pA points to some pixel of image A. The pointer to the corresponding pixel of image B is pB=pA+pFirstPixelOfB-pFirstPixelOfA. This sort of pointer arithmetic will be forbidden in any “safe” language. A qualified programmer can write safe programs in C (C++). The bad news is that writing such programs requires more effort and more qualification. Employers dream of a time when sophisticated software can be written by cheap, unqualified programmers. This is why simple languages are so popular.

  13. The mainframe people solved this issue in the 1950’s and the early 1960’s. My comments refer to Burroughs Corp. (later Unisys Corp.).

    Each allocated page should have a “tag” that defines whether it is code, data, or something else. The page descriptor defines the start of the page, its length, and acceptable execution modes.

    “Unsafe” code can modify this tag, but that is only available with elevated privileges. User code, and otherwise normal code, is restricted to executing code, accessing data, and doing both only in the accepted limits.

  14. Dear Michael, what makes you think so? VirtualProtect can be called by any program running with user privileges (ring 3). Calling this function doesn’t make the code “unsafe”; it may increase the security level, as I explained before. Usually the word “unsafe” refers to pointer arithmetic, where after simple arithmetic operations a pointer might point to some forbidden region of the address space.

  15. It would be interesting to know what fraction of memory-safety bugs in C++ code could be prevented by (for example) suitable programming guidelines. I’m thinking here of things like always using std::vector::at() (which does a run-time check that an array reference is in-bounds) in preference to std::vector::operator[] (which is typically implemented the same as the C [] subscripts, i.e., no checking).

    Inspired by a colleague, I adopted this into my personal C++ programming guidelines about 5 years ago. It’s caught a lot of bugs since then…
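A small C++ sketch of the at() guideline follows. The helper name read_checked and the fallback-value convention are assumptions made for the example; the only library behavior relied on is that std::vector::at() throws std::out_of_range for an index outside the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reads v[index] with a bounds check. An out-of-range index becomes a
// catchable exception instead of the undefined behavior of v[index].
int read_checked(const std::vector<int>& v, std::size_t index, int fallback) {
    try {
        return v.at(index);
    } catch (const std::out_of_range&) {
        assert(index >= v.size());  // at() only throws for bad indices
        return fallback;
    }
}
```

For a three-element vector, index 1 returns the stored value and index 3 returns the fallback, whereas v[3] would silently read past the end.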

    It would be interesting to classify all the security bugs in some large OS bug database by programming language (C, C++, …), and for the C++ ones try to determine something about the level of abstraction being used, e.g., was the programmer manipulating raw arrays vs was she manipulating STL containers?

    Another idea… turn on bounds-checking STL in some large C++ codebases (e.g., Firefox). I would happily live with a Firefox that was a factor of 2 slower than its already bloated state if it had a substantial security improvement…

  16. Clang already has a ‘-fsanitize=bounds’ flag, which performs instrumentation for buffer overflow detection (intra-procedural only). The overhead is usually just a few %.
    There’s still some work to be done on the optimization side, but it’s a start, I guess. Well, and extend the instrumentation to the inter-procedural setting, as well.

  17. Followup…

    What would be really interesting to know is what fraction of (say) the last N years’ Firefox security bugs could have been prevented by such programming guidelines and/or bounds-checking STL and other containers. If that number is “50%” then we have something very interesting. If it’s 1% (and they’re not “the nastiest 1%”) then we have something a lot less interesting…

  18. @Nuno #68:

    It’s perhaps also worth noting that on OpenBSD, /usr/bin/gcc comes with the ProPolice stack-protection extension turned on by default (and this has been the case since sometime around 2005). This is the compiler used to build the kernel and almost all of the userland.

  19. A final note… ProPolice (which catches a large fraction of stack-smashing buffer overruns) has a very low overhead — around 2% to 3% CPU time and less than that for code size.

  20. @Jonathan, preventing stack smashing is great, and it is good to see OpenBSD willing to give up a bit in performance for more security. However, attackers have adapted and the sort of memory corruption vulnerabilities being exploited in the wild have moved far past simple stack smashing.

  21. Jonathan, a fascinating piece of followup work to the “eternal war” paper would be a large-scale study of vulnerabilities from (say) the last 24 months with an analysis of what technologies (if any) would have rendered each bug unexploitable, had they been deployed.

    As Milo indicates, we keep raising the bar for security but attackers show remarkable adaptability. And of course even if we turned on memory safety for an entire platform, attackers would simply shift all of their efforts towards non-memory bugs.

    The hypothesis behind memory safety research, which we maybe have not yet managed to state very clearly, is something like: The costs of pervasive memory safety for C/C++ are worth paying because (1) safety will stop entire classes of attacks once and for all, as opposed to just raising the barrier to entry, and (2) developer productivity will increase due to easier, earlier detection of safety errors.

    Milo’s guess, which I’m liking more and more, is that perhaps this hypothesis is false for software-only safety and true for hardware-assisted safety.

  22. First of all let me say that those who complain about C, C++ vs Java, C# seem to miss the point that there are native code compilers for such languages and even research OSes written in them.

    Nowadays when targeting Windows Phone 8, C# is actually compiled to native code and the new compiler (Roslyn) is done in C#, not C++.

    Around 30 years ago we had the possibility to get safer systems programming languages with Modula-2 and Ada. Sadly the industry, for various reasons, chose the C route, with the price we pay nowadays in security.

    The only way to make C memory safe, without having tons of tools that offer patches over patches in terms of safety, is to have some kind of Safe C, where the usual errors are disallowed except in unsafe blocks or similar.

    Namely:

    – automatic decay of vectors into pointers when passed as parameters;

    – lack of bounds checking (make bounds checking selective, like in stronger type systems languages)

    – offer real arrays, instead of relying on the developer to specify the size, opening the door to copy paste errors
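The first and third items can be illustrated in C++, which already offers one way to keep the array length attached to the parameter. This is only a sketch of the idea, not a proposal for what a “Safe C” would look like; the function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// The array argument decays to a pointer: the length is lost, so the
// caller passes it separately and must keep the two in sync by hand.
int sum_decayed(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// A reference to an array keeps the length in the type. N is deduced by
// the compiler, so there is no separate size to copy and paste wrongly.
template <std::size_t N>
int sum_array(const int (&values)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];
    }
    return total;
}
```

Calling sum_decayed(samples, 4) on a three-element array compiles and reads out of bounds, while sum_array(samples) cannot get the length wrong.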

    Who knows, maybe we just need more security exploits until someone really puts the brakes on.

  23. 64/Andrew,

    All sorts of different things could happen with pB=pA+pFirstPixelOfB-pFirstPixelOfA if pointers are segmented. For example if you break the expression up into tmp = (pA + pFirstPixelOfB) and pB = tmp - pFirstPixelOfA, the first part (the addition) could add the complete values of both pointers, but the second part (the subtraction) could assume that tmp is a pointer into A, and might only subtract the non-segment part of pFirstPixelOfA from tmp.

    I think
    pB = pFirstPixelOfB + (pA - pFirstPixelOfA)
    is easier to understand and should also be correct as far as any compiler with any pointer model is concerned (the bit in brackets is valid and returns a ptrdiff_t, which can be added validly to pFirstPixelOfB).
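Here is the regrouped expression as a small C++ function. Standard C and C++ define pointer minus pointer (within one array) and pointer plus integer, but not pointer plus pointer, so this grouping is also the one that compiles. The function name and the pixelCount parameter are assumptions added for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps a pointer into image A to the corresponding pixel of image B.
// Both images are assumed to hold pixelCount 32-bit pixels.
std::uint32_t* corresponding_pixel(const std::uint32_t* pA,
                                   const std::uint32_t* pFirstPixelOfA,
                                   std::uint32_t* pFirstPixelOfB,
                                   std::size_t pixelCount) {
    // Both operands point into image A, so the difference is well defined
    // and has type std::ptrdiff_t.
    const std::ptrdiff_t offset = pA - pFirstPixelOfA;
    assert(offset >= 0 && static_cast<std::size_t>(offset) < pixelCount);
    // The offset is within B's bounds, so the addition is well defined too.
    return pFirstPixelOfB + offset;
}
```

The assert is only a debug-time check that pA really points inside image A; a bounds-checking scheme would enforce the same condition at run time.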

  24. pB = pFirstPixelOfB + (pA - pFirstPixelOfA) will be correct in C++, but as I said, this sort of pointer arithmetic is forbidden in so-called “safe” languages. There are other problems. Assume I declared a 2D array A[3][3]. What about b=A[4][0]? Or A[4][0]=0? In fact it is data corruption. If a language controls access to data, it must prevent such sets of indexes. On the other hand, it means that I cannot treat a 2D array as a 1D one and vice versa. I have a feeling that “safe” languages tie my hands.
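As a rough C++ sketch of the 3x3 case: checked access can reject an index pair like [4][0], and a 1D view can still be offered through an explicit index computation instead of by reinterpreting the storage. The names Grid, try_set, and flat_get are made up for this example, and this is only one way a checked interface might look.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 3;
using Grid = std::array<std::array<int, kCols>, kRows>;

// Checked write: refuses index pairs such as (4, 0) instead of
// corrupting whatever lies beyond the array.
bool try_set(Grid& g, std::size_t row, std::size_t col, int value) {
    if (row >= kRows || col >= kCols) {
        return false;
    }
    g[row][col] = value;
    return true;
}

// 1D view of the same data: the flat index is converted to (row, col)
// explicitly, so every access stays inside a row of the grid.
int flat_get(const Grid& g, std::size_t i) {
    assert(i < kRows * kCols);
    return g[i / kCols][i % kCols];
}
```

Flat index 8 maps to row 2, column 2, so the 2D-as-1D use case survives while A[4][0] is refused.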

  25. Until Jonathan chimed in, no one mentioned STL, which is part of the C++ standard. If you standardize on use of STL containers (note that not all programming problems are amenable to this), then you can turn potentially exploitable dereferences into C++ exceptions, which are not generally exploitable. As to perf, if you take the time to actually do perf tuning, you start figuring out how to use STL in a performant way, and for some pieces of code, you actually get it to go faster. For example, std::vector is generally within a few percent of the best implementations of a resizable buffer, and several times faster than the more common bad ones. Correctly used STL does not have to be a performance hit. On the way to converting the code, you make things exception safe, which generally implies that everything is initialized and cleaned up correctly, and the perf investigation will often reveal places the code may not have a good design, STL or not.

    While this observation isn’t especially exciting from an academic standpoint, from an engineering standpoint, the advice to go actually learn the more advanced parts of the standard and use it to make all sorts of reliability problems go away is solid.

    To those of you who might complain “We don’t use exceptions in our code”, my response is to ask whether you ever dereference any pointers. If you do, you’re programming with exceptions, just not nice, well-controlled exceptions.

  26. David, I agree: a safe-by-default STL isn’t too sexy but is probably a very good idea in the short run. Unfortunately this does not fix the legacy C and non-STL C++ code.

  27. David and regehr, I agree with both.

    C++ can be used in a safe way, if the developers restrict themselves to Modern C++ with STL, alongside -Wall -Werror and static analysis.

    The main problem is that too many developers write C-like code in C++, and this also does not solve the problem for pure C programs.

  28. Hi! I’m the main author of the mentioned SoK paper. Thanks a lot for referring to it! I know, I got here a bit late (although I also follow the blog), but I was really happy to see this long conversation. After all, this was the primary goal of the paper, to re-initiate discussion about the topic. But to respond to the original post or proposal, I’m not sure we could wrap it up yet.

    The overhead of pointer based solutions (like SoftBound+CETS) is too high (2-4x). Hardware support would of course fundamentally change the situation, but I don’t think we should just wait for Intel (and AMD and ARM) to implement it, since we can only speculate about it.

    Object based protections (like ASAN or BBC) don’t provide full protection, and their overhead is still a bit high (2x).

    Projects like SAFECode could have significantly lower overhead, but like other solutions based on static pointer analysis (such as CFI/DFI/WIT/DSR), they have serious compatibility/modularity issues (dynamic libraries).

    So as the SoK paper suggested as well, I don’t think there is such a thing as the “best available memory safety solution”. I think more research is needed, or to quote myself: The war is not over. 🙂

    I’m looking forward to reading newer posts on the topic and thanks again for mentioning our paper!