Paths to External Engagement in Computer Science Research

The other day I wrote a post imploring academic computer scientists to at least occasionally break out of their research bubbles and engage with real engineering problems where their research matters. That post was, admittedly, a bit facile about how one might make this engagement happen. This piece suggests some ways. I don’t claim any particular authority or expertise, though I have — except where noted — tried out all of the suggestions below. Perhaps others will chime in with what has worked for them.

Talk to a lot of people and create a network. Networking is a really important skill. Back in the day Phil Agre’s Networking on the Network was what students would read. It predates modern social networks but contains a lot of timeless advice and in any case I don’t know of a better choice.

Attend conferences outside the academic mainstream. In some cases, such as SIGGRAPH and Supercomputing, there’s plenty of non-academic content at the conference you’re attending anyhow, but most of us aren’t so lucky. There’s a really wide range of industrial conferences; some of them definitely won’t be very interesting for academics so you need to do some homework ahead of time to find the one where the right people will be. For my current work, the LLVM Developers Meeting is the main event, almost all of the community’s heavy hitters are there and the technical sessions are amazing. The security community has plenty of great non-academic events. I’ve heard good things about !!Con and Strange Loop. In any case, the point of attending these conferences (besides the technical content) is to meet people who aren’t professors or students.

Spend a day visiting a company or a government agency. You need an invitation, but you can invite yourself if you can make a contact for example via your advisor, a mutual friend, or by meeting people at conferences. Talk to people there, get a sense of what they’re doing with their days, what they’re worried about, what they’re frustrated with. Give a talk about your work if possible, but this isn’t necessary. It often makes sense to do these visits when you’re already traveling.

Spend longer visiting a company or government agency. Depending on your career stage this could be an internship, a postdoc, a sabbatical, a summer, or a leave of absence. This is a chance to work closely with people for an extended period of time. A lot of people do this at some point in their careers and I think it’s a really good idea.

Engage on twitter. It’s a weird, crowded, noisy place and it can take a while to find the right people to follow and longer to get them to follow you. The advantage of Twitter is that there’s a huge number of smart people are out there and communicating with them is almost frictionless.

Blog. People are far more likely to read a blog entry than a paper, in my experience. Also, the readership is different, because non-academics are even less likely to read a paper than academics are. Realistically, starting a blog only makes sense if you have a fairly consistent stream of things to say that don’t fit into tweets and don’t really belong in academic papers. Building an audience takes time and requires a certain amount of regularity in writing; these don’t necessarily fit in very well with the academic binge model of working that many of us subscribe to. Another issue is that blogging doesn’t pay the bills, academically speaking — you should only do it because you want to, not because you expect any direct benefit to your career. I waited until getting tenure to start a blog for this reason, and also to make sure that I had at least a few years’ worth of ideas lined up.

Find people who are great at external engagement. Emulate them, collaborate with them, or join them. The Racket folks are amazing at this.

Release software. Put your stuff on Github, polish it up, and then tell people about it. Get users, accept pull requests, respond to feedback, fix bugs, add features, cut releases, and repeat. Either your code will provide people with a good value proposition or it won’t — either way you learn something. The caveats are that building a user base takes time, creating realistically usable software is like 25 times as much work as creating research-grade crapware, and only a small subset of computer science professors will value your contributions in this area. But it is enormously fun and anyway you don’t want to make the mistake of caring too much what professors think.

Engage with existing open source software. For many of us, there’s an obvious open source project that we could be contributing to or otherwise helping out. Find that project and read their mailing lists, look into the issue tracker, build and use the code, read the code, and maybe submit a patch or two. Beyond these things, consider attending their meetings or BoF sessions, if these exist. A reasonable long-term goal would be to use your work to make a significant improvement to the open source program.

Start a company. This one I haven’t done, though I know many people who have. It is a somewhat extreme option, as much a lifestyle choice as research engagement strategy. Details are out of scope of this post and anyway I don’t know anything about doing this.

Ok, with all that said, I’ll make a prediction or two about what will happen if you follow these suggestions. First, you’ll often find it frustrating and some of the time you invest will be wasted. I’ve burned months on things that never bore the tiniest fruit; if I knew how to tell you to avoid this, I certainly would. Second, you’ll discover that the problems that people are having out there aren’t the ones that you would have predicted, nor are they the ones that your CS friends and colleagues predicted. You need to learn to listen to people, but often even the people having problems aren’t actually having the problems that they think they’re having (anyone who has worked tech support will tell you this is the case more often than not). You need to learn to observe carefully and read between the lines to figure out what’s really going on. Third, at some point you will run into the distinction between problem-driven research and solution-driven research. The former is like trying to cure cancer or put a person on Mars: the problem is everything and you’ll consider any solution that might work. The latter is where your research hammer is a damn good one and you’re never going to put it down: if it can’t solve someone’s problem, you’ll move on and find a different problem. Obviously there’s a spectrum — but you’ll need to decide where on it you sit.

Closing the Loop: The Importance of External Engagement in Computer Science Research

Computer scientists tend to work by separating the essence of a problem from its environment, solving it in an abstract form, and then figuring out how to make the abstract solution work in the real world. For example, there is an enormous body of work on solving searching and sorting problems and in general it applies to finding and rearranging things regardless of whether they live in memory, on disk, or in a filing cabinet.

To paint a bit of a caricature, we have the real world where:

  • all problems are engineering problems, and every engineering problem is distinct from the rest in small or large ways
  • abstractions are leaky: machines fail, bits flip, packets drop, and bugs exist
  • requirements are many-dimensional, diverse, and poorly specified
  • systems often involve human beings who are notoriously hard to reason about

Overall, engineering problems are messy and we usually can’t prove anything about anything, any more than we could prove the correctness and optimality of a bridge or lawnmower.

On the other hand, in the abstract problem space, we’ve dropped every detail that we don’t wish to reason about but retained (ideally) all of the important characteristics of the problem. We’re now dealing with a model — sometimes a formal mathematical model, other times an executable model such as a kernel or simulation — of part or all of the problem. Models can be rigorously analyzed for efficiency, optimality, correctness, and whatever else. Furthermore, connections between problems that initially seem very different may become more apparent in the abstract form.

This process of lifting engineering challenges into abstract problems, solving them, and applying the results — I’ll call it the computer science research loop — is so integral to the DNA of computer science research that it is simply assumed; people have a hard time imagining any other way to work. Also, it has been incredibly successful, which is unsurprising since we inherited it from mathematics where it had been giving good results ever since some prehistoric person observed that you could talk about the number three without actually having three things in front of you.

Here’s the loop in its simplest form:

Here are some ways the research loop shows up in areas that I’m familiar with:

  • In compilers we can develop a toy language that is much easier to compile but that (we argue) retains the essential features of real programming languages.
  • In compilers we can compile a benchmark suite instead of real applications, arguing that our results will translate over into practice.
  • In resource scheduling research it is typical to abstract away all details of the jobs being scheduled.
  • In databases or operating systems we can create a transaction engine or OS kernel that supports only a tiny subset of the features provided by SQL Server or Linux, arguing that the advantages displayed by our simplified model would not disappear if we took the trouble to implement all the boring stuff.

In all cases the goal is to elide details that make our work harder, but without oversimplifying. This piece is about an avoidable but undesirable second-order effect: it is common for both edges of the computer science research loop to be weaker than they could be.

The concrete-to-abstract edge suffers when people working on the abstract side don’t have deep expertise in the concrete problems they’re trying to solve, and it also tends to weaken over time as the character of the problem drifts, causing assumptions on the abstract side to be invalidated. The abstract side has a kind of inertia: infrastructure builds up in code, formalisms, books and papers, and mental models. It requires significant time, energy, and risk-taking to throw away parts of the abstract infrastructure and create fresh pieces. Abstractions that are out of touch with real-world problems can linger, producing papers and PhD theses, for decades.

The abstract-to-concrete edge of the research loop is also problematic: solving real engineering problems, or interacting with the people whose jobs are to solve those problems, can be difficult and time-consuming. It is generally much easier to work purely on the abstract side, and in fact our field’s mathematical roots encourage this behavior. Abstract work is, of course, fine as long as someone else is doing the grungy part, but in many cases that never happens because the abstract side has drifted off to the side of the real problems, becoming more elaborate and complex over time as the easy problems get mined out, and in the end there’s no realistic prospect of applying it.

I believe these issues cause computer science research, overall, to be less interesting and impactful than it could be. I also believe that mitigating the problem isn’t that difficult and that doing so tends to make researchers’ careers a lot more fun and rewarding.

The solution is for researchers to engage with the world outside of their research bubble. Working on real-time scheduling? Then go find some drone software and help its authors or users avoid deadline misses, or else contribute scheduling code to Linux. Working on concurrency control in databases? Figure out a way to integrate the new scheme into MySQL or something, instead of waiting for them to read your paper. Working on finding bugs in software? Report the bugs to the people who maintain the code and watch their reactions. It is particularly important that students do these things, first because their intuitions often aren’t as well-developed and second because I’ve noticed that quite a few CS graduate students are quietly and privately wondering if their work is good for anything in the end. It turns out there’s a way to answer this question: engage with the people whose problems you are solving. As a bonus you’ll publish fewer papers.

It is not the case that that every piece of research should be applied research. Rather, good pure research usually stems from direct personal experience with problems on the engineering side of our world. It’s a bit of a subtle point: doing the grungy boots-on-ground work is how we build our intuitions about what kinds of solutions actually work vs. sounding good on paper. It is hard — though not impossible — to skip this step and still do great work.

Going a bit further, my experience is that much of the interesting action in research happens on the abstract-to-concrete edge of the CS research loop, even though this work is not glamorous or well-rewarded by program committees or grant panels. Even the old workhorses like sorting an array or implementing a key-value map became dramatically more interesting and complex in the context of a real machine, operating system, compiler, and workload.

Concretely, here are some things to look for that might indicate that a research community needs to tighten up its loop:

  • few members of the community are plugged into the concrete problem domain, and are providing fresh insights from developments there
  • few members of the community are moving abstract results into practice
  • members of the community are mainly interested in impressing each other (or, equivalently, papers that demonstrate external impact are not highly valued)
  • the community rewards complex solutions because they are innately interesting, as opposed to rewarding simple solutions because they have engineering merit
  • years of results cluster around the same baseline or benchmark set, instead of continually moving the bar higher

In conclusion, the tension between mathematics and engineering is alive and well in our field. My belief is that more of us, perhaps even most of us, should be skilled in, and actively working in, both modes of thinking.

Also see this followup post.

The Basic Toolbox

This post is aimed at computer science students.

In the software engineering course I’m teaching this spring, I often find myself saying things like “you need to know a scripting language” or “everyone should be able to run a code coverage tool.” Finally, the other day, a student stopped me and asked for the whole list. In other words, what — in my opinion — is the collection of tools that someone graduating with a CS degree should know how to use. Of course I couldn’t answer this on the spot but I’ve been thinking about it since then. The basic idea is that for most any common situation, you should have a decent tool at hand and be able to start solving problems with it without too much fumbling around. (Keep in mind that this is a wish list for self-study: I doubt that any CS program teaches all of these. Also, I didn’t have all of these tool skills when I got my undergraduate CS degree, though I did by the time I got a PhD.)

A version control system: Git is the obvious choice; the main thing you should have is a basic Github-centric workflow including pull requests, remotes, dealing with merge conflicts, etc.

A text editor: We all end up using different editors from time to time, but we should each have a solid default choice that does a good job with most editing tasks. It should highlight and indent any common programming language, integrate with a spellchecker, easily load gigantic files, have nice regex-based search and replace, etc. There are plenty of choices, many CS people migrate to vim or emacs.

A graphing program: I routinely use gnuplot, graphviz, and Powerpoint to make figures. Lots of people like matplotlib.

A presentation tool: Powerpoint, Keynote, Google Slides, something LaTeX based, etc.

An interactive debugger for native executables: LLDB, GDB, something IDE-based.

A generic build system: Make, CMake, etc.

A scripting language: This is for low-grade automation, quick and dirty data analysis tasks, etc. Python and JavaScript would seem like natural choices. Around 20 years ago I was an intern at a networking company and my supervisor popped out of a meeting with some data concerning switch errors, and asked me to do some analysis to locate the underlying pattern. I wasn’t sure how; he handed me a Perl book and I was able to get the job done before the meeting ended.

A shell language: This is probably bash or PowerShell, but there are plenty of other choices. There’s some overlap with scripting languages but I think there are two distinct niches here: a shell language is for scripting a smallish number of commands, doing a bit of error checking, and perhaps looping or interacting with the user slightly This sort of job is a bit too cumbersome in Python, Perl, or JavaScript.

A systems language: This is for creating servers, daemons, and other code that wants to go fast, use little memory, have few dependencies, and interact tightly with the OS. C or C++ would be the obvious choices, but Rust and Go may be fine too.

A workhorse language: This is your default programming language for most tasks, it should have a huge collection of high-quality libraries, be pretty fast, run on all common platforms, have a great tool ecosystem, etc. Racket, Java, Scala, OCaml, C#, Swift, or Haskell would be great — even C++ would work.

A pocket calculator: This is your go-to REPL for basic arithmetic and conversions between number representations, it should be near-instantaneous to get answers. For reasons I no longer remember, I use gdb for this — typically multiple times in any work day. Old standbys like bc and dc also seem like bad choices. I’m curious what other people do here.

Tools for Programming Languages

There’s no reason these days to use a language that doesn’t have a good tool ecosystem. For any given language you should know how to use its interactive debugger, static and dynamic bug-finding tools, a profiler, a code coverage tool, a build system, a package manager, and perhaps a random test-case generator.

Secondary Tools

There are a lot of other tools that could have gone into my basic toolbox, such as a data analysis tool, a browser language, a cloud-based testing service, a statistics language, a typesetting system, a spreadsheet, a database, and a GUI builder/toolkit. I don’t consider these as fundamental; of course, your mileage may vary.

Trust Boundaries in Software Systems

One of the big things that has changed in computer science education over the last 20 years is that it is now mandatory to prepare students for writing software that lives in a hostile environment. This content can’t be limited to a computer security course, it has to be spread throughout the curriculum. My experience, based on talking to people, looking through textbooks, and looking at lecture material on the web, is that overall we’re not yet doing a great job at this.

When teaching this subject, I’ve started using trust boundaries as an organizing principle. A trust boundary exists any time we (the system designers or system owners) trust code, data, or human actors on one side of an interface more than we trust the other side of the interface. Students need to be able to recognize, understand, fortify, and stress-test the trust boundaries in any system they have a stake in.

Trust boundaries aren’t hard to find: We just need to ask questions like “What are the consequences if this code/data became horribly malicious? Is that likely? Can we defend against it? Do we want to defend against it?” It is easy to conclude, for example, that a demonic garbage collector or OS kernel might not be something that we wish to defend against, but that we had better fortify our systems against toxic PNG files that we load from random web sites.

Some basic observations about trust boundaries:

  1. They’re everywhere, even inside code written by a single person. Anytime I put an assertion into my code, it’s a tacit acknowledgment that I don’t have complete trust that the property being asserted actually holds.
  2. The seriousness of trust boundaries varies greatly, from mild mistrust within a software library all the way to major safety issues where a power plant connects to the internet.
  3. They change over time: a lot of our security woes stem from trust boundaries becoming more serious than they had been in the past. Email was not designed for security. The NSA wasn’t ready for Snowden. Embedded control systems weren’t intended to be networked. Libraries for decoding images, movies, and other compressed file formats that were developed in the 90s were not ready for the kinds of creative exploits that they faced later on.
  4. If you fail to recognize and properly fortify an important trust boundary, it is very likely that someone else will recognize it and then exploit it.

To deal with trust boundaries, we have all the usual techniques and organizing principles: input sanitization, defense in depth, sandboxing, secure authentication, least privilege, etc. The issue that I’m trying to respond to with this post is that, in my experience, it doesn’t really work to hand students these tools without some sort of framework they can use to help figure out where and when to deploy the different defenses. I’d be interested to hear how other CS instructors are dealing with these issues.

Stories Behind Papers: Integer Overflow

A couple months ago Jean Yang and Vijay Chidambaram had a Twitter discussion about the stories behind research efforts that you might hear over coffee, but that usually don’t get written up. Vijay started a series of posts containing these. I thought I’d write up a couple of them myself. Alas, none will be particularly dramatic. This seems like a good one to start with.

Around the mid/late 2000s — perhaps starting with Nearly All Binary Searches and Mergesorts are Broken — I got interested in integer overflow bugs. At this point the security aspect of integer bugs in C and C++ was receiving plenty of attention, but I didn’t feel like very many people were looking at the broader issue of logic errors stemming from integer overflows. Even in functional languages with super-serious type systems and a focus on correctness, integer overflow was (and is) an often-neglected issue. These problems are fundamentally difficult to deal with at compile time.

Anyhow, the thing that really got me motivated was the very limited understanding of undefined behavior that seemed to be par for the course in those days. Additionally, most of the existing research tools for detecting or mitigating integer overflows were operating on compiled code, while I believed this problem needed to be attacked near the source level.

By summer 2010 my student Peng Li (now at Baidu USA) had a modified version of Clang that emitted dynamic checks for integer overflow, divide by zero, value-losing typecasts, shifts past bitwidth, and that kind of thing into a compiled C or C++ program. We used this to test open source software and it turned out that basically all programs were executing a constant stream of undefined behavior. I fired off a few dozen bug reports. Since UB wasn’t widely understood at that time, many developers still had the attitude “that is OK since we did it intentionally” or “I am allowed to expect signed overflow in C/C++ to have two’s complement behavior because that is what the hardware does.” See for example the discussions that happened at PostgreSQL and PHP.

In early 2011 I visited Grigore Rosu’s group at UIUC to learn about their awesome new KCC tool. We needed something like this to filter out undefined programs that C-Reduce was creating while making minimal versions of bug-triggering programs from Csmith. During this visit I happened to be able to grab a few minutes with Vikram Adve and learned that he and his student Will Dietz were also working on LLVM-based integer overflow detection, and they also had a working prototype. Yikes! This wasn’t even close to the worst case scenario — which would have been learning about their work once they had a paper accepted somewhere — but it’s never fun to be scooped. Competition in research may occasionally be useful as a forcing function, but I am personally uninterested in competing. If smart people are working on a problem, I would like to leave them to it, they’ll most likely come up with a good solution. There are way too many other fun and important problems to work on for competing on any single problem to be attractive. Anyhow, luckily, Vikram and Will were happy to collaborate, so we started working together. I’m still happy with the resulting paper and I’m confident that it is better than what either of the groups would have produced working on its own.

One of our goals all along was to get integer overflow checks into Clang. This took a while longer and Will did most of the legwork. The job was made easier by the fact that by this time there was plenty of momentum towards dynamic undefined behavior detection in LLVM. For example, ASan was already part of the tree. There was an existing -fcatch-undefined-behavior flag that we fit into, but this pretty rapidly (in time for LLVM 3.3, I believe) got phased out in favor of the -fsanitize=undefined usage that Clang still uses.

Overall, dynamic detection of integer-related undefined behaviors in C/C++ is not difficult, but convincing people to take these bugs seriously was a long struggle and the broader issue of how integer overflows relate to program bugs is interesting and deep. Fundamentally, people usually do not want, and are not good at reasoning about, fixed-width integers, but on the other hand we don’t want to just put bignums everywhere in our low-level programming languages. The other thing I take away from this effort is how a lucky break and a willingness to collaborate were really important in making the work successful, and in fact my group continues to collaborate with Vikram’s.

A Conversation about Teaching Software Engineering

For better or worse, my impressions of software engineering as a field were shaped by a course I took as an undergrad that I thought was mostly not very interesting or useful. We spent a lot of time on waterfalls and stuff, while not covering testing in any detail. For the final project in the class we had to develop an application using a CASE tool (very hip at the time) where we described the class hierarchy using a GUI and then the tool generated skeletal C++ for us to fill in. Since we knew nothing about designing class hierarchies and also the tool was weird and buggy this all went about as disastrously as you would expect. In the end I learned quite a lot, but the lessons were probably not those intended by the instructor.

24 years later I’m teaching a software engineering class — this probably wouldn’t even happen if my department had any real software engineering faculty! Even so, I’m a true believer: I love the material and feel more strongly about its importance than I do about my more usual subjects like compilers and operating systems. I ignore software process and focus entirely on building skills and habits that I feel will come in handy in any software engineer’s career. If you like it, put a test on it. Read code. Review code. Refactor code. Write assertions. Adhere to coding standards. Design an API. Use a coverage tool, a bug-finding tool, a version control tool, a fuzz tool, a CI tool, to good effect. Repeat until end of semester. No doubt there’s room for improvement but the material seems solid.

Over Christmas break I had a beer with Daniel Dunbar, who I should have met long before, but somehow hadn’t. Daniel has done super impressive stuff: he was one of the original Klee authors and also was an early Clang implementer. I told him about my approach to teaching software engineering and picked his brain a bit about the sorts of things he wished people with CS degrees were better at doing. Of course I wasn’t taking notes and forgot most of it. So I mailed:

As I mentioned the other week, I’m teaching a software engineering course this semester, and rather than focusing on any kind of academic approach to this subject, I’ll try to teach them a lot of the real world business of making software that works, starting with all the basics like testing, coverage, assertions, and code reviews.

But you also mentioned some things that I agreed with but that I might not have thought of, or prioritized — the things that you wish people were already good at when you interview them, but they probably aren’t. Is there any chance I could get you to very briefly summarize those things, or point me to a resource you like on this subject? I want to make sure to cover, at least quickly, all of the main high points in this class.

Daniel graciously sent a long reply and also allowed me to reproduce it here:

To start with, I don’t think I know of any resources on the subject. It’s amazing how much obvious little stuff one has to know, but forgets about. Git is a huge source of “things I use every day but forgot I had to learn”, for better or worse.

I guess if I had to come up with a list off the cuff:

  1. The experience of maintaining software over time. I think we spend most of our time working with existing code bases and figuring out how to integrate changes into them. This area has a lot of related topics:
    • How do you figure out where to make a change? Tools here include debugging existing workflows to find where something happens, code search, git blame, git grep, git log -G, etc
    • How do you manage making incremental changes? I am a huge believer in always doing incremental work. How do you build a feature while always keeping the code working? Tools here include feature flags, adaptors and stub implementations, forwarding implementations, A/B testing before/after change.
    • How do you find the source of regressions? Tools here are basically bisection, git log -G.
    • How do you handle technical debt? What counts as technical debt? What kinds of debt are painful versus not?

    I feel like I probably have read things I liked on these topics, but none of the links are coming to mind now.

  2. The experience of making technical decisions. This is a *huge* part of development. Topics here:
    • How do you evaluate choices for a dependency? Tools here include benchmarking, analysis of the code, analysis of the software maintainability, etc
    • How do you evaluate when to adopt a dependency versus write your own? Topics here are NIH versus opportunity cost on innovation.
    • How do you convince people to follow a particular choice? Presenting or writing coherent write-ups on engineering tradeoffs is a really under appreciated skill which can have a big impact (the cost of bad decisions are high).
  3. How do you debug things? Another big part of development. I would emphasize debug here not just in the “how do I get this working” but even deeper in the “how do I understand what is really happening?”
    • I find that many people tend to give up at really understanding what is going on. “Oh XYZ didn’t work, so I did PDQ'”. The people who don’t give up usually end up understanding a lot more about computers, and then do a better job maturing over their career.
    • Maybe it would be good to teach people (a) don’t give up, and (b) here are all the tools you can use when you might want to give up. A lot of the time the tool people know is stackoverflow, but past that they are lost.
    • Things here include hardware watchpoints, reading the source code, disassembly and reverse engineering.
  4. How do you estimate the time to develop software? This is a huge part of a business, people will always want you to do this. Even just getting students to start to think about the process would be good, asking them to make estimates and compare results to them.
    • I have no advice on how to teach this because I still learning a lot here.
  5. How do you review code? What makes good review?
    • When is coding style important versus not? What are the pros/cons?
    • Does review catch bugs, or not? Are certain review styles more effective?
  6. How do people do release management? This is such an amazingly huge part of what we spend time on, and one that receives very little attention.
    • Do you release from trunk? If so, how do you ensure quality?
    • Do you have stable release branches? If so, how do you ensure bugs are actually fixed? Are people cherry picking fixes? What can go wrong? (I remember once cherry picking a fix that happened to merge incorrectly, but the patch applied an in a way that was still valid C++ code (an if {} clause ended up inside another one). The result was a clang that miscompiled itself.)
    • How do you deal with complicated merge conflicts?
    • People like Nicole Forsgren have research here which it would at least be nice for people to be exposed to.

If I had to come up with the kinds of potential exercises I would love it if someone was trying (no idea if they actually would work):

  1. Take a bug in a complex code base at some revision, and ask people to find it and fix it. Compare answers to the one the project actually adopted. A bug where there were several obvious fixes with tradeoffs would be a good point for discussion.
  2. Take a new feature which was added to some project, and study how it was done. Not to toot my own horn–its just an example I am familiar with–we migrated Clang to go from producing .s files to doing the assembly in memory. This involved lots of incremental refactoring, a clear switch over at one point (feature flag), A/B testing to compare old to new (i.e. we chose to shoot for binary equivalence of .o files, simply so we could easily test it — made for lots of extra engineering, but easier to guarantee correct). One could dig up how this went from a theoretical concept on a mailing list to an implemented feature.

    The best thing here would be to create a hypothetical project which is a mirror of a real project (but don’t make this clear at first) and ask people to design some extension of it. Then, compare the results to what the project actually decided to do, and the discussion around it. For example, analyze what things the project owners antagonized over that the students didn’t think of, and try and figure out why not.

  3. Do some systematic study of PRs as a “literature” exercise. How does tone impact response, how do people handle criticism, etc.
  4. Do an exercise where teams are forced to produce a project over the lifetime of the course. The exercises should be small and easier, but they have to be built into the same code base. Something that forces people to make software releases would be nice (dunno if there is a way to do this in such a way that people can choose whether or not to use “release branches”, but in a way that has tradeoffs in either direction).

Wow, this is a lot of material, way more than we cover in depth in a semester. Even so, I find it super useful as a high-level vision for the kinds of things students should end up at least having been exposed to.

Some Goals for High-impact Verified Compiler Research

I believe that translation validation, a branch of formal methods, is just about ready for widespread use. Translation validation means proving that a particular execution of a compiler did the right thing, as opposed to proving once and for all that every execution of a compiler will do the right thing. These are very different. Consider some obstacles to once-and-for-all verification of a tool like GCC or LLVM where:

  • Most of execution is spent processing pointer soup where correctness depends on poorly documented and incredibly detailed properties of the soup.
  • Hundreds or thousands of analyses and transformations are performed, and due to performance constraints the compiler implementation entangles them in a way that is nearly impossible to disentangle.
  • The implementation language is usually unsafe, forcing any formal verification effort to spend an outrageous amount of effort proving properties of the compiler code, such as memory safety, that are incidental to the task of interest. A convincing formal verifier for the subset of C++ that LLVM is written in doesn’t even exist.
  • For some compiler algorithms like register allocation, it appears to be fundamentally easier to check the result than it is to prove that the right result is always computed. For example, CompCert uses this approach (or did the last time I looked).
  • The compiler is under active, rapid development. Any proof would have to be redone, likely incurring significant effort, for every release.

So it’s clear that once-and-for-all formal verification of LLVM or GCC is never going to happen, the costs ludicrously outweigh the benefits. Translation validation, on the other hand, is already to some extent practical, see for example this effort to prove that the seL4 object code refines its C source code. (Refinement just means that C gives a typical program many different meanings and we need to prove that the compiler has picked one of them.)

Other recent, LLVM-based work in this area includes Program Analysis for Compiler Validation (2008), Evaluating Value-Graph Translation Validation for LLVM (2011), Equality-Based Translation Validator for LLVM (2011), Formal Verification of SSA-Based Optimizations for LLVM (2013), An Extensible Verified Validator For Simple Optimizations in LLVM (2014), and Black-box equivalence checking across compiler optimizations (2017).

This work is awesome but research tools don’t, by themselves, stop people from being burned by compiler bugs. One way to make things better is to combine translation validation with aggressive testing, like we did here, and then make sure any resulting bugs get fixed. Better yet, we can try to push a translation validation out into the world so that anyone can use it. It’s time for this to happen. The rest of this piece is some thoughts about how that should work.

Goal 1: Ease of Use

The only thing an application developer should need to do is add a compiler flag like this:

clang++ -O -tv file.cpp

or:

rustc -O -tv file.rs

and then the compiler either validates, or fails to validate, its translation. It has to be this easy.

Goal 2: Near zero overhead for compiler developers

Translation validation can’t get in the way of normal development for a production compiler: it has to be almost entirely on the side. This doesn’t mean, however, that the compiler can’t help out the validator, but rather that this has to happen in non-invasive ways. For example, certain optimizations on nested loops that are hard to validate might need to emit a bit of extra debug info or optimization remarks or whatever, to help the validator piece together what happened.

Goal 3: Performance

Since translation validation will result in a lot of solver calls, it is going to be somewhat slow, probably well over an order of magnitude slower than regular compilation. A fairly easy way to speed it up would be to add a (persistent, networked) caching layer to exploit the fact that most parts of most code bases don’t change very often. We’ve had good luck using this kind of a cache for Souper, which is also slow due to making many solver calls.

Goal 4: Multiple Validators

Research tends to move rapidly when there is a level playing field and a clearly-defined goal, allowing different groups to compete or cooperate, as they see fit. Competition can be particularly motivating, see for example SMT-COMP.

The primary metric for choosing a winner in a translation validation competition is the number of functions validated for compilation of a given benchmark using a particular LLVM version and optimization level. Verification time would be a good secondary metric.

To ensure a fair competition, it would be best for all validators to be using the same semantics for the source and target languages. This isn’t so straightforward: all too often these mathematical artifacts end up not being readable or reusable since they are deeply embedded in the implementation of a formal methods tool (this is unfortunately the case for Alive, for example). A canonical, readable, writable, and reusable semantics for each of C, C++, Rust, Swift, LLVM IR, x86-64, etc. is something we should be spending significant resources on. This sort of thing is what I’m talking about.

Conclusions

Just to be clear, beyond the Alive-based work referenced above, I’m not working on, nor do I have any plans to work on, translation validation. Rather, it is clearly the right way to gain confidence that a production-grade compiler has done its job. The technologies are in reach and we should be working to deploy them widely.

The Real Problem with the US News Rankings

The latest list of Best Global Universities for Computer Science from US News has not been well received. For example, the Computing Research Association issued a statement saying that “Anyone with knowledge of CS research will see these rankings for what they are — nonsense — and ignore them. But others may be seriously misled.” The CRA statement identifies these problems with the US News rankings:

  • They ignore conference publications (many areas of CS publish primarily in conferences).
  • US News doesn’t even say which venues are used to compute the publication-based part of the ranking function.
  • The reputation-based part of the rankings doesn’t make much sense given the diverse, global nature of the computer science research community.

An additional problem is that it seems to be pretty easy to game this ranking system using money. For example, King Abdulaziz University (Jeddah, Saudi Arabia) has adopted hiring practices that appear to be designed to do this. Their CS department is ranked #13, compared for example to CMU at #22 and Illinois at #46. I’m trying to avoid being USA-centric and elitist here, but based on some web searches, it is just not possible to objectively rate the CS department at KAU as being better than the one at CMU. US News explains their ranking methodology here.

To summarize, US News is designed to make money, not to do the CS community any favors. Universities are going to try to maximize their rankings. It’s a pretty banal situation all around.

What I wanted to talk about today is the function of rankings. What are we supposed to do with them? The conclusion I’ve come to is that a closed, opaque ranking such as the one from US News is only good for one thing: codifying and reinforcing a pecking order so that it can be used by people who don’t need or want any more information than a total ordering. This might include, for example, university administrators who would like to know if sending additional resources to a department resulted in a measurable and externally-visible improvement.

The reason everyone’s annoyed with US News is that they’ve upended the established pecking order. But here’s the thing: they could fix this tomorrow and their opaque rankings would still be worthless for people who care about what’s behind the rankings, as opposed to being interested in ranking for its own sake. There has to be a better way.

In contrast, let’s take a look at CSRankings, a site that Emery Berger put together using publicly available data from DBLP. This ranking assigns credit to departments based on the number of top-tier papers published by their full-time faculty, credit for which is split among authors. There’s a FAQ giving additional details. (There’s a lot of quibbling that could be done about how this all works; I’m not too interested in that.)

The thing that makes this ranking different for practical purposes isn’t the openness of the algorithm and the data set, but rather the way the web site allows us to explore the data. Let’s say that I’m a prospective graduate student interested in operating systems and formal verification. The first thing I can do is select only those areas — now the site shows me the departments that have people who tend to publish heavily in conferences such as SOSP and CAV. Second, I can click on an individual department and see who the key players are in those areas. Third, I can go to these people’s home pages, Google Scholar pages, etc. in order to see what they are specifically doing, and finally I can read their code and papers. I would argue that this is a fundamentally different use of a ranking system: the purpose is to guide me towards details that matter, not to hide everything behind a number.

In summary, I find the complaints about the US News rankings to be a bit off the mark, since even a fixed version of them will provide no insights and no information beyond an opaque ordering. It would just be confirming the status quo instead of refuting it, as their current rankings do. That is what some people want, but it is of little use to faculty and students in the field. A better use of rankings is to serve as a guide for further exploration — for this to happen, the rankings need to be open and connected to more detailed sources of information. CSRankings accomplishes this and it is the tool you should use to explore the productivity of computer science departments. If you don’t like it, you can try to convince Emery to do things differently or else create your own ranking.

The Dreaded Practice Talk

[I wrote a post with the same title in 2010; this is an updated version.]

In a week you’ll be giving a talk about your work to 600 people at a conference, or perhaps to five people who will sign off (or not) on your thesis. Depending on your area and the type of talk, the questions following the talk may not be very friendly. What should you do? Practice, practice, practice.

A practice talk is usually given to a small audience anywhere between a few weeks and a few hours before an important talk. It is followed by a feedback session that can easily last five times longer than the talk itself did. Often, multiple practice talks are necessary before the presentation becomes really polished and good.

This post is about getting maximum benefit from a practice talk — this is important because they are very time consuming.

The speaker needs to:

  • Have a legible slide number on every slide. If these aren’t there, people taking notes can’t easily refer back to specific slides later on.
  • Reserve a room, acquire a projector, and have everything setup and ready to go at the arranged time. Have all of the adapter dongles that you need on hand. If anyone is calling in remotely, this should also be taken care of by the speaker or by someone who has agreed to help the speaker, and it needs to be done before the talk is scheduled to start.
  • Have practiced the talk alone first. It helps to have memorized what to say when transitioning between slides. Memorizing an entire talk is usually overkill. Focus on transitions and on getting the talk started smoothly; most of us have a much easier time continuing to talk about a topic than getting started.
  • Have an appropriate number of slides. Speakers vary widely in terms of delivery speed and amount of content per slide, but 1.5 to 2 minutes per slide is probably about right. In realistic situations you will be cut off if you exceed your time budget. At proposals and defenses there is usually not a strict time budget, but going over time is strongly frowned upon.
  • Have a pen and paper available to take notes after the talk. You cannot remember 150 detailed suggestions about things to change.
  • Arrange for someone to time the talk. Sometimes it is helpful to get timings on individual slides.
  • Act on the feedback that is given.

Each member of the audience must:

  • Listen to the talk as if it were being given for real. Interrupting the speaker should be handled according to whatever protocol will be in force during the real talk. Generally this means few or no interruptions.
  • Arrive with a pen and paper, or equivalent note-taking gear.
  • Provide detailed feedback in a constructive and respectful fashion.

In my group this is usually the procedure:

  1. I give a bit of context: remind everyone what the speaker needs to accomplish, what kind of background and temperament the audience is likely to have, etc.
  2. I introduce the speaker.
  3. The talk is given, minimizing interruptions to get a good timing estimate.
  4. Starting with students, the audience asks questions as if they had just heard the real version of the talk. The speaker responds accordingly.
  5. Starting with students, the audience makes general comments about the delivery of the talk.
  6. We go through the talk slide by slide, giving feedback and trying to figure out what to add, delete, change around, etc.

Finally, a bit of advice on making slides:

  • Don’t put text too close to the edges of slides; some projection systems crop a bit.
  • Colors often look different when they go through a projector, and low-contrast colors can be completely invisible on a screen. Use a small number of very high-contrast colors. I typically use black on white for almost everything with some bright red or blue for emphasis.
  • Minimize the number of animations.

Undefined Behavior in 2017

This post is jointly authored by Pascal Cuoq and John Regehr.

Recently we’ve heard a few people imply that problems stemming from undefined behaviors (UB) in C and C++ are largely solved due to ubiquitous availability of dynamic checking tools such as ASan, UBSan, MSan, and TSan. We are here to state the obvious — that, despite the many excellent advances in tooling over the last few years, UB-related problems are far from solved — and to look at the current situation in detail.

Valgrind and most of the sanitizers are intended for debugging: emitting friendly diagnostics regarding undefined behaviors that are executed during testing. Tools like this are exceptionally useful and they have helped us progress from a world where almost every nontrivial C and C++ program executed a continuous stream of UB to a world where quite a few important programs seem to be largely UB-free in their most common configurations and use cases.

The problem with dynamic debugging tools is that they don’t do anything to help us to cope with the worst UBs: the ones that we didn’t know how to trigger during testing, but that someone else has figured out how to trigger in deployed software — while exploiting it. The problem reduces to doing good testing, which is hard. Tools like afl-fuzz are great but they barely begin to scratch the surface of large programs that process highly structured inputs.

One way to sidestep problems in testing is to use static UB-detection tools. These are steadily improving, but sound and precise static analysis is not necessarily any easier than achieving good test coverage. Of course the two techniques are attacking the same problem — identifying feasible paths in software — from opposite sides. This problem has always been extremely hard and probably always will be. We’ve written a lot elsewhere about finding UBs via static analysis; in this piece our focus is on dynamic tools.

The other way to work around problems in testing is to use UB mitigation tools: these turn UB into defined behavior in production C and C++, effectively gaining some of the benefits of a safe programming language. The challenge is in engineering mitigation tools that:

  • don’t break our code in any corner cases,
  • have very low overhead,
  • don’t add effective attack surfaces, for example by requiring programs to be linked against a non-hardened runtime library,
  • raise the bar for determined attackers (in contrast, debugging tools can afford to use heuristics that aren’t resistant to adversaries),
  • compose with each other (in contrast, some debugging tools such as ASan and TSan are not compatible, necessitating two runs of the test suite for any project that wants to use both).

Before looking at some individual kinds of UB, let’s review the our goals here. These apply to every C and C++ compiler.

Goal 1: Every UB (yes, all ~200 of them, we’ll give the list towards the end of this post) must either be documented as having some defined behavior, be diagnosed with a fatal compiler error, or else — as a last resort — have a sanitizer that detects that UB at runtime. This should not be controversial, it’s sort of a minimal requirement for developing C and C++ in the modern world where network packets and compiler optimizations are effectively hostile.

Goal 2: Every UB must either be documented as having some defined behavior, be diagnosed with a fatal compiler error, or else have an optional mitigation mechanism that meets the requirements above. This is more difficult; it necessitates, for example, production-grade memory safety. We like to think that this can be achieved in many execution environments. OS kernels and other maximally performance-critical code will need to resort to more difficult technologies such as formal methods.

The rest of this piece will look at the current situation for various classes of undefined behaviors. We’ll start with the big ones.

Spatial Memory Safety Violations

Background: Accessing out-of-bounds storage and even creating pointers to that storage are UB in C and C++. The 1988 Morris Worm gave us an early hint of what the next N years would be like. So far we know that N >= 29, and probably N will end up being about 75.

Debugging: Valgrind and ASan are both excellent debugging tools. For many use cases ASan is the better choice because it has much less overhead. Both tools retain the representation of addresses as 32- or 64-bit values, and reserve forbidden red zones around valid blocks. This is a robust and compatible approach: it interoperates seamlessly with non-instrumented binary libraries and also supports existing code that relies on pointers being convertible to integers.

Valgrind, working from executable code, cannot insert red zones between stack variables because stack layout is implicitly hard-coded in the offsets of instructions that access the stack, and it would be an impossibly ambitious project to remap stack addresses on the fly. As a result, Valgrind has only limited support for detecting errors in manipulating storage on the stack. ASan works during compilation and inserts red zones around stack variables. Stack variables are small and numerous, so address space and locality considerations prevent the use of very large red zones. With default settings, the addresses of two adjacent local int variables x and y end up separated by 16 bytes. In other words, the verifications done by ASan and Valgrind are only for one memory layout, and the memory layout for which the verifications are done is different from the memory layout of the uninstrumented execution.

A minor weakness of ASan and Valgrind is that they can miss undefined behaviors that get optimized away before the instrumentation has a chance to run, as in this example.

Mitigation: We’ve long had partial mitigation mechanisms for memory unsafety, including ASLR, stack canaries, hardened allocators, and NX. More recently, production-grade CFI (control flow integrity) has become available. Another interesting recent development is pointer authentication in ARMv8.3. This paper has a good overview of memory safety mitigations.

A serious drawback of ASan as a mitigation tool is illustrated here:

$ cat asan-defeat.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char a[128];
char b[128];

int main(int argc, char *argv[]) {
  strcpy(a + atoi(argv[1]), "owned.");
  printf("%s\n", b);
  return 0;
}
$ clang-4.0 -O asan-defeat.c
$ ./a.out 128
owned.
$ clang-4.0 -O -fsanitize=address -fno-common asan-defeat.c
$ ./a.out 160
owned.
$ 

In other words, ASan simply forces an attacker to compute a different offset in order to corrupt a target memory region. (Thanks to Yury Gribov for pointing out that we should be using the -fno-common flag to ASan.)

To mitigate this kind of undefined behavior, real bounds checking must be performed, as opposed to only verifying that each memory access lands in some valid region. Memory safety is the gold standard here. Although there is much academic work on memory safety, some showing apparently reasonable overheads and good compatibility with existing software, it has not yet seen widespread adoption. Checked C is a very cool project to keep an eye on in this space.

Summary: Debugging tools for this class of error are very good. Good mitigations are available but this class of bug can only be reliably stopped by full memory/type safety.

Temporal Memory Safety Violations

Background: A “temporal memory safety violation” is any use of a memory location after its lifetime has ended. This includes addresses of automatic variables outliving these variables; use-after-free, where a dangling pointer is accessed for reading or writing; and, double free, which can be just as harmful in practice, since free() modifies metadata that is usually adjacent to the block being freed. If the block has already been freed, these writes can fall on memory used for any other purpose and, in principle, can have as much consequence as any other invalid write.

Debugging: ASan is designed to detect use-after-free bugs, which often lead to hard-to-reproduce, erratic behavior. It does so by placing freed memory blocks in a quarantine, preventing their immediate reuse. For some programs and inputs, this can increase memory consumption and decrease locality. The user can configure the size of the quarantine in order to trade false positives for resource usage.

ASan can also detect addresses of automatic variables surviving the scope of these variables. The idea is to turn automatic variables into heap-allocated blocks, that the compiler automatically allocates when execution enters the block, and frees (while retaining them in a quarantine) when execution leaves the block. This option is turned off by default, because it makes programs even more memory-hungry.

The temporal memory safety violation in the program below causes it to behave differently at the default optimization level and at -O2. ASan can detect a problem in the program below with no optimization, but only if the option detect_stack_use_after_return is set, and only if the program was not compiled with optimization.

$ cat temporal.c
#include <stdio.h>

int *G;

int f(void) {
  int l = 1;
  int res = *G;
  G = &l;
  return res;
}

int main(void) {
  int x = 2;
  G = &x;
  f();
  printf("%d\n", f());
}
$ clang -Wall -fsanitize=address temporal.c
$ ./a.out 
1
$ ASAN_OPTIONS=detect_stack_use_after_return=1 ./a.out 
=================================================================
==5425==ERROR: AddressSanitizer: stack-use-after-return ...
READ of size 4 at 0x0001035b6060 thread T0
^C
$ clang -Wall -fsanitize=address -O2 temporal.c
$ ./a.out 
32767
$ ASAN_OPTIONS=detect_stack_use_after_return=1 ./a.out 
32767
$ clang -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
...

In some other examples, the sanitizer’s failure to detect UB that has been “optimized out” can be argued to be harmless, since the optimized-out UB has no consequence. This is not the case here! The program is meaningless in any case, but the unoptimized program behaves deterministically and works as if the variable x had been declared static, whereas the optimized program, in which ASan does not detect any foul play, does not behave deterministically and reveals an internal state that is not supposed to be seen:

$ clang -Wall -O2 temporal.c
$ ./a.out 
1620344886
$ ./a.out 
1734516790
$ ./a.out 
1777709110

Mitigation: As discussed above, ASan is not intended for hardening, but various hardened allocators are available; they use the same quarantining strategy to render use-after-free bugs unexploitable.

Summary: Use ASan (together with “ASAN_OPTIONS=detect_stack_use_after_return=1” for the test cases that are small enough to allow it). Vary optimization levels in case some compilations catch errors that others don’t.

Integer Overflow

Background: Integers cannot underflow, but they can overflow in both directions. Signed integer overflow is UB; this includes INT_MIN / -1, INT_MIN % -1, negating INT_MIN, shift with negative exponent, left-shifting a one past the sign bit, and (sometimes) left-shifting a one into the sign bit. Division by zero and shift by >= bitwidth are UB in both the signed and unsigned flavors. Read more here.

Debugging: LLVM’s UBSan is very good for debugging integer-related undefined behaviors. Because UBSan works near the source level, it is highly reliable. There are some quirks relating to compile-time math; for example, this program traps as C++11 but not as C11; we believe this follows the standards but haven’t looked into it closely. GCC has its own version of UBSan but it isn’t 100% trustworthy; here it looks like constants are being folded before the instrumentation pass gets to run.

Mitigation: UBSan in trapping mode (on hitting UB, process aborts w/o printing a diagnostic) can be used for mitigation. It is usually reasonably efficient and it doesn’t add attack surface. Parts of Android use UBSan to mitigate integer overflows (including unsigned overflows, which of course are not undefined). Although integer overflows are generic logic errors, in C and C++ they are particularly harmful because they often lead to memory safety violations. In a memory-safe language they tend to do much less damage.

Summary: Integer undefined behaviors are not very difficult to catch; UBSan is the only debugging tool you’re likely to ever need. An issue with mitigating integer UBs is the overhead. For example, they cause SPEC CPU 2006 to run about 30% slower. There is plenty of room for improvement, both in eliminating overflow checks that cannot fire and in making the remaining checks less obstructive to the loop optimizers. Someone with resources should push on this.

Strict Aliasing Violations

Background: The “strict aliasing rules” in the C and C++ standards allow the compiler to assume that if two pointers refer to different types, they cannot point to the same storage. This enables nice optimizations but risks breaking programs that take a flexible view of types (roughly 100% of large C and C++ programs take a flexible view of types somewhere). For a thorough overview see Sections 1-3 of this paper.

Debugging: The state of the art in debugging tools for strict aliasing violations is weak. Compilers warn about some easy cases, but these warnings are extremely fragile. libcrunch warns that a pointer is being converted to a type “pointer to thing” when the pointed object is not, in fact, a “thing.” This allows polymorphism though void pointers, but catches misuses of pointer conversions that are also strict aliasing violations. With respect to the C standard and C compilers’ interpretation of what it allows them to optimize in their type-based alias analyses, however, libcrunch is neither sound (it does not detect some violations that happen during the instrumented execution) nor complete (it warns about pointer conversions that smell bad but do not violate the standard).

Mitigation: This is easy: pass the compiler a flag (-fno-strict-aliasing) that disables optimizations based on strict aliasing. The result is a C/C++ compiler that has an old-school memory model where more or less arbitrary casts between pointer types can be performed, with the resulting code behaving as expected. Of the big three compilers, it is only LLVM and GCC that are affected, MSVC doesn’t implement this class of optimization in the first place.

Summary: Correctness-sensitive code bases need significant auditing: it is always suspicious and dangerous to cast a pointer to any type other than a char *. Alternatively, just turn off strict-aliasing-based optimizations using a flag and make sure that nobody ever builds the code without using this flag.

Alignment Violations

Background: RISC-style processors have tended to disallow memory accesses where the address is not a multiple of the size of the object being accessed. On the other hand, C and C++ programs that use unaligned pointers are undefined regardless of the target architecture. Historically we have been complacent about this, first because x86/x64 support unaligned accesses and second because compilers have so far not done much to exploit this UB.

Even so, here is an excellent blog post explaining how the compiler can break code that does unaligned accesses when targeting x64. The code in the post violates strict aliasing in addition to violating the alignment rules, but the crash (we verified it under GCC 7.1.0 on OS X) occurs even when the -fno-strict-aliasing flag is passed to the compiler.

Debugging: UBSan can detect misaligned memory accesses.

Mitigation: None known.

Summary: Use UBSan.

Loops that Neither Perform I/O nor Terminate

Background: A loop in C or C++ code that neither performs I/O nor terminates is undefined and can be terminated arbitrarily by the compiler. See this post and this note.

Debugging: No tools exist.

Mitigation: None, besides avoiding heavily-optimizing compilers.

Summary: This UB is probably not a problem in practice (even if it is moderately displeasing to some of us).

Data Races

Background: A data race occurs when a piece of memory is accessed by more than one thread, at least one of the accesses is a store, and the accesses are not synchronized using a mechanism such as a lock. Data races are UB in modern flavors of C and C++ (they do not have a semantics in older versions since those standards do not address multithreaded code).

Debugging: TSan is an excellent dynamic data race detector. Other similar tools exist, such as the Helgrind plugin for Valgrind, but we have not used these lately. The use of dynamic race detectors is complicated by the fact that races can be very difficult to trigger, and worse this difficulty depends on variables such as the number of cores, the thread scheduling algorithm, whatever else is going on on the test machine, and on the moon’s phase.

Mitigation: Don’t create threads.

Summary: This particular UB is probably a good idea: it clearly communicates the idea that developers should not count on racy code doing anything in particular, but should rather use atomics (that cannot race by definition) if they don’t enjoy locking.

Unsequenced Modifications

Background: In C, “sequence points” constrain how early or late a side-effecting expression such as x++ can take effect. C++ has a different but more-or-less-equivalent formulation of these rules. In either language, unsequenced modifications of the same value, or an unsequenced modification and use of the same value, results in UB.

Debugging: Some compilers emit warnings for obvious violations of the sequencing rules:

$ cat unsequenced2.c
int a;

int foo(void) {
  return a++ - a++;
}
$ clang -c unsequenced2.c
unsequenced2.c:4:11: warning: multiple unsequenced modifications to 'a' [-Wunsequenced]
  return a++ - a++;
          ^     ~~
1 warning generated.
$ gcc-7 -c unsequenced2.c -Wall
unsequenced2.c: In function 'foo':
unsequenced2.c:4:11: warning: operation on 'a' may be undefined [-Wsequence-point]
   return a++ - a++;
          ~^~

However, a bit of indirection defeats these warnings:

$ cat unsequenced.c
#include <stdio.h>

int main(void) {
  int z = 0, *p = &z;
  *p += z++;
  printf("%d\n", z);
  return 0;
}
$ gcc-4.8 -Wall unsequenced.c ; ./a.out
0
$ gcc-7 -Wall unsequenced.c ; ./a.out
1
$ clang -Wall unsequenced.c ; ./a.out
1

Mitigation: None known, though it would be almost trivial to define the order in which side effects take place. The Java Language Definition provides an example of how to do this. We have a hard time believing that this kind of constraint would meaningfully handicap any modern optimizing compiler. If the standards committees can’t find it within their hearts to make this happen, the compiler implementors should do it anyway. Ideally, all major compilers would make the same choice.

Summary: With a bit of practice, it is not too difficult to spot the potential for unsequenced accesses during code reviews. We should be wary of any overly-complex expression that has many side effects. This leaves us without a good story for legacy code, but hey it has worked until now, so perhaps there’s no problem. But really, this should be fixed in the compilers.

A non-UB relative of unsequenced is “indeterminately sequenced” where operations may happen in an order chosen by the compiler. An example is the order of the first two function calls while evaluating f(a(), b()). This order should be specified too. Left-to-right would work. Again, there will be no performance loss in non-insane circumstances.

TIS Interpreter

We now change gears and take a look at the approach taken by TIS Interpreter, a debugging tool that looks for undefined behavior in C programs as it executes them line by line. TIS Interpreter runs programs much more slowly than the LLVM-based sanitizers, and even much more slowly than Valgrind. However, TIS Interpreter can usefully be compared to these sanitizers: it works from the source code, leaves the problem of coverage to test suites and fuzzing tools, and identifies problems along the execution paths that it has been provided inputs for.

A fundamental difference between TIS Interpreter and any single sanitizer is that TIS Interpreter’s goal is, along the execution paths it explores, to be exhaustive: to find all the problems that ASan, MSan, and UBSan are designed to find some of (give or take a couple of minor exceptions that we would be delighted to discuss at great length if provoked). For example, TIS Interpreter identifies unsequenced changes to overlapping memory zones within an expression, such as (*p)++ + (*q)++ when the pointers p and q alias. The problem of the unspecified order of function calls in a same expression, that TIS Interpreter orders without warning when a different order could produce a different result, is a known limitation that will eventually be fixed.

TIS Interpreter’s approach to detecting memory safety errors differs sharply from ASan’s and Valgrind’s in that it doesn’t find errors for a specific heap layout, but rather treats as an error any construct that could lead the execution to behave differently depending on memory layout choices. In other words, TIS Interpreter has a symbolic view of addresses, as opposed to the concrete view taken by Valgrind and ASan. This design choice eliminates the “only the instrumented version of the program is safe, and the instrumented version behaves differently from the deployed version” problem. The occasional C program is written to behave differently depending on the memory layout (for instance if addresses are fed to hash functions or used to provide a total ordering between allocated values). TIS Analyzer warns that these programs are doing this (which is always good to know); sometimes, tweaks make it possible to analyze them in TIS Interpreter anyway, but the resulting guarantees will be weaker.

It is sometimes useful, for debugging purposes, to see the first UB that occurs in an execution. Consider a loop in which MSan warns that uninitialized memory is being used, and in which ASan warns about an out-of-bounds read. Is the out-of-bounds read caused by the incorporation of uninitialized memory in the computation of the index, or is the use of uninitialized memory caused by the index being computed wrongly? One cannot use both ASan and MSan at the same time, so this is a mystery that developers need to solve for themselves. The value of looking for all undefined behaviors at the same time is in this case the confidence that the first undefined behavior seen is not a symptom of a previous undefined behavior. Another advantage is finding undefined behavior that one was not looking for.

Detection of strict aliasing violations in TIS Interpreter is being worked on, following as much as possible the C standard and the interpretation of C compiler designers (which can be observed in each compiler’s translation of well-chosen examples).

But What About the Rest of the Undefined Behaviors?

Let’s take a quick look at the contents of Appendix J.2: a non-normative, non-exhaustive list of undefined behaviors in C. Keep in mind that no equivalent list has ever been created for C++, as far as we know.

First, we’ll list the UBs that we’ve discussed explicitly in this post:

  • The execution of a program contains a data race (5.1.2.4).
  • An object is referred to outside of its lifetime (6.2.4).
  • The value of a pointer to an object whose lifetime has ended is used (6.2.4).
  • The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).
  • Conversion to or from an integer type produces a value outside the range that can be represented (6.3.1.4).
  • An lvalue does not designate an object when evaluated (6.3.2.1).
  • Conversion between two pointer types produces a result that is incorrectly aligned (6.3.2.3).
  • A side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object (6.5).
  • An exceptional condition occurs during the evaluation of an expression (6.5).
  • An object has its stored value accessed other than by an lvalue of an allowable type (6.5).
  • The operand of the unary * operator has an invalid value (6.5.3.2).
  • The value of the second operand of the / or % operator is zero (6.5.5).
  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object (6.5.6).
  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
  • Pointers that do not point into, or just beyond, the same array object are subtracted (6.5.6).
  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
  • The result of subtracting two pointers is not representable in an object of type ptrdiff_t (6.5.6).
  • An expression is shifted by a negative number or by an amount greater than or equal to the width of the promoted expression (6.5.7).
  • An expression having signed promoted type is left-shifted and either the value of the expression is negative or the result of shifting would be not be representable in the promoted type (6.5.7).
  • Pointers that do not point to the same aggregate or union (nor just beyond the same array object) are compared using relational operators (6.5.8).
  • An object is assigned to an inexactly overlapping object or to an exactly overlapping object with incompatible type (6.5.16.1).

And second, those that we have not addressed:

  • A ‘‘shall” or ‘‘shall not” requirement that appears outside of a constraint is violated (clause 4).
  • A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2).
  • Token concatenation produces a character sequence matching the syntax of a universal character name (5.1.1.2).
  • A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).
  • A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token (5.2.1).
  • An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state (5.2.1.2).
  • The same identifier has both internal and external linkage in the same translation unit (6.2.2).
  • A trap representation is read by an lvalue expression that does not have character type (6.2.6.1).
  • A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type (6.2.6.1).
  • The operands to certain operators are such that they could produce a negative zero result, but the implementation does not support negative zeros (6.2.6.2).
  • Two declarations of the same object or function specify types that are not compatible (6.2.7).
  • A program requires the formation of a composite type from a variable length array type whose size is specified by an expression that is not evaluated (6.2.7).
  • Demotion of one real floating type to another produces a value outside the range that can be represented (6.3.1.5).
  • A non-array lvalue with an incomplete type is used in a context that requires the value of the designated object (6.3.2.1).
  • An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized. (6.3.2.1).
  • An lvalue having array type is converted to a pointer to the initial element of the array, and the array object has register storage class (6.3.2.1).
  • An attempt is made to use the value of a void expression, or an implicit or explicit conversion (except to void) is applied to a void expression (6.3.2.2).
  • Conversion of a pointer to an integer type produces a value outside the range that can be represented (6.3.2.3).
  • A pointer is used to call a function whose type is not compatible with the referenced type (6.3.2.3).
  • An unmatched ‘ or ” character is encountered on a logical source line during tokenization (6.4).
  • A reserved keyword token is used in translation phase 7 or 8 for some purpose other than as a keyword (6.4.1).
  • A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges (6.4.2.1).
  • The initial character of an identifier is a universal character name designating a digit (6.4.2.1).
  • Two identifiers differ only in nonsignificant characters (6.4.2.1).
  • The identifier __func__ is explicitly declared (6.4.2.2).
  • The program attempts to modify a string literal (6.4.5).
  • The characters ‘, \, “, //, or /* occur in the sequence between the < and > delimiters, or the characters ‘, \, //, or /* occur in the sequence between the ” delimiters, in a header name preprocessing token (6.4.7).
  • For a call to a function without a function prototype in scope, the number of ∗ arguments does not equal the number of parameters (6.5.2.2).
  • For call to a function without a function prototype in scope where the function is defined with a function prototype, either the prototype ends with an ellipsis or the types of the arguments after promotion are not compatible with the types of the parameters (6.5.2.2).
  • For a call to a function without a function prototype in scope where the function is not defined with a function prototype, the types of the arguments after promotion are not compatible with those of the parameters after promotion (with certain exceptions) (6.5.2.2).
  • A function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function (6.5.2.2).
  • A member of an atomic structure or union is accessed (6.5.2.3).
  • A pointer is converted to other than an integer or pointer type (6.5.4).
  • An expression that is required to be an integer constant expression does not have an integer type; has operands that are not integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, or immediately-cast floating constants; or contains casts (outside operands to sizeof operators) other than conversions of arithmetic types to integer types (6.6).
  • A constant expression in an initializer is not, or does not evaluate to, one of the following: an arithmetic constant expression, a null pointer constant, an address constant, or an address constant for a complete object type plus or minus an integer constant expression (6.6).
  • An arithmetic constant expression does not have arithmetic type; has operands that are not integer constants, floating constants, enumeration constants, character constants, or sizeof expressions; or contains casts (outside operands to size operators) other than conversions of arithmetic types to arithmetic types (6.6).
  • The value of an object is accessed by an array-subscript [], member-access . or −>, address &, or indirection * operator or a pointer cast in creating an address constant (6.6).
  • An identifier for an object is declared with no linkage and the type of the object is incomplete after its declarator, or after its init-declarator if it has an initializer (6.7).
  • A function is declared at block scope with an explicit storage-class specifier other than extern (6.7.1).
  • A structure or union is defined as containing no named members, no anonymous structures, and no anonymous unions (6.7.2.1).
  • An attempt is made to access, or generate a pointer to just past, a flexible array member of a structure when the referenced object provides no elements for that array (6.7.2.1).
  • When the complete type is needed, an incomplete structure or union type is not completed in the same scope by another declaration of the tag that defines the content (6.7.2.3).
  • An attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type (6.7.3).
  • An attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type (6.7.3).
  • The specification of a function type includes any type qualifiers (6.7.3).
  • Two qualified types that are required to be compatible do not have the identically qualified version of a compatible type (6.7.3).
  • An object which has been modified is accessed through a restrict-qualified pointer to a const-qualified type, or through a restrict-qualified pointer and another pointer that are not both based on the same object (6.7.3.1).
  • A restrict-qualified pointer is assigned a value based on another restricted pointer whose associated block neither began execution before the block associated with this pointer, nor ended before the assignment (6.7.3.1).
  • A function with external linkage is declared with an inline function specifier, but is not also defined in the same translation unit (6.7.4).
  • A function declared with a _Noreturn function specifier returns to its caller (6.7.4).
  • The definition of an object has an alignment specifier and another declaration of that object has a different alignment specifier (6.7.5).
  • Declarations of an object in different translation units have different alignment specifiers (6.7.5).
  • Two pointer types that are required to be compatible are not identically qualified, or are not pointers to compatible types (6.7.6.1).
  • The size expression in an array declaration is not a constant expression and evaluates at program execution time to a nonpositive value (6.7.6.2).
  • In a context requiring two array types to be compatible, they do not have compatible element types, or their size specifiers evaluate to unequal values (6.7.6.2).
  • A declaration of an array parameter includes the keyword static within the [ and ] and the corresponding argument does not provide access to the first element of an array with at least the specified number of elements (6.7.6.3).
  • A storage-class specifier or type qualifier modifies the keyword void as a function parameter type list (6.7.6.3).
  • In a context requiring two function types to be compatible, they do not have compatible return types, or their parameters disagree in use of the ellipsis terminator or the number and type of parameters (after default argument promotion, when there is no parameter type list or when one type is specified by a function definition with an identifier list) (6.7.6.3).
  • The value of an unnamed member of a structure or union is used (6.7.9).
  • The initializer for a scalar is neither a single expression nor a single expression enclosed in braces (6.7.9).
  • The initializer for a structure or union object that has automatic storage duration is neither an initializer list nor a single expression that has compatible structure or union type (6.7.9).
  • The initializer for an aggregate or union, other than an array initialized by a string literal, is not a brace-enclosed list of initializers for its elements or members (6.7.9).
  • An identifier with external linkage is used, but in the program there does not exist exactly one external definition for the identifier, or the identifier is not used and there exist multiple external definitions for the identifier (6.9).
  • A function definition includes an identifier list, but the types of the parameters are not declared in a following declaration list (6.9.1).
  • An adjusted parameter type in a function definition is not a complete object type (6.9.1).
  • A function that accepts a variable number of arguments is defined without a parameter type list that ends with the ellipsis notation (6.9.1).
  • The } that terminates a function is reached, and the value of the function call is used by the caller (6.9.1).
  • An identifier for an object with internal linkage and an incomplete type is declared with a tentative definition (6.9.2).
  • The token defined is generated during the expansion of a #if or #elif preprocessing directive, or the use of the defined unary operator does not match one of the two specified forms prior to macro replacement (6.10.1).
  • The #include preprocessing directive that results after expansion does not match one of the two header name forms (6.10.2).
  • The character sequence in an #include preprocessing directive does not start with a letter (6.10.2).
  • There are sequences of preprocessing tokens within the list of macro arguments that would otherwise act as preprocessing directives (6.10.3).
  • The result of the preprocessing operator # is not a valid character string literal (6.10.3.2).
  • The result of the preprocessing operator ## is not a valid preprocessing token (6.10.3.3).
  • The #line preprocessing directive that results after expansion does not match one of the two well-defined forms, or its digit sequence specifies zero or a number greater than 2147483647 (6.10.4).
  • A non-STDC #pragma preprocessing directive that is documented as causing translation failure or some other form of undefined behavior is encountered (6.10.6).
  • A #pragma STDC preprocessing directive does not match one of the well-defined forms (6.10.6).
  • The name of a predefined macro, or the identifier defined, is the subject of a #define or #undef preprocessing directive (6.10.8).
  • An attempt is made to copy an object to an overlapping object by use of a library function, other than as explicitly allowed (e.g., memmove) (clause 7).
  • A file with the same name as one of the standard headers, not provided as part of the implementation, is placed in any of the standard places that are searched for included source files (7.1.2).
  • A header is included within an external declaration or definition (7.1.2).
  • A function, object, type, or macro that is specified as being declared or defined by some standard header is used before any header that declares or defines it is included (7.1.2).
  • A standard header is included while a macro is defined with the same name as a keyword (7.1.2).
  • The program attempts to declare a library function itself, rather than via a standard header, but the declaration does not have external linkage (7.1.2).
  • The program declares or defines a reserved identifier, other than as allowed by 7.1.4 (7.1.3).
  • The program removes the definition of a macro whose name begins with an underscore and either an uppercase letter or another underscore (7.1.3).
  • An argument to a library function has an invalid value or a type not expected by a function with variable number of arguments (7.1.4).
  • The pointer passed to a library function array parameter does not have a value such that all address computations and object accesses are valid (7.1.4).
  • The macro definition of assert is suppressed in order to access an actual function (7.2).
  • The argument to the assert macro does not have a scalar type (7.2).
  • The CX_LIMITED_RANGE, FENV_ACCESS, or FP_CONTRACT pragma is used in any context other than outside all external declarations or preceding all explicit declarations and statements inside a compound statement (7.3.4, 7.6.1, 7.12.2).
  • The value of an argument to a character handling function is neither equal to the value of EOF nor representable as an unsigned char (7.4).
  • A macro definition of errno is suppressed in order to access an actual object, or the program defines an identifier with the name errno (7.5).
  • Part of the program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off” (7.6.1).
  • The exception-mask argument for one of the functions that provide access to the floating-point status flags has a nonzero value not obtained by bitwise OR of the floating-point exception macros (7.6.2).
  • The fesetexceptflag function is used to set floating-point status flags that were not specified in the call to the fegetexceptflag function that provided the value of the corresponding fexcept_t object (7.6.2.4).
  • The argument to fesetenv or feupdateenv is neither an object set by a call to fegetenv or feholdexcept, nor is it an environment macro (7.6.4.3, 7.6.4.4).
  • The value of the result of an integer arithmetic or conversion function cannot be represented (7.8.2.1, 7.8.2.2, 7.8.2.3, 7.8.2.4, 7.22.6.1, 7.22.6.2, 7.22.1).
  • The program modifies the string pointed to by the value returned by the setlocale function (7.11.1.1).
  • The program modifies the structure pointed to by the value returned by the localeconv function (7.11.2.1).
  • A macro definition of math_errhandling is suppressed or the program defines an identifier with the name math_errhandling (7.12).
  • An argument to a floating-point classification or comparison macro is not of real floating type (7.12.3, 7.12.14).
  • A macro definition of setjmp is suppressed in order to access an actual function, or the program defines an external identifier with the name setjmp (7.13).
  • An invocation of the setjmp macro occurs other than in an allowed context (7.13.2.1).
  • The longjmp function is invoked to restore a nonexistent environment (7.13.2.1).
  • After a longjmp, there is an attempt to access the value of an object of automatic storage duration that does not have volatile-qualified type, local to the function containing the invocation of the corresponding setjmp macro, that was changed between the setjmp invocation and longjmp call (7.13.2.1).
  • The program specifies an invalid pointer to a signal handler function (7.14.1.1).
  • A signal handler returns when the signal corresponded to a computational exception (7.14.1.1).
  • A signal occurs as the result of calling the abort or raise function, and the signal handler calls the raise function (7.14.1.1).
  • A signal occurs other than as the result of calling the abort or raise function, and the signal handler refers to an object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t, or calls any function in the standard library other than the abort function, the _Exit function, the quick_exit function, or the signal function (for the same signal number) (7.14.1.1).
  • The value of errno is referred to after a signal occurred other than as the result of calling the abort or raise function and the corresponding signal handler obtained a SIG_ERR return from a call to the signal function (7.14.1.1).
  • A signal is generated by an asynchronous signal handler (7.14.1.1).
  • A function with a variable number of arguments attempts to access its varying arguments other than through a properly declared and initialized va_list object, or before the va_start macro is invoked (7.16, 7.16.1.1, 7.16.1.4).
  • The macro va_arg is invoked using the parameter ap that was passed to a function that invoked the macro va_arg with the same parameter (7.16).
  • A macro definition of va_start, va_arg, va_copy, or va_end is suppressed in order to access an actual function, or the program defines an external identifier with the name va_copy or va_end (7.16.1).
  • The va_start or va_copy macro is invoked without a corresponding invocation of the va_end macro in the same function, or vice versa (7.16.1, 7.16.1.2, 7.16.1.3, 7.16.1.4).
  • The type parameter to the va_arg macro is not such that a pointer to an object of that type can be obtained simply by postfixing a * (7.16.1.1).
  • The va_arg macro is invoked when there is no actual next argument, or with a specified type that is not compatible with the promoted type of the actual next argument, with certain exceptions (7.16.1.1).
  • The va_copy or va_start macro is called to initialize a va_list that was previously initialized by either macro without an intervening invocation of the va_end macro for the same va_list (7.16.1.2, 7.16.1.4).
  • The parameter parmN of a va_start macro is declared with the register storage class, with a function or array type, or with a type that is not compatible with the type that results after application of the default argument promotions (7.16.1.4).
  • The member designator parameter of an offsetof macro is an invalid right operand of the . operator for the type parameter, or designates a bit-field (7.19).
  • The argument in an instance of one of the integer-constant macros is not a decimal, octal, or hexadecimal constant, or it has a value that exceeds the limits for the corresponding type (7.20.4).
  • A byte input/output function is applied to a wide-oriented stream, or a wide character input/output function is applied to a byte-oriented stream (7.21.2).
  • Use is made of any portion of a file beyond the most recent wide character written to a wide-oriented stream (7.21.2).
  • The value of a pointer to a FILE object is used after the associated file is closed (7.21.3).
  • The stream for the fflush function points to an input stream or to an update stream in which the most recent operation was input (7.21.5.2).
  • The string pointed to by the mode argument in a call to the fopen function does not exactly match one of the specified character sequences (7.21.5.3).
  • An output operation on an update stream is followed by an input operation without an intervening call to the fflush function or a file positioning function, or an input operation on an update stream is followed by an output operation with an intervening call to a file positioning function (7.21.5.3).
  • An attempt is made to use the contents of the array that was supplied in a call to the setvbuf function (7.21.5.6).
  • There are insufficient arguments for the format in a call to one of the formatted input/output functions, or an argument does not have an appropriate type (7.21.6.1, 7.21.6.2, 7.28.2.1, 7.28.2.2).
  • The format in a call to one of the formatted input/output functions or to the strftime or wcsftime function is not a valid multibyte character sequence that begins and ends in its initial shift state (7.21.6.1, 7.21.6.2, 7.26.3.5, 7.28.2.1, 7.28.2.2, 7.28.5.1).
  • In a call to one of the formatted output functions, a precision appears with a conversion specifier other than those described (7.21.6.1, 7.28.2.1).
  • A conversion specification for a formatted output function uses an asterisk to denote an argument-supplied field width or precision, but the corresponding argument is not provided (7.21.6.1, 7.28.2.1).
  • A conversion specification for a formatted output function uses a # or 0 flag with a conversion specifier other than those described (7.21.6.1, 7.28.2.1).
  • A conversion specification for one of the formatted input/output functions uses a length modifier with a conversion specifier other than those described (7.21.6.1, 7.21.6.2, 7.28.2.1, 7.28.2.2).
  • An s conversion specifier is encountered by one of the formatted output functions, and the argument is missing the null terminator (unless a precision is specified that does not require null termination) (7.21.6.1, 7.28.2.1).
  • An n conversion specification for one of the formatted input/output functions includes any flags, an assignment-suppressing character, a field width, or a precision (7.21.6.1, 7.21.6.2, 7.28.2.1, 7.28.2.2).
  • A % conversion specifier is encountered by one of the formatted input/output functions, but the complete conversion specification is not exactly %% (7.21.6.1, 7.21.6.2, 7.28.2.1, 7.28.2.2).
  • An inv alid conversion specification is found in the format for one of the formatted input/output functions, or the strftime or wcsftime function (7.21.6.1, 7.21.6.2, 7.26.3.5, 7.28.2.1, 7.28.2.2, 7.28.5.1).
  • The number of characters transmitted by a formatted output function is greater than INT_MAX (7.21.6.1, 7.21.6.3, 7.21.6.8, 7.21.6.10).
  • The result of a conversion by one of the formatted input functions cannot be represented in the corresponding object, or the receiving object does not have an appropriate type (7.21.6.2, 7.28.2.2).
  • A c, s, or [ conversion specifier is encountered by one of the formatted input functions, and the array pointed to by the corresponding argument is not large enough to accept the input sequence (and a null terminator if the conversion specifier is s or [) (7.21.6.2, 7.28.2.2).
  • A c, s, or [ conversion specifier with an l qualifier is encountered by one of the formatted input functions, but the input is not a valid multibyte character sequence that begins in the initial shift state (7.21.6.2, 7.28.2.2).
  • The input item for a %p conversion by one of the formatted input functions is not a value converted earlier during the same program execution (7.21.6.2, 7.28.2.2).
  • The vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, vsscanf, vfwprintf, vfwscanf, vswprintf, vswscanf, vwprintf, or vwscanf function is called with an improperly initialized va_list argument, or the argument is used (other than in an invocation of va_end) after the function returns (7.21.6.8, 7.21.6.9, 7.21.6.10, 7.21.6.11, 7.21.6.12, 7.21.6.13, 7.21.6.14, 7.28.2.5, 7.28.2.6, 7.28.2.7, 7.28.2.8, 7.28.2.9, 7.28.2.10).
  • The contents of the array supplied in a call to the fgets or fgetws function are used after a read error occurred (7.21.7.2, 7.28.3.2).
  • The file position indicator for a binary stream is used after a call to the ungetc function where its value was zero before the call (7.21.7.10).
  • The file position indicator for a stream is used after an error occurred during a call to the fread or fwrite function (7.21.8.1, 7.21.8.2).
  • A partial element read by a call to the fread function is used (7.21.8.1).
  • The fseek function is called for a text stream with a nonzero offset and either the offset was not returned by a previous successful call to the ftell function on a stream associated with the same file or whence is not SEEK_SET (7.21.9.2).
  • The fsetpos function is called to set a position that was not returned by a previous successful call to the fgetpos function on a stream associated with the same file (7.21.9.3).
  • A non-null pointer returned by a call to the calloc, malloc, or realloc function with a zero requested size is used to access an object (7.22.3).
  • The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3).
  • The alignment requested of the aligned_alloc function is not valid or not supported by the implementation, or the size requested is not an integral multiple of the alignment (7.22.3.1).
  • The pointer argument to the free or realloc function does not match a pointer earlier returned by a memory management function, or the space has been deallocated by a call to free or realloc (7.22.3.3, 7.22.3.5).
  • The value of the object allocated by the malloc function is used (7.22.3.4).
  • The value of any bytes in a new object allocated by the realloc function beyond the size of the old object are used (7.22.3.5).
  • The program calls the exit or quick_exit function more than once, or calls both functions (7.22.4.4, 7.22.4.7).
  • During the call to a function registered with the atexit or at_quick_exit function, a call is made to the longjmp function that would terminate the call to the registered function (7.22.4.4, 7.22.4.7).
  • The string set up by the getenv or strerror function is modified by the program (7.22.4.6, 7.23.6.2).
  • A command is executed through the system function in a way that is documented as causing termination or some other form of undefined behavior (7.22.4.8).
  • A searching or sorting utility function is called with an invalid pointer argument, even if the number of elements is zero (7.22.5).
  • The comparison function called by a searching or sorting utility function alters the contents of the array being searched or sorted, or returns ordering values inconsistently (7.22.5).
  • The array being searched by the bsearch function does not have its elements in proper order (7.22.5.1).
  • The current conversion state is used by a multibyte/wide character conversion function after changing the LC_CTYPE category (7.22.7).
  • A string or wide string utility function is instructed to access an array beyond the end of an object (7.23.1, 7.28.4).
  • A string or wide string utility function is called with an invalid pointer argument, even if the length is zero (7.23.1, 7.28.4).
  • The contents of the destination array are used after a call to the strxfrm, strftime, wcsxfrm, or wcsftime function in which the specified length was too small to hold the entire null-terminated result (7.23.4.5, 7.26.3.5, 7.28.4.4.4, 7.28.5.1).
  • The first argument in the very first call to the strtok or wcstok is a null pointer (7.23.5.8, 7.28.4.5.7).
  • The type of an argument to a type-generic macro is not compatible with the type of the corresponding parameter of the selected function (7.24).
  • A complex argument is supplied for a generic parameter of a type-generic macro that has no corresponding complex function (7.24).
  • At least one field of the broken-down time passed to asctime contains a value outside its normal range, or the calculated year exceeds four digits or is less than the year 1000 (7.26.3.1).
  • The argument corresponding to an s specifier without an l qualifier in a call to the fwprintf function does not point to a valid multibyte character sequence that begins in the initial shift state (7.28.2.11).
  • In a call to the wcstok function, the object pointed to by ptr does not have the value stored by the previous call for the same wide string (7.28.4.5.7).
  • An mbstate_t object is used inappropriately (7.28.6).
  • The value of an argument of type wint_t to a wide character classification or case mapping function is neither equal to the value of WEOF nor representable as a wchar_t (7.29.1).
  • The iswctype function is called using a different LC_CTYPE category from the one in effect for the call to the wctype function that returned the description (7.29.2.2.1).
  • The towctrans function is called using a different LC_CTYPE category from the one in effect for the call to the wctrans function that returned the description (7.29.3.2.1).

Most of these items are already detected, could be detected easily, or would be detected as a side effect of solving UBs that we discussed in detail. In other words, a few basic technologies, such as shadow memory and run-time type information, provide the infrastructure needed to detect a large fraction of the hard-to-detect UBs. Alas it is difficult to make shadow memory and run-time type information fast.

Summary

What is the modern C or C++ developer to do?

  • Be comfortable with the “easy” UB tools — the ones that can usually be enabled just by adjusting a makefile, such as compiler warnings and ASan and UBSan. Use these early and often, and (crucially) act upon their findings.
  • Be familiar with the “hard” UB tools — those such as TIS Interpreter that typically require more effort to run — and use them when appropriate.
  • Invest in broad-based testing (track code coverage, use fuzzers, etc.) in order to get maximum benefit out of dynamic UB detection tools.
  • Perform UB-aware code reviews: build a culture where we collectively diagnose potentially dangerous patches and get them fixed before they land.
  • Be knowledgeable about what’s actually in the C and C++ standards since these are what compiler writers are going by. Avoid repeating tired maxims like “C is a portable assembly language” and “trust the programmer.”

Unfortunately, C and C++ are mostly taught the old way, as if programming in them isn’t like walking in a minefield. Nor have the books about C and C++ caught up with the current reality. These things must change.

Good luck, everyone.

We’d like to thank various people, especially @CopperheadOS on Twitter, for discussing these issues with us.