Draft Paper about Integer Overflow


Result of the infamous Pac-Man integer overflow

Last Spring I had a lucky conversation. I was chatting with Vikram Adve, while visiting the University of Illinois, and we realized that we working on very similar projects — figuring out what to do about integer overflow bugs in C and C++ programs. Additionally, Vikram’s student Will and my student Peng had independently created very similar LLVM-based dynamic checking tools for finding these bugs. As a researcher I find duplicated effort to be bad at several levels. First, it’s a waste of time and grant money. Second, as soon as one of the competing groups wins the race to publish their results, the other group is left with a lot of unpublishable work. However, after talking things through, we agreed to collaborate instead of compete. This was definitely a good outcome since the resulting paper — submitted last week — is almost certainly better than what either of the groups would have produced on its own. The point is to take a closer look at integer overflow than had been taken in previous work. This required looking for integer overflows in a lot of real applications and then studying these overflows. It turns out they come in many varieties, and the distinctions between them are very subtle. The paper contains all the gory details. The IOC (integer overflow checker) tool is here. We hope to convince the LLVM developers that IOC should be part of the default LLVM build.

We would be happy to receive feedback about the draft.


13 responses to “Draft Paper about Integer Overflow”

  1. Hi Ahmed- Yes, sorry for any confusion. The other page is now a permanent redirect to the IOC page.

    The name change was necessitated by the fact that the tool has gone beyond only checking for undefined behaviors. Now it checks for:
    – unsigned wraparound
    – value loss via truncation
    – value loss via sign change

    Of course these are all well-defined, but maybe we want to know about them.

    Actually the patch that is there right now is just the undefined behavior checker but we’ll update with the full patch soon.

  2. Thanks. And I suppose this means that it will be much like SafeInt, but for every integer rather than simply some of them that are involved in “important” operations. That seems valuable from a completeness perspective, but less so from a pragmatic perspective, in terms of false positive rates.

  3. Ahmed, yes — there will be quite a few false positives, but only when users ask for them by enabling the extended set of checks.

    It’s not clear what the real answer here is. I’d be interested in discussing this. It’s a pretty hard problem.

    SafeInt is based on the assumption that silent overflow is OK except for special cases where we want checking to occur. This is pragmatic but doesn’t help in cases like this one:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19351

    I’d like to explore a solution where overflow is not OK by default, but developers can perhaps add annotations stating that specific overflow instances are OK. This is probably unacceptable for large existing code bases, but should be a good idea moving forward, and can be retrofitted in security/safety critical legacy codes.

  4. The bug in operator new[] fits within the SafeInt “charter” of: all memory allocation calculations use SafeInt. operator new[] is low level enough that using SafeInt may not be feasible, but your SafeInt usage methodology would definitely require validating that it works and the same with calloc.

    But, I agree with the long term goal of avoiding (accidental) integer overflows entirely. If the programmer isn’t explicitly looking for overflow behavior, then an overflow is probably a logic bug, if not something greater.

  5. C# has the approach you mention today, along with some niceties, like upcasting to 64-bit when dealing with mixed signed-unsigned 32-bit numbers. You can just set a compiler switch to make all int overflows an exception, and then on a per-line basis turn them back off if it is a false positive.

    I think trying to do that with a legacy code base would result in quite a lot of regressions, and probably isn’t practical.

    Speaking of new[], the Microsoft compiler has for the last several versions caused an int overflow internal to new[] to fail. I’d encourage other compiler vendors to adopt the same practice.

  6. Thanks David, I’m not a C# programmer and hadn’t known about this.

    It’s insane that any compiler would permit silent overflow in new[]. Probably not insane from a language lawyer point of view, but insane in a practical sense.

  7. This looks useful. However, I’d like a way to disable certain checks. Specifically, shifts of negative numbers is well-defined with most compilers and lots of code intentionally relies on it working. On such code, the checker as available today produces far too many false positives to be of much use.

  8. Some grammar nitpicks from a first reading:

    pg 1, last paragraph: “because latent until a compiler upgrade…” should probably be “because they are latent” or similar

    pg 6, last paragraph of left column: add comma in “and many_,_ like gcc, …”

    pg 10, last full paragraph: the sentence beginning with “The tools vary from static analysis, …” doesn’t parse; perhaps replace comma with “and”?

  9. Awesome stuff! Can’t wait to try it.

    Do you have patches against llvm/clang svn trunk instead of against 2.9?