Embedded in Academia

Integer Overflow Paper

My coauthors and I just finished the final version of our paper about integer overflows in C/C++ programs, which will appear at ICSE 2012, a software engineering conference. Basically, we built a tool (IOC) for dynamically finding integer overflows and related integer undefined behaviors, and used it to look at a lot of software. As you might expect, lots of overflows occur.

Our analysis is based on dividing overflows into four kinds:

Intentional, well-defined overflows, such as letting an unsigned integer wrap around in a PRNG. These are not a problem.
Unintentional, well-defined overflows, such as an unsigned multiplication wrapping around when this was not expected to happen. These are logic errors.
Intentional, undefined overflows, such as computing INT_MAX using (1<<31)-1 when int is 32 bits wide. These are often “time bombs”: behaviors that may work today but are waiting to be broken by improvements in compiler optimization.
Unintentional, undefined overflows, such as letting a signed multiplication overflow when this was not expected to happen. These are logic errors.
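To make the four kinds concrete, here is a small C sketch. The function names and constants are made up for illustration and do not come from the paper. Only the two well-defined cases are executed; the undefined ones appear in comments, since actually running them would make the example itself a time bomb.

```c
#include <limits.h>
#include <stdint.h>

/* Kind 1: intentional, well-defined. Unsigned arithmetic wraps modulo 2^32,
   and this linear congruential PRNG step relies on that on purpose.
   (The constants are a commonly used pair, chosen only for illustration.) */
static uint32_t lcg_next(uint32_t state) {
    return (uint32_t)(state * 1664525u + 1013904223u);
}

/* Kind 2: unintentional, well-defined. The author expects "count * size"
   to be the real product, but it silently wraps for large inputs.
   Hypothetical helper; a real one would reject oversized requests. */
static uint32_t buffer_bytes(uint32_t count, uint32_t size) {
    return (uint32_t)(count * size);   /* wraps to 0 for 0x40000000 * 4 */
}

/* Kind 3: intentional, undefined (with a 32-bit int):
       int max = (1 << 31) - 1;
   The well-defined way to get the same value is the macro from <limits.h>. */
static int int_max(void) {
    return INT_MAX;
}

/* Kind 4: unintentional, undefined:
       int area = width * height;     // overflows for large inputs
   Widening before multiplying keeps the computation defined as long as
   long long is wider than int, which is an assumption about the platform. */
static long long area(int width, int height) {
    return (long long)width * (long long)height;
}
```

For example, buffer_bytes(0x40000000u, 4u) returns 0 rather than 4 GiB, which is exactly the sort of well-defined but unexpected wraparound in the second category.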

The conclusion is that people should at least test for undefined behaviors using a tool like IOC, and probably should also check for well-defined but unexpected overflows.
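When a tool is not available, the same kind of check can be written by hand. The following is a minimal sketch of checked signed arithmetic using only comparisons that cannot themselves overflow. The helper names checked_add and checked_mul are hypothetical, and this is not how IOC is implemented; it only shows the sort of condition a dynamic checker has to test.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum fits in an int;
   otherwise returns false and leaves *out untouched. */
static bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Stores a * b in *out and returns true if the product fits in an int;
   otherwise returns false and leaves *out untouched. Every division
   below has a nonzero divisor and never evaluates INT_MIN / -1. */
static bool checked_mul(int a, int b, int *out) {
    if (a == 0 || b == 0) {
        *out = 0;
        return true;
    }
    if (a > 0) {
        if (b > 0) {
            if (a > INT_MAX / b) return false;   /* positive overflow */
        } else {
            if (b < INT_MIN / a) return false;   /* negative overflow */
        }
    } else {
        if (b > 0) {
            if (a < INT_MIN / b) return false;   /* negative overflow */
        } else {
            if (b < INT_MAX / a) return false;   /* positive overflow */
        }
    }
    *out = a * b;
    return true;
}
```

A caller would write something like "if (!checked_mul(w, h, &n)) { /* report the error */ }" instead of multiplying directly. Some compilers also provide built-in overflow-checking operations that do this more efficiently, but the portable version above makes the conditions explicit.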

March 28, 2012

regehr

Computer Science, Software Correctness

2 responses to “Integer Overflow Paper”

binary tools | Pearltrees says:

March 29, 2012 at 11:18 pm

[…] Embedded in Academia : Integer Overflow Paper Intentional, undefined overflows, such as computing INT_MAX using (1<<31)-1. […] Unintentional, undefined overflows, such as letting a signed multiplication overflow when this was not expected to happen. These are logic errors. Unintentional, well-defined overflows, such as an unsigned multiplication wrapping around when this was not expected to happen. These are logic errors. […]

solrize says:

April 4, 2012 at 3:55 am

CPU vendors should add hardware traps for integer overflow. They already have them for IEEE floating point overflow after all.

Standard ML is the only language I know of that gets this right. I brought up the issue with some Haskell hackers a while back but none of them seemed to care.