A Quick Coding Contest: Convert String to Integer Without Overflow

UPDATE: As of Saturday March 9, the contest is closed. Results will be posted in a few days. I’ve created a Github repository containing all submissions and my test harness.

Regular readers will know that I’m obsessed with integer overflows. One apparently simple piece of code that many programmers get wrong in this respect is converting strings to integers. It is slightly tricky. The goal of this contest is to submit the nicest (judged by me) implementation in C for this specification:

/*
 * if input matches ^-?[0-9]+\0$ and the resulting integer is
 * representable as a long, return the integer; otherwise if
 * the input is a null-terminated string, set error to 1 and 
 * return any value; otherwise behavior is undefined
 */
extern int error;
long str2long (const char *);

I deliberately didn’t name the function atol or strtol since its semantics are slightly nonstandard. The regular expression above probably isn’t standard either; it is simply trying to specify a null-terminated string containing only a signed decimal integer of arbitrary size. No other characters (even whitespace) are permitted.

A few extra things:

Don’t call library functions; your code should perform the conversion itself
It is fine to use constants from limits.h
You should write conforming C code that works on x86 and x86-64. I’ll probably only test entries using GCC and Clang but even so you are not allowed to use extensions such as GCC’s 128-bit integers
You may assume that implementation-defined behaviors are the standard ones for the x86 and x86-64 platforms on Linux machines
Your code must not execute undefined behaviors (use a recent Clang to check for this)

Please mail entries to me, not only to avoid spoiling the fun but also because WordPress loves to mangle C code left in the comments. Feel free to leave comments that don’t include code, of course. I’ll announce the winner in a few days, and will send the winner a small prize. By submitting an entry into this contest you are agreeing to let me use your entry in a subsequent blog post, either as a positive or negative example.

NOTE: The best Clang flags to use are: -fsanitize=integer -fno-sanitize=unsigned-integer-overflow

UPDATE: As of Saturday March 9, the contest is closed. Results will be posted in a few days. I’ve created a Github repository containing all submissions and my test harness.

March 5, 2013

regehr

Compilers, Computer Science, Software Correctness

58 responses to “A Quick Coding Contest: Convert String to Integer Without Overflow”

David Eisenstat says:

March 9, 2013 at 3:24 pm

My name links to my would-be entry, which was several hours too late. Probably the part of most interest is the line

return n != 0 ? -1 – (long)(n – 1) : -(long)n;

which computes -n for n in [0, -INT_MIN] with wrapping semantics and compiles under mild optimization with recent versions of GCC and Clang to a single instruction.
David Eisenstat says:

March 9, 2013 at 3:34 pm

Note that n is an unsigned long and that Safari, in its infinite wisdom, has converted two of my minus signs to en dashes.
Robert David Graham says:

March 11, 2013 at 12:57 pm

Ryan,

While I’m quite happy with my submission and don’t want to change it, your performance tests skewed toward how the submissions deal with invalid input rather than valid input.

To showcase this problem (NOT A NEW SUBMISSION) I’ve created a version of my submission that adds two checks. The first check is to early exit when an error occurs. The second skips leadings zeroes — a completely pointless check that hurts performance on normal input, and is only needed when you plan on sending a lot of input with leading zeroes.

I’ve posted this file as “robert_3.c” at:
https://gist.github.com/robertdavidgraham/5136591
Again, this is NOT a new submission. I’m quite happy with my submission, and don’t want to change it. I don’t like redundant error checking, nor do I like the “implicit goto” caused by the return statement (which is why I removed it from my first submission robert.c).

On my machine, the fastest submission is “yang_2.c”, which includes this error checking. My “robert_2.c” is twice as slow, but “robert_3.c” is only 10% slower. I suspect that if you made the same changes to the other submissions, the majority of them would cluster much closer in your tests.
Magnus says:

March 11, 2013 at 8:22 pm

Had a lot of fun doing this — unfortunately the latest version of my entry that was committed to github still had the conditional compilation set up to use MSVC’s #define of LLONG_MAX (which I had to use because longs are only 32 bits even with the 64 bit compiler) which ends in “i64” instead of “LL”, which causes the infinite loop.

The correct version works fine as far as I know with only a single character difference. I sent that to John and he even replied before the 9th so I can prove it!
Meddler says:

March 13, 2013 at 4:16 am

Many versions of itoa() have an overflow bug that makes them print non-digits when given INT_MIN. You don’t wanna see “if (n<0) { n = -n; …}"
berdario says:

March 13, 2013 at 7:08 am

Uh,

https://github.com/regehr/str2long_contest/blob/master/str2long.h

so, I could’ve used strlen after all…

“Donâ€™t call library functions; your code should perform the conversion itself
It is fine to use constants from limits.h”

led me to think that anything outside limits.h was to be avoided… shuck, could’ve saved a few lines probably
Thomas Fach-Pedersen says:

March 14, 2013 at 5:06 am

berdario: No, it’s still against the rules. It’s not like you needed to know the length of the input anyway 😉
berdario says:

March 14, 2013 at 1:25 pm

Uh… well, there’re some submissions that make use of such library functions 🙂