A Quick Coding Contest: Convert String to Integer Without Overflow


UPDATE: As of Saturday March 9, the contest is closed. Results will be posted in a few days. I’ve created a Github repository containing all submissions and my test harness.

Regular readers will know that I’m obsessed with integer overflows. One apparently simple piece of code that many programmers get wrong in this respect is converting strings to integers. It is slightly tricky. The goal of this contest is to submit the nicest (judged by me) implementation in C for this specification:

/*
 * if input matches ^-?[0-9]+\0$ and the resulting integer is
 * representable as a long, return the integer; otherwise if
 * the input is a null-terminated string, set error to 1 and 
 * return any value; otherwise behavior is undefined
 */
extern int error;
long str2long (const char *);

I deliberately didn’t name the function atol or strtol since its semantics are slightly nonstandard. The regular expression above probably isn’t standard either; it is simply trying to specify a null-terminated string containing only a signed decimal integer of arbitrary size. No other characters (even whitespace) are permitted.

A few extra things:

  • Don’t call library functions; your code should perform the conversion itself
  • It is fine to use constants from limits.h
  • You should write conforming C code that works on x86 and x86-64. I’ll probably only test entries using GCC and Clang but even so you are not allowed to use extensions such as GCC’s 128-bit integers
  • You may assume that implementation-defined behaviors are the standard ones for the x86 and x86-64 platforms on Linux machines
  • Your code must not execute undefined behaviors (use a recent Clang to check for this)

Please mail entries to me, not only to avoid spoiling the fun but also because WordPress loves to mangle C code left in the comments. Feel free to leave comments that don’t include code, of course. I’ll announce the winner in a few days, and will send the winner a small prize. By submitting an entry into this contest you are agreeing to let me use your entry in a subsequent blog post, either as a positive or negative example.

NOTE: The best Clang flags to use are: -fsanitize=integer -fno-sanitize=unsigned-integer-overflow

UPDATE: As of Saturday March 9, the contest is closed. Results will be posted in a few days. I’ve created a Github repository containing all submissions and my test harness.


58 responses to “A Quick Coding Contest: Convert String to Integer Without Overflow”

  1. My name links to my would-be entry, which was several hours too late. Probably the part of most interest is the line

    return n != 0 ? -1 – (long)(n – 1) : -(long)n;

    which computes -n for n in [0, -INT_MIN] with wrapping semantics and compiles under mild optimization with recent versions of GCC and Clang to a single instruction.

  2. Ryan,

    While I’m quite happy with my submission and don’t want to change it, your performance tests skewed toward how the submissions deal with invalid input rather than valid input.

    To showcase this problem (NOT A NEW SUBMISSION) I’ve created a version of my submission that adds two checks. The first check is to early exit when an error occurs. The second skips leadings zeroes — a completely pointless check that hurts performance on normal input, and is only needed when you plan on sending a lot of input with leading zeroes.

    I’ve posted this file as “robert_3.c” at:
    https://gist.github.com/robertdavidgraham/5136591
    Again, this is NOT a new submission. I’m quite happy with my submission, and don’t want to change it. I don’t like redundant error checking, nor do I like the “implicit goto” caused by the return statement (which is why I removed it from my first submission robert.c).

    On my machine, the fastest submission is “yang_2.c”, which includes this error checking. My “robert_2.c” is twice as slow, but “robert_3.c” is only 10% slower. I suspect that if you made the same changes to the other submissions, the majority of them would cluster much closer in your tests.

  3. Had a lot of fun doing this — unfortunately the latest version of my entry that was committed to github still had the conditional compilation set up to use MSVC’s #define of LLONG_MAX (which I had to use because longs are only 32 bits even with the 64 bit compiler) which ends in “i64” instead of “LL”, which causes the infinite loop.

    The correct version works fine as far as I know with only a single character difference. I sent that to John and he even replied before the 9th so I can prove it!

  4. Many versions of itoa() have an overflow bug that makes them print non-digits when given INT_MIN. You don’t wanna see “if (n<0) { n = -n; …}"

  5. berdario: No, it’s still against the rules. It’s not like you needed to know the length of the input anyway 😉

  6. Uh… well, there’re some submissions that make use of such library functions 🙂