str2long Contest Results Part 1

[NOTE: Reading this post only makes sense if you read and cared about this previous post.]

Ok, evaluating the submissions has been more work than I anticipated, and also things have gotten pretty busy at work, so I’m going to split the evaluation of the submissions into two parts. This first part will discuss objective metrics while the second will be subjective.

I received approximately 78 implementations of str2long(), including multiple submissions from a number of people. I say “approximately” since I deleted some submissions where the sender noticed it was buggy before I did. All submissions can be found on Github here. A total of 35 submissions pass all of my tests, which include many functional tests in addition to integer overflow checks using Clang’s -fsanitize=integer. My tests are reasonably good (I hope) but I still could have missed things, let me know if this is the case. I tested all submissions on an x86-64 Linux machine, targeting both 64-bit and 32-bit modes. The nastiest test case that I used is one that allocates a 9GB string that is mostly leading zeros; this caused a few submissions that passed all other tests to fail, of course because they used a 32-bit index somewhere. One might reasonably argue that this is a vanishingly unlikely use case, but it’s not prohibited by the specification so I tested for it.

Besides correctness, I also looked at code size and performance. Neither were explicit criteria for this contest, but all other things being equal faster code is better. Code size is mainly interesting because it serves as a very crude way to measure algorithmic complexity. In other words, code size answers the question: How much complexity remains after an optimizing compiler has looked at the function?

Since compilers are notoriously finicky about what they will and won’t optimize, I used a variety of compilers and report only the best result. The compilers are:

GCC 4.4.7, 4.5.4, 4.6.3, 4.7.2, and a pre-4.8 snapshot (r196703, compiled from SVN a few days ago)
Clang 3.0, 3.1, 3.2, and a pre-3.3 snapshot (r177225, compiled from SVN a few days ago)
Intel CC 12.0.5 and 13.1.0

Each compiler targeted x86-64 and was invoked at -O0, -O1, -O2, -Os, and -O3. I’m aware there are other flags controlling code generation but I don’t have time to play those games. Code size was measured using the size command after passing the compiler flags -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables which are necessary to prevent extraneous code from being generated and counted. With these preliminaries aside, here are the size results:

bytes	compiler	submission
96	icc-13.0.1 -O1	libc
109	gcc-snap -Os	pascal
110	gcc-4.6 -Os	peter
110	gcc-snap -Os	ryan
112	icc-13.1.0 -O1	bernd_2
115	gcc-snap -Os	tordek_2
116	gcc-snap -Os	davidl
116	gcc-snap -Os	davidl_2
122	clang-3.0 -Os	yolanpa
129	gcc-4.6 -Os	daniel_2
129	gcc-snap -Os	francois_2
131	gcc-snap -Os	jeffrey
136	gcc-4.4 -Os	yang_2
137	gcc-snap -Os	davide
138	gcc-4.4 -Os	david_2
139	gcc-snap -Os	thomas
145	gcc-snap -Os	matthewf_2
145	clang-snap -Os	chucky_2
148	gcc-4.4 -Os	mikael_2
159	gcc-snap -Os	greg
160	gcc-snap -Os	phil
165	gcc-snap -Os	mattias
172	icc-13.1.0 -O1	stefan
173	gcc-4.6 -Os	matthew_3
175	gcc-4.4 -Os	sidney
176	icc-13.1.0 -O1	patrick
182	gcc-snap -Os	bastian
185	gcc-4.7 -Os	andrew
203	clang-3.2 -Os	till
232	clang-3.1 -Os	olivier
251	gcc-snap -Os	kevin
275	gcc-snap -Os	renaud
276	gcc-4.6 -Os	ben
283	gcc-snap -Os	ethan_2
312	clang-snap -Os	mats

The libc entry doesn’t count, it’s just a call to strtol that has been wrapped with a bit of logic to make it conform to the str2long specification. Perhaps interestingly, no compiler consistently wins. Happily, -Os is the most popular compiler flag.

Performance was measured by taking the best out of 10 runs on an otherwise idle Core i7-2600. The metric is the average number of nanoseconds required to convert a string including whatever overhead is incurred by the test harness. As Robert Graham observed, the test cases are perhaps skewed too heavily towards invalid inputs, which would tend to penalize code optimized for the valid case.

Here are the results:

ns	compiler	submission
21.4	icc-13.1.0 -O2	bernd_2
21.6	icc-13.1.0 -Os	peter
22.4	gcc-4.7 -O2	yang_2
22.5	clang-3.0 -O1	ryan
22.6	gcc-4.7 -O2	till
23.8	icc-13.1.0 -O1	yolanpa
24.7	gcc-4.7 -O2	tordek_2
25.3	clang-3.2 -Os	andrew
25.6	gcc-4.4 -O3	davidl_2
25.7	gcc-4.7 -O2	pascal
25.7	gcc-4.5 -O3	renaud
26.0	gcc-4.7 -O2	davidl
26.2	icc-12.0.5 -O3	stefan
26.8	gcc-snap -O3	bastian
27.1	gcc-4.7 -O2	chucky_2
27.2	clang-3.1 -O3	olivier
27.5	gcc-snap -O3	mikael_2
27.8	gcc-4.7 -O2	kevin
27.9	clang-3.2 -O2	davide
28.3	gcc-snap -O3	david_2
28.5	clang-3.2 -O1	patrick
28.7	gcc-snap -O3	sidney
29.6	gcc-4.7 -O3	phil
32.1	gcc-snap -O3	greg
33.1	clang-3.1 -O2	ben
33.1	icc-12.0.5 -Os	jeffrey
33.8	icc-13.1.0 -O2	mattias
34.5	gcc-snap -O2	thomas
34.6	clang-snap -O3	matthewf_2
37.5	gcc-4.7 -O3	matthew_3
49.0	clang-3.1 -O1	francois_2
52.8	icc-12.0.5 -O2	libc
96.4	gcc-4.7 -Os	daniel_2
103.3	gcc-4.4 -O3	mats
128.3	gcc-4.4 -O2	ethan_2

Again, no particular compiler wins, and this time no particular optimization flag consistently wins either. The strtol code from libc performs surprisingly well considering that it is implementing a slightly more generic specification than str2long.

These results would perhaps be improved by repeating them on a modern ARM platform and some good ARM compilers, but I don’t have those available right now.

This concludes the objective portion of the contest results. The followup will be subjective and will look at the strategies and idioms that people used to implement the str2long specification without overflowing integers.

Thanks to everyone who submitted code! This has been fun.

March 19, 2013

regehr

Compilers, Computer Science, Software Correctness

21 responses to “str2long Contest Results Part 1”

Phil Miller says:

March 19, 2013 at 11:13 pm

It might be interesting to see performance numbers for the smallest emitted versions of each submission, and the code sizes of the fastest optimizations of each submission. The whole thing could be displayed on a single size-speed scatter-plot with the two sets of points.
regehr says:

March 19, 2013 at 11:16 pm

Phil why do I have the feeling I’ll soon be sucked into making nice Pareto curves here? 🙂
Toby says:

March 19, 2013 at 11:48 pm

“The nastiest test case that I used is one that allocates a 9GB string that is mostly leading zeros; this caused a few submissions that passed all other tests to fail, of course because they used a 32-bit index somewhere.”

I’m guessing that was me, but I see robert_2.c in your results table (no toby.c) which I assumed would have also suffered from this — he uses an ‘int’ index, I use an ‘unsigned int’ one. Am I missing something?
regehr says:

March 20, 2013 at 12:32 am

Hi Toby, of course you are right. Somehow I missed this despite having run the huge-string testcase. Not sure what happened. I’ve edited the post accordingly.

My guess is your own submission fell victim since you verified under 32-bit implementation-defined settings?
Toby says:

March 20, 2013 at 1:00 am

Correct. This bug hadn’t even occurred to me — clearly I can’t think past hard verification assumptions 🙂
David Eisenstat says:

March 20, 2013 at 7:52 am

Link goes to my solution made with full knowledge of the evaluation criteria.
David Eisenstat says:

March 20, 2013 at 8:38 am

(95 bytes using clang -Os; should be very fast)
regehr says:

March 20, 2013 at 9:11 am

Hi David, looks nice! I’ll check speed and size when I get home.
David Eisenstat says:

March 20, 2013 at 9:11 am

*sigh*, make that 107. (*shakes fist at negative 0*)
regehr says:

March 20, 2013 at 9:14 am

I get only 85 bytes of code from your latest using gcc-4.4.3 -Os.
regehr says:

March 20, 2013 at 9:17 am

David can you explain your code at the err label? Bizarrely, removing the code that is dead on two’s complement machines makes GCC generate larger code.
David Eisenstat says:

March 20, 2013 at 9:24 am

That is bizarre. I’m trying to avoid doing anything to %rax for error returns.
regehr says:

March 20, 2013 at 9:27 am

Nevermind, of course I deleted the wrong branch. But argh, there must be a cleaner way to say this…
David Eisenstat says:

March 20, 2013 at 10:04 am

New version that assumes two’s-complement. I think it’s clearer what’s going on.
regehr says:

March 20, 2013 at 10:35 am

One reason GCC generates smaller code is that at -Os it turns the multiplication by 10 into an imul rather than some leas.
David Eisenstat says:

March 20, 2013 at 10:42 am

I noticed that. 64-bit constants also eat space. This version (str2long*3*, also linked in my previous post) is at 84 bytes on gcc-4.7 -Os; when I write the conservative overflow test less efficiently but without the constant, that drops to 78.
Yang Chen says:

March 20, 2013 at 1:01 pm

Hmm, I might have removed asserts from my submission and this may reduce the code size of my submission a bit 🙂
regehr says:

March 20, 2013 at 1:58 pm

Yang I forgot to say that the results here are with NDEBUG defined, so assertions have all gone away.
Yang Chen says:

March 20, 2013 at 2:20 pm

I see. Thanks!
David Eisenstat says:

March 20, 2013 at 4:54 pm

> But argh, there must be a cleaner way to say thisâ€¦

I’m in the habit of writing code that doesn’t depend on implementation-defined behavior. You’re right that for this contest I could just do the unsigned -> signed cast.
David Grayson says:

March 20, 2013 at 5:19 pm

Yay, thanks for posting some results! My entry is david_2.c, and I’m happy to see that I am around the median of the group.