#include <stdint.h> uint32_t foo1 (uint32_t x, int r) { return (x << r) | (x >> (32 - r)); } uint32_t foo2 (uint32_t x, int r) { return (x >> r) | (x << (32 - r)); } uint32_t foo3 (uint32_t x) { return (x << 24) | ((x & 0x0000ff00) << 8) | ((x & 0x00ff0000) >> 8) | (x >> 24); } uint32_t foo4 (uint32_t x) { char *xp = (char *)&x; uint32_t y; char *yp = (char *)&y; yp[0] = xp[3]; yp[1] = xp[2]; yp[2] = xp[1]; yp[3] = xp[0]; return y; }
regehr@john-home ~ $ clang -Os -S -o - foo.c foo1: movb %sil, %cl roll %cl, %edi movl %edi, %eax ret foo2: movb %sil, %cl rorl %cl, %edi movl %edi, %eax ret foo3: bswapl %edi movl %edi, %eax ret foo4: bswapl %edi movl %edi, %eax ret regehr@john-home ~ $ gcc -Os -S -o - foo.c foo1: movl %edi, %eax movb %sil, %cl roll %cl, %eax ret foo2: movl %edi, %eax movb %sil, %cl rorl %cl, %eax ret foo3: movl %edi, %eax bswap %eax ret foo4: movl %edi, -20(%rsp) movb -17(%rsp), %al movb %al, -4(%rsp) movb -18(%rsp), %al movb %al, -3(%rsp) movb -19(%rsp), %al movb %al, -2(%rsp) movb -20(%rsp), %al movb %al, -1(%rsp) movl -4(%rsp), %eax ret
I’ve edited the asm to remove extraneous lines. I expect there’s probably a tweak I could use to get GCC to recognize the pointer-based endian swap but I didn’t find it.
7 responses to “Sometimes Compilers are Cute”
Irritatingly, I haven’t found a way to get GCC to emit a rotate instruction and still follow all the rules of C. The x<>(32-r) idiom is neat and “works” whether the shift count is taken modulo 32 or not (as observed by Henry Warren). However, it’s still undefined for x=0 and 32, strictly speaking.
Sorry about my botched expression above; it should have been x<<r|x>>(32-r). (John, if you have any power over your blog software whatsoever, is there some kind of “preview comment” feature that could be enabled, or a little notice about what the mark-up rules are?)
Hi Mattias, in fact I’ve just been looking at some crypto libraries and several of them execute undefined behaviors by calling this kind of rotate code for arguments 0 or 32.
I tried using a guard to just return x if r==0 or r==32 and of course this works (and both compilers still emit rotate instructions) but they also (un-necessarily, I think) generate a few instructions corresponding to the guards.
Regarding comments, I don’t think WordPress has a preview feature, but I should check again since a new version of WP just came out. There is no doubt a plugin that supports this feature, I’ll see what I can find.
Another interesting case to add to my growing collection of ‘things a modern optimizer should do’ (http://shape-of-code.coding-guidelines.com/2011/07/24/compiler-benchmarking-for-the-21st-century/). Putting the left/right rotate in the two arms of the same if-statement makes things even more interesting
GCC not recognizing the pointer-based swap is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42587
Derek it would be fun to make a list of these idioms and also to see how robust the transformations are. I hadn’t known (or had forgotten) that compilers do this but some students in my course this fall noticed the bswap.
Nathan, thanks for the link!
John, Your rotate code counts as an idiom, the go-faster attempts on 3n+1 are just clueless. I’m assuming that gcc/llvm now do a good job of optimizing very local cluelessness. The interesting question is how non-local the clueless behavior has to be before it defeats current optimizers.
Tweaking some of the longer algorithms in “Matters Computational” (http://www.jjj.de/fxt/fxtbook.pdf) is on my list of things to try one day.