Sometimes Compilers are Cute


#include <stdint.h>

uint32_t foo1 (uint32_t x, int r)
{
  return (x << r) | (x >> (32 - r));
}

uint32_t foo2 (uint32_t x, int r)
{
  return (x >> r) | (x << (32 - r));
}

uint32_t foo3 (uint32_t x)
{
  return 
    (x << 24) | 
    ((x & 0x0000ff00) <<  8) |
    ((x & 0x00ff0000) >>  8) |
    (x >> 24);
}

uint32_t foo4 (uint32_t x)
{
  char *xp = (char *)&x;
  uint32_t y;
  char *yp = (char *)&y;
  yp[0] = xp[3];
  yp[1] = xp[2];
  yp[2] = xp[1];
  yp[3] = xp[0];
  return y;
}

regehr@john-home ~ $ clang -Os -S -o - foo.c
foo1:
	movb	%sil, %cl
	roll	%cl, %edi
	movl	%edi, %eax
	ret

foo2:
	movb	%sil, %cl
	rorl	%cl, %edi
	movl	%edi, %eax
	ret

foo3:
	bswapl	%edi
	movl	%edi, %eax
	ret

foo4:
	bswapl	%edi
	movl	%edi, %eax
	ret

regehr@john-home ~ $ gcc -Os -S -o - foo.c
foo1:
	movl	%edi, %eax
	movb	%sil, %cl
	roll	%cl, %eax
	ret

foo2:
	movl	%edi, %eax
	movb	%sil, %cl
	rorl	%cl, %eax
	ret

foo3:
	movl	%edi, %eax
	bswap	%eax
	ret

foo4:
	movl	%edi, -20(%rsp)
	movb	-17(%rsp), %al
	movb	%al, -4(%rsp)
	movb	-18(%rsp), %al
	movb	%al, -3(%rsp)
	movb	-19(%rsp), %al
	movb	%al, -2(%rsp)
	movb	-20(%rsp), %al
	movb	%al, -1(%rsp)
	movl	-4(%rsp), %eax
	ret

I’ve edited the asm to remove extraneous lines. I expect there’s probably a tweak I could use to get GCC to recognize the pointer-based endian swap but I didn’t find it.

,

7 responses to “Sometimes Compilers are Cute”

  1. Irritatingly, I haven’t found a way to get GCC to emit a rotate instruction and still follow all the rules of C. The x<>(32-r) idiom is neat and “works” whether the shift count is taken modulo 32 or not (as observed by Henry Warren). However, it’s still undefined for x=0 and 32, strictly speaking.

  2. Sorry about my botched expression above; it should have been x<<r|x>>(32-r). (John, if you have any power over your blog software whatsoever, is there some kind of “preview comment” feature that could be enabled, or a little notice about what the mark-up rules are?)

  3. Hi Mattias, in fact I’ve just been looking at some crypto libraries and several of them execute undefined behaviors by calling this kind of rotate code for arguments 0 or 32.

    I tried using a guard to just return x if r==0 or r==32 and of course this works (and both compilers still emit rotate instructions) but they also (un-necessarily, I think) generate a few instructions corresponding to the guards.

    Regarding comments, I don’t think WordPress has a preview feature, but I should check again since a new version of WP just came out. There is no doubt a plugin that supports this feature, I’ll see what I can find.

  4. Derek it would be fun to make a list of these idioms and also to see how robust the transformations are. I hadn’t known (or had forgotten) that compilers do this but some students in my course this fall noticed the bswap.

    Nathan, thanks for the link!

  5. John, Your rotate code counts as an idiom, the go-faster attempts on 3n+1 are just clueless. I’m assuming that gcc/llvm now do a good job of optimizing very local cluelessness. The interesting question is how non-local the clueless behavior has to be before it defeats current optimizers.

    Tweaking some of the longer algorithms in “Matters Computational” (http://www.jjj.de/fxt/fxtbook.pdf) is on my list of things to try one day.