Udacity Visit

Yesterday I took a day trip to Palo Alto to visit Udacity where I’m getting ready to teach a course on software testing. The goal was to become familiar with the recording setup, hash out any infrastructure issues, and try to refine my course content–which right now is just a collection of rough notes. Originally I had wanted to teach a class mainly about random testing, a topic I feel very strongly about. However, as I started getting the material together it became clear that “regular” testing was going to eat up probably half of the course slots–there is simply a giant amount of material to cover. So far I have deliberately avoided looking at anyone else’s course material, but probably I’ll need to start looking around since there are almost certainly gaps in my material that want to be filled.

My motivation for working with Udacity is that I’ve been disenchanted with traditional lecture courses for some time and have been nosing around for alternatives. The Udacity format–an hour or so of lecture material per week, broken into bite-sized pieces punctuated by little quizzes–seems like a good alternative. Certainly it’s a good match for my own very short attention span, and I’ve been enjoying the “applied cryptography” and “programming a robotic car” lectures, having somehow completely dodged both crypto and AI back when I was in school.

One thing that I really like about Udacity’s setup is that the course content is released under the Creative Commons license. This means that, for example, in the future I’ll be able to teach a software testing class at Utah where students watch the Udacity lecture material online, and we spend class time doing something more productive than listening to me talk. This is an idea I’m pretty excited about and that I’ve toyed with for a couple of years.

On arriving at Udacity I chatted with Dave Evans for a little while and then spent maybe half an hour watching Steve Huffman record content for his Web Application Engineering course. Steve is one of the guys behind Reddit and Hipmunk. He answered a few questions I had and it was super useful watching him record–there’s a lot of editing that happens between recording and distribution of these lectures and it wasn’t at all intuitive to me what impact this has on delivery style. For example, it’s perfectly OK to stop talking in order to focus on drawing a diagram on the tablet. A cute trick: if you take a bathroom break, it’s helpful to draw a little icon or note on the tablet screen indicating you’re gone so that an editor can rapidly and easily identify and discard the resulting useless segment of video.

We also talked about how to run the backend part of my testing course. The details aren’t all worked out yet, but what I hope we are able to do is have students implement a specification and also write a collection of test cases; then, we’ll not only rank students’ solutions by their level of correctness, but we’ll also be able to rank students’ ability to create test cases that break other students’ code.

Finally, I spent about an hour in the recording room. This was humbling: I found it very hard to draw and talk at the same time–it is going to take some practice to get that right. Embarrassingly, during the 10 years I’ve been teaching I’ve never once used a chalkboard or whiteboard, so this style is new to me. The other part that will take some getting used to is interleaving lots of quizzes with the lecture content; yesterday I tried to create a quiz on the fly and the results were not good, clearly these will have to be prepped ahead of time.

Anyway, I’ll post more details and impressions as they arrive.

Does Portable Byte-Order Code Optimize?

When reading data whose byte-ordering may differ from the host computer’s, Rob Pike advocates writing portable code as opposed to using #ifdefs to optimize the case where the encoded data’s byte ordering matches the host. His arguments seem reasonable and it’s definitely a win if the compiler can transparently recognize the useless data extraction operations and optimize them away. Rob’s code is basically this:

int read_little_endian_int (unsigned char *data)
{
  int i = 
    (data[0]<<0) | (data[1]<<8) | 
    (data[2]<<16) | (data[3]<<24);
  return i;
}

On a little-endian host, this code can be simplified to just a 32-bit load. But do compilers actually perform this optimization? It turns out not. Clang (built today from trunk) gives this:

read_little_endian_int:
 movzbl (%rdi), %ecx
 movzbl 1(%rdi), %eax
 shll $8, %eax
 orl %ecx, %eax
 movzbl 2(%rdi), %ecx
 shll $16, %ecx
 orl %eax, %ecx
 movzbl 3(%rdi), %eax
 shll $24, %eax
 orl %ecx, %eax
 ret

GCC (built a few days ago) and Intel’s compiler (12.0.5) produce very similar code.

Does the sub-optimality of this code matter? Probably not if we’re reading from a slow device, but it does matter if we’re reading from RAM. My Core i7 920 (running in 64-bit mode) can copy a large block of data using 32-bit operations at about 7 GB/s, but reaches only about 2 GB/s when running the data through the code emitted by Clang. The results for GCC and Intel CC are similar.

But now the plot thickens slightly. If we wish to check that the function above is implemented and compiled correctly, we might write a program with this main function:

int main (void)
{
  int i = INT_MIN;
  while (1) {
    int i2 = read_little_endian_int ((unsigned char *)&i);
    if (i != i2) {
      printf ("oops %d != %d\n", i, i2);
    }
    if (i == INT_MAX) break;
    i++;
  }
  return 0;
}

GCC and Intel CC compile this in the obvious way, but Clang (at -O3) turns this main function into a nop:

 main:
 xorl %eax, %eax
 ret

Now why would Clang understand that read_little_endian_int is a nop when it is used in this context, but fail to emit the obvious code for the function itself? I’m not sure what to make of this. Maybe someone familiar with LLVM’s optimizers can help out here. The full code is here if anyone wants to play with it.

57 Small Programs that Crash Compilers

It’s not clear how many people enjoy looking at programs that make compilers crash — but this post is for them (and me). Our paper on producing reduced test cases for compiler bugs contained a large table of results for crash bugs. Below are all of C-Reduce’s reduced programs for those bugs.

Can we conclude anything just by looking at these? It’s hard to say… many of these C fragments are not obviously hard to compile — to see the problem we would need to know the details of the translation to a particular compiler’s intermediate representation.

In general, we don’t know of any way to make these programs much smaller. In other words, C-Reduce already implements most of the tricks we can think of. It will always be the case that an experienced compiler developer who understand a particular bug will be able to produce considerably better reduced test cases than C-Reduce can. Our goal, rather, is to create tests that a naive user cannot improve very much. So if you see interesting opportunities to improve these test cases, we’d love to hear about it. The current version of C-Reduce fails to implement optimizations such as constant folding, constant propagation, and loop peeling. We haven’t seen much need for these, though.

These are the verbatim tool output; there are definitely some formatting warts.

C1 : Crashes Clang 2.6 at -O0:

#pragma pack(1)
struct S1 {
  int f0;
  char f2
};
struct {
  struct S1 f0
    }
a[] = { 0 }
      ;

C2 : Crashes Clang 2.6 at -O2:

struct S0 {
  int f0:1;
  int f4
}
a;
void
fn1 () {
  struct S0 b[][1][1] = { 0 };
  b[0][0][0] = a;
}

C3 : Crashes Clang 2.6 at -O2:

unsigned short a;
int b;
char c;
short
fn1 () {
  return 1 / a;
}
void
fn2 () {
  b = fn1 ();
  char d = b;
  c = d % 3;
}

C4 : Crashes Clang 2.6 at -O3:

int a, b, c, d, e;
#pragma pack(1)
struct S0 {
  int f0:14;
  int f1:13;
  int f2:28;
  int f3:23;
  int f4:12
};
void fn1 (struct S0);
void
fn2 () {
  int f;
lbl_2311:
  ;
  struct S0 g = { 0, 0, 1 };
  fn1 (g);
  b && e;
  for (; c;) {
    if (d)
      goto lbl_2311;
    f = a && 1 ? 0 : 1;
    g.f4 = f;
  }
}

C5 : Crashes Clang 2.6 at -O2:

 int crc32_context, g_2 = 0, g_5;
int g_8;
int *g_39, *g_371;
int g_81;
int func_1_l_15 ;
static short safe_add_func_int16_t_s_s ( short si1, int si2 ) {
    return si1 > 67 ? si1 : si1 + si2;
  }
   
   
   
   
    
   
    static int func_1 (  ) {
    int l_462 = 0;
    g_2 = 0;
    for ( ;
  g_2 < 12;
  g_2 = safe_add_func_int16_t_s_s ( g_2, 5 ) )     {
       g_5 = 1;
       for ( ;
 g_5;
 ++g_5 ) 	{
 	  g_8 = 1;
 	  for ( ;
 g_8 >= 0;
 g_8 = g_8 - 1 ) 	    {
 	      func_1_l_15 = 1;
  	      for ( ;
 func_1_l_15;
 		    func_1_l_15 =   func_1_l_15  - 1  ) 		if ( g_8 ) 		  break;
 	    }
 	  g_371 = &l_462;
 	  int *l_128 = &g_81;
 	  *l_128 = *g_39;
 	}
 *g_371 =    0 != 0   ;
     }
    return 0;
  }
   int main (  ) {
    func_1 (  );
    crc32_context = g_2;
    crc32_context += g_5;
  }
  

C6 : Crashes Clang 2.6 at -O0:

#pragma pack(1)
struct S2 {
  int f1;
  short f4
};
struct S3 {
  struct S2 f1;
  int f3:14
};
struct {
  struct S3 f3
    }
a = { 0, 0, 0 };

C7 : Crashes Clang 2.6 at -O1:

int *a;
static int **b;
int c, d, e;
void
fn1 () {
  d = &b == c;
  for (;;) {
    int **f = &a;
    if (e) {
    } else
      b = f;
    if (**b)
      continue;
    **f;
  }
}

C8 : Crashes Clang 2.6 at -O1:

#pragma pack(1)
struct S0 {
  int f3;
  char f4
};
struct {
  struct S0 f6;
  int f8
}
a = { 0, 0, 0 };

C9 : Crashes Clang 2.6 at -O2:

struct S0 {
  int f0;
  int f1;
  short f3;
  int f7;
  int f8
}
b;
int a, c, d, e, f;
void
fn1 (struct S0 p1) {
  d++;
  c = p1.f8;
  e = 0;
  a = p1.f7;
}
void
fn2 () {
  e = 0;
  for (; e; e++) {
    if (d)
      for (;;) {
      }
    --f;
  }
  fn1 (b);
}

C10 : Crashes Clang 2.6 at -O1:

union U2 {
  int f0;
  unsigned short f2
}
b;
static int a = 1;
void
fn1 (int p1, unsigned short p2) {
}
int fn2 (union U2);
union U2 fn3 ();
static unsigned long long
fn5 () {
  fn1 (b.f2, b.f0);
  return 0;
}
static char
fn4 () {
  fn5 ();
  return 0;
}
int
main () {
  a || fn2 (fn3 (fn4 () ) );
}

C11 : Crashes Clang 2.7 at -O1:

int *a;
static int **b;
int c, d, e;
void
fn1 () {
  d = &b == c;
  for (;;) {
    int **f = &a;
    if (e) {
    } else
      b = f;
    if (**b)
      continue;
    **f;
  }
}

C12 : Crashes Clang 2.7 at -O0:

char a;
unsigned char b;
int c;
void
fn1 () {
  (b ^= c) != a;
}

C13 : Crashes Clang 2.7 at -O2:

int a, b;
void fn1 ();
void
fn2 (short p1) {
  short c;
  c = (65532 | 3) + p1;
  fn1 (c && 1);
  b = (0 == p1) * a;
}

C14 : Crashes GCC 3.2.0 at -O1:

void
fn1 () {
  struct S0 *a;
  struct S0 *b, *c = &a;
  struct S0 **d = &c;
  if (&b == &a) {
  }
}

C15 : Crashes GCC 3.2.0 at -O3:

volatile int a, b, c, i;
char d;
void
fn1 () {
  int e;
  {
    for (;; c++) {
      int f[50] = { };
      if (b) {
        {
          0;
          {
            {
              int g = a, h = d;
              e = h ? g : g / 0;
            }
          }
          a = e;
        }
      }
    }
  }
}
void
main () {
  i = 0 / 0;
  a;
}

C16 : Crashes GCC 3.2.0 at -O3:

int a, c;
volatile int b;
void
fn1 () {
  b;
  for (;;)
    break;
  int d = b, e = a;
  c = a ? d : d % 0;
}
void
fn2 () {
  if (0 % 0)
    b;
}

C17 : Crashes GCC 3.2.0 at -O2:

union U1 {
  int f0;
  char f1
};
void
fn1 (union U1 p1) {
  p1.f1 = 0;
  for (; p1.f1;) {
  }
}

C18 : Crashes GCC 3.2.0 at -O1:

int a, b;
void
fn1 () {
  b = 4294967290UL <= a | b;
}

C19 : Crashes GCC 3.2.0 at -O3:

int a, b, c;
int
fn1 (int p1, int p2) {
  return p1 - p2;
}
void
fn2 () {
  int d;
  int **e;
  int ***f = &e;
  d = a && b ? a : a % 0;
  if (fn1 (f == 0, 2) )
    c = ***f;
}

C20 : Crashes GCC 3.3.0 at -O3:

int a, b, d;
struct S0 {
  int f3
};
int *volatile c;
void fn1 (struct S0);
void
fn2 () {
  int e;
  struct S0 **f;
  struct S0 ***g = &f;
  (a && b && b ? 0 : b) > (&c && 0);
  e = 0 == g;
  d = e >> 1;
  for (;;)
    fn1 (***g);
}

C21 : Crashes GCC 3.4.0 at -O3:

int a, b;
struct U0 {
  char f0;
  int f2
};
void
fn1 () {
  struct U0 c;
  for (; c.f0 != 1; c.f0 = c.f0 + a)
    b -= 1;
}

C22 : Crashes GCC 3.4.0 at -O3:

int a, b, d, e;
struct S0 {
  int f3
};
int *c;
void fn1 (struct S0);
void
fn2 () {
  struct S0 **f;
  struct S0 ***g = &f;
  (a && b && b ? 0 : b) > (&c == d);
  e = 1 < (0 == g);
  for (;;)
    fn1 (***g);
}

C23 : Crashes GCC 4.0.0 at -O2:

int ***a;
int b;
int *c;
void
main () {
  if (&c == a)
    b = 0 == *a;
}

C24 : Crashes GCC 4.0.0 at -O2:

int a[][0];
int *const b = &a[0][1];
int
fn1 () {
  return *b;
}

C25 : Crashes GCC 4.0.0 at -O0:

int a, b;
unsigned char c;
void
fn1 () {
  (0 >= a & (0 || b) ) > c;
}

C26 : Crashes GCC 4.0.0 at -O1:

struct {
  int f9:1
}
a;
const int b[] = { 0 };
void fn1 ();
void
main () {
  for (;;) {
    a.f9 = b[0];
    fn1 ();
  }
}

C27 : Crashes GCC 4.0.0 at -O0:

int a, c;
unsigned char b;
void
fn1 () {
  b > (c > 0 & 0 < a);
}

C28 : Crashes GCC 4.0.0 at -O2:

int **a[][0];
static int ***const b = &a[0][1];
void fn1 ();
int
fn2 () {
  return ***b;
  fn1 ();
}
void
fn1 () {
  **b;
}

C29 : Crashes GCC 4.1.0 at -O1:

volatile int ***a;
int b;
int **c;
void
fn1 () {
  if (&c == a)
    b = 0 == *a;
}

C30 : Crashes GCC 4.1.0 at -O1:

struct {
  int f0;
  int f2
}
a;
int b;
void
fn1 () {
  a.f2 = 0;
  int *c[] = { 0, 0, 0, 0, &a.f0, 0, 0, 0, &a.f0 };
  b = *c[4];
}

C31 : Crashes GCC 4.1.0 at -O2:

int a, b;
unsigned c;
void
fn1 () {
  for (; c <= 0;)
    if (b < c)
      a = 1 && c;
}

C32 : Crashes GCC 4.1.0 at -O1:

unsigned a;
int b;
void
main () {
  unsigned c = 4294967295;
  int d = c;
  b = a <= d || a;
}

C33 : Crashes GCC 4.1.0 at -O1:

const volatile long a;
void
main () {
  printf ("%d\n", (int) a);
}

C34 : Crashes GCC 4.1.0 at -O3:

int a, b;
union U1 {
  int f0;
  int f1
};
void
fn1 () {
  union U1 c = { 1 };
  int d = 1;
  if ( (c.f1 & a ? c.f1 : 1 - a) ^ d) {
  } else
    b = 0;
}

C35 : Crashes GCC 4.2.0 at -O1:

volatile int ***a;
int b;
int **c;
void
fn1 () {
  if (&c == a)
    b = 0 == *a;
}

C36 : Crashes GCC 4.2.0 at -O1:

struct S2 {
  volatile int f5:1;
  int f6
};
static struct S2 a;
void
main () {
  printf ("%d\n", a.f5);
}

C37 : Crashes GCC 4.3.0 at -O1:

long long *a;
int b;
void
fn1 () {
  long long **c = &a;
  int d = 7;
lbl_2890: {
    long long **e = &a;
    b = (e == c) < d;
    d = 0;
    goto lbl_2890;
  }
}

C38 : Crashes GCC 4.3.0 at -O2:

struct S2 {
  volatile int f5:1;
  int f6
};
static struct S2 a;
void
main () {
  printf ("%d\n", a.f5);
}

C39 : Crashes GCC 4.3.0 at -O3:

int a;
short b;
void
fn1 () {
  int c[0];
  for (;;) {
    a = c[0];
    b = 0;
    for (; b < 7; b += 1)
      c[b] = 0;
  }
}

C40 : Crashes GCC 4.3.0 at -O1:

volatile int **a;
int *b;
void
fn1 () {
  if (a == &b)
    **a;
}

C41 : Crashes GCC 4.3.0 at -O3:

int a, b, c, d, e, f;
void
fn1 () {
  char g;
lbl_120:
  if (b || e >= 0 & d >= 0 || a)
    return;
  g = f < 0 ? 1 : f;
  d = g == 0 || (char) f == 0 && g == 1 ? 0 : 0 % 0;
  if (c)
    goto lbl_120;
}

C42 : Crashes Intel CC 12.0.5 at -O1:

struct U0 {
  int f0
}
a;
struct U0
fn1 () {
  return a;
}
void
main () {
  0 > a.f0;
  fn1 ();
}

C43 : Crashes Open64 4.2.4 at -O3:

int a;
int *b;
unsigned c;
void
fn1 () {
  for (; a; a--)
    if (*b) {
      c = 0;
      for (; c >= 5; c++) {
      }
    }
}

C44 : Crashes Open64 4.2.4 at -O3:

short a;
void
fn1 () {
  long b;
  b = 44067713550;
  a |= b;
}

C45 : Crashes Open64 4.2.4 at -O3:

volatile int a;
void
fn1 () {
  int b = 1;
  a || b--;
}

C46 : Crashes Open64 4.2.4 at -O2:

int a, b;
void fn1 ();
void fn2 ();
void
fn3 () {
  fn2 ();
  fn1 ();
}
void
fn2 () {
  if (1) {
  } else
    for (;; b++) {
      int c = 0;
      int *d = &a;
      int **e = &d;
      *e = &c;
      *d = 0;
      *d |= 0;
    }
}

C47 : Crashes Open64 4.2.4 at -O3:

struct S0 {
  int f1:1
};
int a, b;
void
fn1 () {
  for (; b;) {
    struct S0 c = { };
    if (1) {
      c = c;
      a = c.f1;
    }
  }
}

C48 : Crashes Open64 4.2.4 at -O3:

int a, b;
int
fn1 () {
  int *c = &b;
  a = 0;
  for (; a >= -26; --a) {
    unsigned d = 18446744073709551615;
    int *e = &b;
    *e &= d;
  }
  return *c;
}

C49 : Crashes Open64 4.2.4 at -O3:

static int a, c, d;
int b;
int *e;
void
fn1 () {
  for (; a; a += 1) {
    b = 0;
    for (; b > -16; --b)
      for (; c;) {
        int *f = &d;
        *f = 0;
      } *e = 0;
  }
}

C50 : Crashes Sun CC 5.11 at -xO4:

unsigned char a, d;
struct {
  int f2
}
b;
int c, e;
void
fn1 (p1) {
}
void
fn2 () {
  c = 0;
  for (; c <= 0;)
    e = b.f2;
  fn1 (0);
  b = b;
  d = -a;
}

C51 : Crashes Sun CC 5.11 at -fast:

int a, c;
int b[1];
void
fn1 () {
  short d;
  for (; a; a -= 1) {
    d = b = b;
    b[0] = 0;
  }
}

C52 : Crashes Sun CC 5.11 at -xO4:

int a, b, d;
short c;
int
fn1 (p1) {
  return a ? 0 : p1;
}
void
fn2 () {
  int e = 0;
  for (;;) {
    c = 0;
    d = fn1 (e ^ ~~c);
    d && b;
  }
}

C53 : Crashes Sun CC 5.11 at -fast:

long a;
int b, d;
int *c;
void
fn1 () {
  int *e;
  for (;; b--)
    for (; d;) {
      *c = 0;
      *c &= (&e != 1) / a;
    }
}

C54 : Crashes Sun CC 5.11 at -xO0:

#pragma pack(1)
struct {
  int f3:1;
  int f4:16
}
a = { 1, 0 };

C55 : Crashes Sun CC 5.11 at -xO3:

int a, c;
static int b = 1;
void fn1 ();
void
fn2 () {
  for (; a; a--) {
    c = 0;
    for (; c != 1;) {
      if (b)
        break;
      fn1 ();
    }
  }
}

C56 : Crashes Sun CC 5.11 at -xO4:

#pragma pack(1)
struct S0 {
  int f1;
  int f3:1
}
a;
void
fn1 (struct S0 p1) {
  p1.f3 = 0;
}
void
fn2 () {
  fn1 (a);
}

C57 : Crashes Sun CC 5.11 at -fast:

int a, c, d, e, f, g, h, i, j, k;
volatile int b;
int
fn1 () {
  for (; d; d = a) {
    int *l = &c;
    c = -3;
    for (; c > -23; --c)
      if (k) {
        if (*l)
          continue;
        return b;
      }
    for (; i; ++i) {
      j = 0;
      g = h;
      for (; f <= 1; f += 1) {
      }
    }
  }
  return e;
}

New Paper on Test-Case Reduction

For about the last four years my group has struggled with the problem of taking a large program that makes a compiler do something bad (crash, run forever, generate incorrect code) and turning it into a minimum-sized program that elicits the same bad behavior. Delta is a great tool but it does not entirely solve the problem. Anyway, the short version of the story is that we spent a ton of time on this problem and only made good headway during the last year or so.

Our various approaches and their results are written up in a paper that I’ll present at PLDI 2012 in Beijing in June. I’m pretty happy with it. The audience is narrow but one of the new tools, C-Reduce (we’ll announce a release here shortly), really nails the problem and produces output far smaller than any tool anyone else has produced.

It had been a long time (eight years, perhaps) since I took both the technical lead and writing lead on a paper and let me tell you, it was awful. I’m not saying I did all the work or anything, but I did a lot of it and it about killed me. Anyway, I have a fresh appreciation for implementation effort that students put into a systems paper. C-Reduce is a simple tool that has basically one internal interface, and somehow it took at least a dozen iterations before I arrived at a version of this interface that I’m happy with.