Volatile Structs Are Broken

For at least a year we’ve taken a break from reporting volatile bugs to C compiler developers. There were various reasons: the good compilers were becoming pretty much correct, we faced developer apathy about volatile incorrectness, our volatile testing tool had some weird problems, and basically I just got bored with it. Today my PhD student Yang Chen got a new and improved volatile testing tool working, with the goal of testing the volatile correctness of the new CompCert 1.8 for x86. Instead, he immediately found some distressing bugs in other compilers. As a side note, Pin, the basis for Yang’s new tool, is pretty damn cool.

Given this C code:

struct S1 {  
  int f0;
};

volatile struct S1 g_18;

void func_1 (void) {
  g_18 = g_18;
}

Every version of GCC for x86 that I tried (3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0, 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.5.0, and today’s SVN head) and every version of Clang for x86 that I tried (2.6, 2.7, and today’s SVN head) give equivalent wrong code:

regehr@john-home:~$ gcc-4.4 -fomit-frame-pointer -O -S -o - small.c
func_1:
    rep
    ret

The generated code must both read from and write to the struct, since both accesses to g_18 are volatile; instead it does neither. Aren’t there any embedded systems broken by this?  LLVM-GCC 2.1 through 2.7 and Intel CC generate correct code.
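
One way to work around this class of bug, in the spirit of the helper-function approach from our 2008 paper, is to avoid whole-struct volatile assignment and route the copy through explicit volatile accesses. This is only a sketch of a rewritten func_1, with made-up helper names; whether a given compiler gets it right would still need testing:

struct S1 { int f0; };
extern volatile struct S1 g_18;   /* defined in small.c */

static int  vol_read_int  (volatile int *p)        { return *p; }
static void vol_write_int (volatile int *p, int v) { *p = v; }

void func_1 (void) {
  /* one explicit volatile load and one explicit volatile store */
  vol_write_int (&g_18.f0, vol_read_int (&g_18.f0));
}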

And here is the obligatory link to our 2008 paper Volatiles Are Miscompiled, and What to Do about It. But obviously we’re not doing enough. Bugzilla links are here and here. (Update: fixed in LLVM within three hours of being reported.  Cool!)

Conference Hijacking

The ACM Workshop on Programming Languages and Operating Systems (PLOS) is a small event where researchers interested in the intersection of these areas can share ideas and results. There have been five of these workshops and I’ve attended a few of them and presented a paper at one. It’s a fun event and a perfectly respectable one.

The other day I received this invitation:

It is with great pleasure that we announce the inauguration of the *Annual
International Conference on Programming Languages and Operating Systems
(PLOS). *PLOS 2011 will be held on 15-16 August 2011.

The Conference Chair is Professor the Hon. Dr. Stephen Martin, Former
Speaker Parliament of Australia, Former Deputy Vice Chancellor (Strategy and
Planning) Curtin University of Technology, Former Pro Vice Chancellor
International, Victoria University, Former President/CEO University of
Wollongong in Dubai.

In view of your personal achievements and contribution to the related
research areas, we kindly seek your interest in constituting the *Technical
Program Committee (TPC)* of the Conference. Should you consent to be on the
TPC, you will also be eligible for the role of *TPC Chair* or *
Editor-in-Chief* for the conference proceedings based on your profile and
interest.

With respect to your responsibility, you may review a few papers and the
number will depend on your schedule and area of research interest/expertise.
In the event that your schedule is tight, paper review will be arranged
through other committee members. All administrative and logistic functions
will be managed by conference secretariats.

We would appreciate your prompt response as we intend to launch the
conference website soon.

Should you require any assistance or clarification, please do not hesitate
to contact me at farhana@globalstf.org  or Dr Tony Luo at tony@globalstf.org.

A little later on the same day, I got a mail from my colleague Eric Eide saying that neither he nor any of the other organizers of the ACM PLOS Workshop knew anything about this new conference, which is being sponsored by the GSTF (The Global Science and Technology Forum) rather than the ACM.

Creating a new conference with the same name as an existing one, and recruiting people known to have participated in the original event to help run the new one, is either inept or shady. A careful reading of the TPC invitation makes it clear that things are not totally on the level. First we have this:

The Conference Chair is Professor the Hon. Dr. Stephen Martin, Former Speaker Parliament of Australia

Now I’m sure that Stephen Martin is a great man (the invitation fails to note that he also had a successful career as a rugby league referee and administrator), but is he really the person to be running a conference on programming languages and operating systems? The area of his PhD is unspecified but it’s pretty safe to assume the guy is not a practicing computer scientist.

The second suspicious bit is:

Should you consent to be on the TPC, you will also be eligible for the role of *TPC Chair* or *Editor-in-Chief* for the conference proceedings based on your profile and interest.

That is not how things are done. Rather, the TPC chair is chosen by some sort of steering committee, and then he or she forms a program committee. Forming a TPC without a chair makes little sense unless you’re simply trying to stand up a conference and have nobody to lead it.

Strike 3 against the new PLOS event is that none of the three organizers of the 2009 ACM PLOS Workshop who I talked to was contacted about the new event. PLOS 2009 had a 4th organizer who I haven’t heard back from, but there’s no reason to suspect that he was contacted either. If the GSTF PLOS organizers were acting in good faith, they’d have talked to the previous organizers. (Note that Eric was also an organizer in 2007 and Olaf Spinczyk, one of the other organizers who was not contacted, has organized PLOS for a number of years in a row.)

Finally, this sentence is off-key:

In the event that your schedule is tight, paper review will be arranged through other committee members.

If TPC members are not being recruited to review papers, then what is the TPC’s purpose? A reasonable guess is that the TPC will be used for its name-dropping value.

To summarize the evidence:

  1. Previous attendees of ACM PLOS, who obviously are willing to pay the registration fee for an event called “PLOS,” were invited to participate in the new PLOS.
  2. Previous organizers of ACM PLOS, who might be expected to be miffed at the creation of a new conference with the same name, were not contacted.
  3. The new conference is not being organized in response to any known demand for a new conference, but rather targets an already-filled niche.
  4. The new conference has no technical leadership, but is instead fishing for leaders. Legitimate new conferences are usually given strong leaders during the first few years to ensure a good start.
  5. The primary purpose of the TPC is to make an impressive list of names on web pages and in emails, not to filter submissions.

Taken together, this doesn’t feel like cluelessness as much as it suggests an attempt to hijack PLOS. I’m not sure that “conference hijacking” has a generally accepted definition, but it would involve using name confusion and possibly other duplicitous methods to steal away a community of people who habitually attend an existing conference. Presumably it is hoped that robust attendance at the new PLOS would cause the total conference registration fees to exceed the costs of running the conference. I have no idea if this is an isolated incident, nor did some obvious web searches turn up much useful information about GSTF (Google has a total of only 741 hits for “global science and technology forum”). Comments from people with information pointing in either direction would be appreciated.

It’s not necessarily bad for people to try to make money running academic conferences. However, an organization that does this would have incentives that are not quite aligned with the best interests of the academic community. Conference hijacking would be an expected outcome of this incentive mismatch, as would other strategies to increase attendance, such as lowering standards for publication. A few years ago some MIT students hilariously pranked the World Multiconference on Systemics, Cybernetics and Informatics by submitting an automatically generated paper, which was accepted for presentation and inclusion in the proceedings. In contrast, the incentives of a non-profit professional society would be more closely aligned with the interests of its constituent researchers. If the ACM, for example, finds itself in a situation where the income from one of its conferences exceeds expenses, the surplus stays inside the academic community. It could go into a rainy-day fund for the conference or — as is common — subsidize travel and registration expenses for students attending the event.

Anyway, the PLOS vs. PLOS story has a satisfactory ending: someone at the ACM sent a polite letter to the PLOS organizers at the GSTF, who decided to instead call their event The Annual International Conference on Operating Systems and Programming Languages (OSPL).

Why Not Mix Signed and Unsigned Values in C/C++?

Most C/C++ programmers have been told to avoid mixing signed and unsigned values in expressions. However — at least in part because we usually follow this advice — many of us are not totally on top of the underlying issues. This program illustrates what can go wrong:

#include <stdio.h>
int main (void)
{
  long a = -1; 
  unsigned b = 1; 
  printf ("%d\n", a > b); 
  return 0;
}

Here’s what happens on an x86-64 Linux box:

[regehr@gamow ~]$ gcc compare.c -o compare
[regehr@gamow ~]$ ./compare
0
[regehr@gamow ~]$ gcc -m32 compare.c -o compare
[regehr@gamow ~]$ ./compare
1

In other words, the inequality is false on x64 and true on x86. If this doesn’t give you at least one brief moment of “WTF?” then you’re doing a lot better than I did the first time I saw something like this happen.

The underlying issue is a feature interaction. The first feature is C’s integer promotion strategy, which preserves values but does not always preserve signedness. Usually, the operands of an arithmetic operator are promoted to signed int, or to a larger signed type if necessary to give both operands the same size. However, if an operand’s type contains values that are not representable in that signed type, the promoted type is unsigned instead. For example, unsigned char and unsigned short can both be promoted to int because all of their values can be represented in an int. On the other hand, unsigned int cannot be promoted to int because (assuming ints are 32 bits) values like 2147483648 are not representable.
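
To make the asymmetry concrete, here is a small added example of my own (it assumes a typical platform with 32-bit int and unsigned int):

#include <stdio.h>

int main (void)
{
  unsigned char uc = 200;
  unsigned int  ui = 3000000000u;

  /* uc is promoted to (signed) int, so negating it gives an ordinary -200 */
  printf ("%d\n", -uc);   /* prints -200 */

  /* ui cannot be represented in int, so it stays unsigned and the
     negation wraps modulo 2^32 */
  printf ("%u\n", -ui);   /* prints 1294967296 */

  return 0;
}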

The second feature is C’s method for choosing which version of an operator to use. Although the greater-than operator in C always looks like “>”, behind the scenes there are quite a few different operators: signed integer >, unsigned integer >, signed long >, unsigned long >, etc. If either operand to “>” is unsigned, then an unsigned comparison is used, otherwise the comparison is signed.

Now, to return to the example: on a 64-bit platform, where long is wider than unsigned int, b is converted to signed long and the signed “>” is used. On a 32-bit platform, because “int” and “long” are the same size, the common type is unsigned long: a is converted to a huge unsigned value and the unsigned “>” is used. This explains the reversal of the sense of the comparison.
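
If what is wanted is the mathematical comparison, one defensive option (my sketch, not something from the original program) is to convert both operands explicitly to a type wide enough for both values, so the result no longer depends on the width of long:

#include <stdio.h>

int main (void)
{
  long a = -1;
  unsigned b = 1;

  /* long long is 64 bits on the platforms discussed here, so it can hold
     every value of both operands; the comparison is then signed on both
     x86 and x86-64 and prints 0 */
  printf ("%d\n", (long long)a > (long long)b);
  return 0;
}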

Sign problems can sneak into code in two additional ways. First, it’s easy to forget that constants are signed by default, even when they are used in a context that seems like it should be unsigned. Second, the result of a comparison operator is always an int: a signed type.
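
The sizeof operator is a classic way to get bitten by the first of these, since it yields an unsigned size_t. A small added example, assuming a typical platform where size_t is at least as wide as int:

#include <stdio.h>

int main (void)
{
  /* -1 is a signed int constant; comparing it against the unsigned size_t
     produced by sizeof converts it to SIZE_MAX, so the "obviously true"
     comparison is false */
  if (sizeof (int) > -1)
    printf ("expected\n");
  else
    printf ("surprise\n");   /* this branch is taken */
  return 0;
}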

Once we’re aware of these issues, it’s not hard to think through a puzzle like the one above. On the other hand, it can be very difficult to debug this kind of problem in a large piece of software, especially since sign problems are probably not the first item on our list of suspected root causes for a bug.

When writing new software, it’s definitely prudent to turn on compiler warnings about sign problems. Unfortunately, GCC 4.4 doesn’t warn about the program above even when given the -Wall option. The -Wsign-compare option does give a warning, but only when generating 32-bit code. When generating 64-bit code, there’s no warning since b is promoted to a signed type before being exposed to the “>” operator. So if we’re primarily developing on the 64-bit platform, the problem may remain latent for a while.

Just to make things extra confusing, I once tracked down a problem where the version of GCC released as part of Ubuntu Hardy for x86 miscompiled a function very similar to the one above. It compiled this program to return 1:

int foo (void) {
  signed char a = 1;
  unsigned char b = -1;  /* b holds 255 */
  return a > b;          /* both operands promote to int: 1 > 255 */
}

Here both values should be promoted to signed int, making the comparison (1 > 255), which is false; the function should return 0. Obviously, in this case the compiler was buggy. The base version of GCC, 4.2.2, does not have this bug. However, the Ubuntu people applied about 5 MB of patches to this compiler before packaging it up and somehow broke it. A few years ago I found similar bugs in CIL and in an early version of Clang. Apparently even compiler developers are not immune to signed/unsigned confusion.

Giving a Talk at the GCC Summit

Last month I proposed giving a paper presentation at the next GCC Developers’ Summit about our compiler bug finding work. Happily, it was accepted and in late October I’ll head up to Ottawa to tell them what we’ve been doing and what we plan to do.  Here’s the abstract:

About 60 wrong-code and crash bugs in GCC that we have reported have been fixed, including 21 that were categorized as P1. All of these bugs were found using differential random testing: generating a C program, compiling it using multiple compilers and/or optimization levels, and comparing the results. Although most of our tests have been run on x86 and x86-64 platforms, most of the bugs occurred in target-independent code.

This talk will start out showing a few examples of bugs we have found. Next, I’ll talk about the main challenges in using random testing to find compiler bugs: avoiding C’s undefined behaviors and creating reduced test cases that can actually be reported. Third, I’ll present some results about the observed rates of crash and wrong-code errors in GCC 4.x releases. Finally, I’ll discuss some future directions including testing more language features and more targets.
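
For readers who haven’t seen differential testing in action, here is a minimal, hypothetical sketch of the comparison loop the abstract describes; the file names and compiler invocations are illustrative, and the real tool chain is considerably more involved:

#include <stdio.h>
#include <stdlib.h>

/* Compile one randomly generated program at two optimization levels, run
   both binaries, and flag a potential wrong-code bug if the outputs differ. */
int main (void)
{
  if (system ("gcc -O0 -w random_prog.c -o prog_O0") != 0) return 1;
  if (system ("gcc -O2 -w random_prog.c -o prog_O2") != 0) return 1;

  if (system ("./prog_O0 > out_O0.txt") != 0) return 1;
  if (system ("./prog_O2 > out_O2.txt") != 0) return 1;

  if (system ("diff -q out_O0.txt out_O2.txt") != 0)
    printf ("possible wrong-code bug: outputs differ\n");

  return 0;
}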

I’m excited to give this talk since our work only matters if the people on the receiving end of our bug reports care about them. I’ll post more details from the event, or soon after.

Fun With Shift Optimizations

It’s fun to see what a modern compiler can optimize and what it cannot. The other day, while working on a new piece about undefined behavior, I noticed some C compilers failing to optimize simple code based on shifts. Here are the functions:

int shift1 (int x, int y)
{
  if (x>=0) {
    return (x>>y) <= x;
  } else {
    return 1;
  }
}

int shift2 (unsigned x, unsigned y)
{
  return (x>>y) <= x;
}

int shift3 (int x, int y)
{
  if (x>=0) {
    return (x<<y) >= 0;
  } else {
    return 1;
  }
}

A C99 or C++0x compiler can optimize each of these functions to return the constant 1. Why is this? shift1() and shift2() are pretty obvious: a non-negative number, shifted right by any amount, cannot increase in value, because vacated bits are filled with zeros. (Shift amounts that are negative or at least as large as the width of the left operand are undefined, so the compiler is free to ignore those cases.) shift3() is trickier: when talking about left-shift, the C99 and C++0x standards state that:

If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

E1 is the left operand and E2 is the right operand. This is from 6.5.7 of N1124 and 5.8 of N3126. What the standards are really saying is that when left-shifting a signed integer, the compiler is permitted to assume that you don’t shift any 1 bits into the sign bit position, or past it. Thus, left-shifting a non-negative integer always results in a non-negative integer.
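
As a small added illustration (assuming a 32-bit int), here is what the rule does and does not allow:

int shift_examples (void)
{
  int ok = 1 << 30;   /* 1 * 2^30 is representable in a 32-bit int: defined */

  /* 1 << 31 would compute 1 * 2^31, which is not representable in a 32-bit
     int, so it is undefined behavior under C99/C++0x; the compiler may
     assume a correct program never evaluates it */
  /* int bad = 1 << 31; */

  return ok;
}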

None of these three functions were optimized to “return 1” by any of the most recent x86 or x64 versions of Clang, GCC, or Intel CC. Rather, the generated code does the obvious testing, branching, and shifting. What’s going on? Hard to say — maybe it’s just that these situations almost never occur in real code. Let me know if you have better ideas or if you know of a compiler that does perform these optimizations. I would expect that a solid peephole superoptimizer like the one I outlined here would discover the optimized form of these functions.

Update

This is motivated by Ben Titzer’s comment below.

Fixing y to be 1 does not appear to make it fundamentally easier to compile these functions. None of the compilers can exploit this extra information to compile any function to “return 1.”

Fixing x to be 1 does make it easier to compile these functions. ICC exploits this information to optimize shift1() and shift2() so they return a constant; GCC uses it to compile shift1() to return a constant; Clang does not figure out how to do this for any of the functions.

Notes

I used ICC 11.1. Clang and GCC were SVN head from 9/8/2010. All compilers were invoked something like:

cc -std=c99 -c -O2 shift.c

Typical generated code (this is from gcc on x64):

shift1:
  movl    $1, %eax
  testl   %edi, %edi
  js      .L2
  movl    %edi, %eax
  movl    %esi, %ecx
  sarl    %cl, %eax
  cmpl    %edi, %eax
  setle   %al
  movzbl  %al, %eax
.L2:
  rep
  ret

shift2:
  movl    %edi, %eax
  movl    %esi, %ecx
  shrl    %cl, %eax
  cmpl    %edi, %eax
  setbe   %al
  movzbl  %al, %eax
  ret

shift3:
  movl    $1, %eax
  testl   %edi, %edi
  js      .L7
  movl    %edi, %eax
  movl    %esi, %ecx
  sall    %cl, %eax
  notl    %eax
  shrl    $31, %eax
.L7:
  rep
  ret

Strange Utah

Pretty much anyone in the world who knows that Utah exists knows that Utah is weird. Outsiders have vague and usually uninformed — but nevertheless strong — feelings about Utah. Residents have more concrete information. The proper reaction is not to deny, marginalize, or rationalize Utah’s weirdness. The proper reaction is to embrace it, because as the degree of weirdness becomes large — and it can — it also becomes awesomeness. Utah boasts some superlatively weird, and superlatively awesome, spots like Gilgal Garden, the Kennecott mine, the Boulder Airport & UFO Landing Site, the Morrison-Knudsen Tunnels, Crystal Peak, and the Mars Desert Research Station. This only scratches the surface, I promise you. Having spent ten years — incredibly, more than a quarter of my life — in this state, I have come to appreciate all of this. I hope to never move away.

How might a person come to grips with Utah? Brandon Griggs’ book Utah Curiosities is great. However, the stories of green jello it tells, the folksy attractions it lists, capture kind of a comfortable, surface-level weirdness. It does not challenge us. On the other hand, Trent Harris’s Mondo Utah describes a deeper and more revealing kind of strangeness. This is the guy, after all, who directed Plan 10 From Outer Space. Mondo Utah tells the story behind Attack of the Giant Brine Shrimp, it describes Oscar Wilde’s 1882 visit to Salt Lake City, it gives a decoder for the Deseret Alphabet. It does not get better than this, folks. In the book’s conclusion Harris gets to the heart of the matter:

For reasons unknown this state seems to nurture strangeness. Some might say it is Utah’s way of protecting itself. They might say that this strangeness, is in fact, Utah’s first line of defense against the rush to become California. For when Mr. and Mrs. Normal from Normal, Illinois, pick up their paper and read the latest news from the beehive state, and then they sit back and comfortably snicker, “Boy, those dumb dorks in Utah sure are weird” and then they cancel there [sic] plans to move to Sandy, we can truly say we’ve won a battle, if not the war.

Amen. I call this “the porcupine effect” and it is real.

IRB Humor

In math, a series of logical steps leads to a logically correct result. In life, not so much. To find examples of reasonable steps leading to crazy consequences, there’s no need to look further than the nearest bureaucracy. Let’s take an example from an Institutional Review Board (IRB) — a part of the University bureaucracy whose job it is to ensure that researchers like me conduct experiments without harming participants (or bystanders).

A researcher I know about was running some studies in which the participants were given alcohol. To permit this experiment to proceed, the IRB stipulated that female participants had to first be given a pregnancy test, to avoid the possibility that a woman unaware of her pregnancy would expose the fetus to alcohol during the experiment. So far so good, but how would you like to be on either side of a conversation that starts out like this?

I know you just signed up for this experiment for lab credits, and I hate to be the one to tell you this, but…

Another bit of low-grade IRB humor can be found at the University of Utah’s IRB web site, which specifically mentions which report to file if “incarceration of a participant” occurs. What kind of studies are people running?  And how can I get involved?

Static Analysis Fatigue

My student Peng and I have been submitting lots of bug reports to maintainers of open source software packages. These bugs were found using Peng’s integer undefined behavior detector. We’ve found problems in OpenSSL, BIND, Perl, Python, PHP, GMP, GCC, and many others.

As we reported these bugs, I noticed developers doing something funny: in many cases, their first reaction was something like:

Please stop bothering us with these stupid static analysis results!

They said this despite the fact that in the initial bug report, I usually pointed out that not only were these results from actual tests, but they were from the tests provided by the developers themselves!  This interaction with PHP’s main developer is a good example.

What can we learn from this?  I take away a few different messages:

  1. From the developer’s point of view, static analysis results can suck. As these tools become smarter, they consider more and longer code paths. In a very real sense, this makes their results more difficult to reason about. It should go without saying that bug reports that are hard to reason about are not very actionable.
  2. The developers of popular open-source software projects are suffering from static analysis fatigue: a syndrome brought on by seeing too many bug reports from too many random tools, too few of which are actionable. If I made a living or a career out of static analysis, I’d be seriously worried about this.
  3. Many developers are still in the dark about C’s integer undefined behaviors. This is a constant surprise to me given that GCC and other common compilers have evaluated “(x+1)>x” to “true” for quite a while now (see the sketch just after this list).
  4. The best bug reports, by far, are those that are accompanied by a failure-inducing input. I think the reason we’ve been running into developer confusion is that we’re telling people that their own “make check” is a failure-inducing input, which they don’t want to hear because, when they run it, everything looks fine. Some of these problems will go away when our undefined behavior checker makes it into an LLVM release. Lacking the checker, people can still reproduce our results, but only by manually adding assertions.
  5. Developers are prideful about their projects. This is as it should be, unless it blinds them to useful bug reports.
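
Here is the sketch promised in item 3. It is an added illustration of the signed-overflow assumption, not code from any of our bug reports:

/* Because signed integer overflow is undefined behavior, a compiler may
   assume that x + 1 never wraps, and so may fold this whole function to
   "return 1" (GCC at -O2 has done this for years). */
int always_true (int x)
{
  return (x + 1) > x;
}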

Regarding static analysis fatigue, there can be no silver bullet. People developing these tools must take false-positive elimination very seriously (and certainly some of them have). My preference for the time being is to simply not work on static analysis for bug-finding, but rather to work on the testing side. It is more satisfying and productive in many ways.