GCC Summit Talk

I’m at the 2010 GCC Summit in Ottawa today. There’s a lot of interesting work going on; I’ll probably write a more detailed post later. In the meantime, the slides from my talk on finding compiler bugs are below.

GCC Summit 2010

View more presentations from regehr.

October 25, 2010

regehr

Computer Science, Software Correctness

10 responses to “GCC Summit Talk”

Ben L. Titzer says:

October 25, 2010 at 1:52 pm

I’m curious, which certified compiler did you find bugs in?
regehr says:

October 25, 2010 at 2:09 pm

Hi Ben- It was CompCert. We’ve put quite a bit of effort into testing it, I need to write up a post on the results sometime.
Michael Norrish says:

October 25, 2010 at 4:30 pm

Iâ€™d certainly be interested to hear about CompCert too.

What sort of reaction did you get from the GCC folk? Based on the fact that theyâ€™re fixing your bugs, I assume theyâ€™re not completely hostile 🙂
regehr says:

October 25, 2010 at 5:32 pm

Hi Michael- Yes, the reaction was not completely hostile :). People asked a ton of questions. It sounds like they appreciate the effort, and the fact that our testcases are usually small and well-formed.
Chucky Ellison says:

November 6, 2010 at 2:37 pm

This is really fascinating work. Is the tool, or at least a collection of generated test programs available for download yet? I looked around on your site, but I couldn’t find any information.

I’m working on a formal semantics for C, and since it’s executable I’d like to run your program and test cases against it. I’m also quite interested in any problems you found with CompCert.
regehr says:

November 6, 2010 at 2:54 pm

Hi Chucky- An old version of the tool is available here:

http://www.cs.utah.edu/~eeide/emsoft08/

It is really obsolete, though. After we get out from under the PLDI deadline we’ll cut a new release, I think.

In a day or two I’ll mail you a draft of our PLDI submission which contains some CompCert details…
regehr says:

November 6, 2010 at 2:56 pm

Also I’d love to hear what meaning your semantics applies to this program:

int g = 1;

int func()
{
g = 2;
return 3;
}

int main()
{
int *l = &g;
*l = func();
return 0;
}

I can never figure out if this one is defined or not. Partly the sequence point semantics in the C standard is confusing and partly my brain just rejects this kind of thing.
Chucky Ellison says:

November 6, 2010 at 5:40 pm

I ran it through my semantics and did a state-space search and only found one solution, which is a complete evaluation. So, at least according to my semantics, the program is not undefined. Please feel free to send me more examples. I can do basic model checking and state-space search over the definition, as well as simple (single run) interpretation.

Here’s how I see it, although I could easily be mistaken. The assignment in main is sequenced after the evaluation of its arguments (*l) and (func()). It doesn’t matter in what order these arguments get evaluated. Since the assignment is sequenced after everything else, its results are the ones that stick around (g gets 3 at the end). This part is probably straightforward.

Perhaps the confusion is whether or not the assignment to g inside func() violates the rule that an object can only be updated once between sequence points. This program is okay in that respect because function calls have sequence points before the call and after the return.

Is this what you were wondering about? Let me know if there is another subtlety I missed.
regehr says:

November 6, 2010 at 11:26 pm

Chucky- Sounds like a great tool!

What does it say for

a[a[0]] = 3;

when a[0] is 0 ?

This example is one of Michael Norrish’s favorites.

Regarding my code above, your interpretation sounds right. I think I just have trouble accepting that I use a programming language that fails to sequence the RHS of an assignment before the LHS, and it all goes to hell from there…
Chucky Ellison says:

November 7, 2010 at 3:17 pm

John,
Thanks for the kind words 🙂 It’s been my life for about a year now, and the work is only just now starting to be useful, so I appreciate your encouragement.

I did the same thing as before for the program:
int main(void){
int a[5];
a[0] = 0;
a[a[0]] = 3;
printf(“%d\n”, a[0]);
return 0;
}

And again I only get a single possible outcome, that 3 is printed. You’re right that the sequencing relation can often be confusing. I’m not entirely sure this result is correct. I /think/ it is. S6.5:2 of N1494 says “If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.” My understanding (and what the tool is using) is that the evaluation of a[0] must be sequenced before the assignment itself, so even though there is a read at a[0] before a write at a[0], everything is okay, the same way “x += 1” is okay.

The semantics does report undefined behavior in expressions like “(x=3)+(x=4)”, “x = x++”, or “x + (x = 5)”, although I realize these are simple examples. Do you have any nice subtle examples of undefined behavior?