An Executable Semantics For C Is Useful

The goal of a C/C++ compiler is to turn every sequence of ASCII characters into executable instructions. OK, not really — though it does seem that way sometimes. The real goal of a C/C++ compiler is to map every conforming input into executable instructions that correspond to a legal interpretation of that input. The qualifiers “conforming” and “legal interpretation” are very important. First, the compiler has extremely weak requirements about what it should do with non-conforming inputs, such as programs that contain undefined behaviors (array bounds violations, etc.). Second, all realistic C/C++ programs have a large number of possible interpretations, for example corresponding to different integer sizes, different orders of evaluation for function arguments, etc. The compiler chooses a convenient or efficient one, and the remaining interpretations are latent. They may emerge later on if the compiler options are changed, if the compiler is upgraded, or if a different compiler is used. The point is that the compiler has no obligation to tell us whether the input is conforming or not, nor how many possible interpretations it has.

Thus, while C/C++ compilers are very good at turning conforming programs into efficient executables, they are just about useless for other answering other kinds of questions:

Does the program ever execute undefined behaviors, causing it (in principle) to have no meaning and (in practice) to execute attack code or crash?
Does the program rely on unspecified behaviors, making it non-portable across compilers, compiler versions, and changes in compiler options?
Does the program rely on implementation-defined behaviors, affecting its portability to other compilers and platforms?
Why does the program behave in a certain way? In other words, what part of the standard forced that interpretation?

To answer these questions, a wide variety of static analyzers, model checkers, runtime verifiers, and related tools have been developed. These tools are great. However, even taken all together, they are incomplete: there exist plenty of bad (or interesting) program behaviors that few or none of them can find. For example:

Very few tools exist that can reliably detect uses of uninitialized storage.
Few, if any, tools can correctly diagnose problems resulting from C/C++’s unspecified order of evaluation of function arguments.
An lvalue must not be modified multiple times, or be both read and written, in between sequence points. I’m not aware of many tools that can correctly detect that evaluating this function results in undefined behavior when p1 and p2 are aliases:

int foo (int *p1, int *p2) {
  return (*p1)++ % (*p2)++;
}

There exists C code subtle enough that multiple independent compiler teams get it wrong. Miscompilations are, by definition, out of reach for source-level analysis tools.

The Missing Tool

The missing tool (or one of them, at any rate) is an executable semantics for C. An executable semantics is an extremely careful kind of interpreter where every action it takes directly corresponds to some part of the language standard. Moreover, an executable semantics can be designed to tell us whether the standard assigns any meaning at all to the program being interpreted. In other words, it can explicitly check for all (or at least most) undefined, unspecified, and implementation-defined behaviors. For example, when an executable semantics evaluates (*p1)++ % (*p2)++, it won’t assign a meaning to the expression until it has checked that:

both pointers are valid
neither addition overflows (if the promoted types are signed)
p1 and p2 are not aliases
*p2 is not 0
either *p1 is not INT_MIN or *p2 is not -1

Moreover, the tool should make explicit all of the implicit casts that are part of the “usual arithmetic conversions.” And it needs to do about 500 other things that we usually don’t think about when dealing with C code.

Who Needs an Executable Semantics?

Regular programmers won’t need it very often, but they will occasionally find it useful for settling the kinds of annoying arguments that happen when people don’t know how to read the all-too-ambiguous English text in the standard. Of course, the executable semantics can only settle arguments if we agree that it has captured the sense of the standard. Better yet, we would treat the executable semantics as definitive and the document as a derivative work.

Compiler developers need an executable semantics. It would provide a simple, automated filter to apply to programs that purportedly trigger compiler bugs. A web page at Keil states that “Fewer than 1% of the bug reports we receive are actually bugs.” An executable semantics would rapidly find code fragments that contain undefined or unspecified behaviors — these are a common source of bogus bug reports. Currently, compiler developers do this checking by hand. The GCC bug database contains 4966 bug reports that have been marked as INVALID. Not all of these could be automatically detected, but some of them certainly could be.

People developing safety-critical software may get some benefit from an executable semantics. Consider CompCert, a verified compiler that provably preserves the meaning of C code when translating it into assembly. CompCert’s guarantee, however, is conditional on the C code containing no undefined behaviors. How are we supposed to verify the absence of undefined behaviors when existing tools don’t reliably check for initialization and multiple updates to lvalues? Moreover, CompCert is free to choose any legitimate interpretation of a C program that relies on unspecified behaviors, and it does not need to tell us which one it has chosen. We need to verify up-front that (under some set of implementation-defined behaviors) our safety-critical C program has a single interpretation.

My students and I need an executable semantics, because we are constantly trying to figure out whether random C functions are well-defined or not. This is surprisingly hard. We also need a reliable, automated way to detect undefined behavior because this enables automated test case reduction.

An Executable Semantics for C Exists

I spent a few years lamenting the non-existence of an executable C semantics, but no longer: as of recently, the tool exists. It was created by Chucky Ellison, a PhD student at the University of Illinois working under the supervision of Grigore Rosu. They have written a TR about it and also the tool can be downloaded. Hopefully Chucky does not mind if I provide this link — the tool is very much a research prototype (mainly, it is not very fast). But it works:

regehr@home:~/svn/code$ cat lval.c
int foo (int *p1, int *p2) {
  return (*p1)++ % (*p2)++;
}

int main (void) {
  int a = 1;
  return foo (&a, &a);
}
regehr@home:~/svn/code$ kcc lval.c
regehr@home:~/svn/code$ ./a.out
=============================================================
ERROR! KCC encountered an error while executing this program.
=============================================================
Error: 00003
Description: Unsequenced side effect on scalar object with value computation of same object.
=============================================================
File: /mnt/z/svn/code/lval.c
Function: foo
Line: 2
============================================================

As I mentioned earlier, very few tools for analyzing C code find this error. Chucky’s tool can also perform a state space exploration to find order of evaluation problems and problems in concurrent C codes. Finally, it can run in profile mode. Unlike a regular profiler, this one profiles the rules from the C semantics that fire when the program is interpreted. This is really cool and we plan to use it to figure out what parts of the C standard are not exercised by Csmith.

Chucky’s tool is already an integral part of one of our test case reducers. This reducer takes as input a huge, ugly, bug-triggering C program generated by Csmith. It then uses Delta debugging to output a much smaller bug-triggering program that (ideally) can be included in a compiler bug report without further manual simplification. Before Chucky’s tool arrived, we had spent several years trying to deal with the fact that Delta always introduces undefined behavior. We now seem to have a bulletproof solution to that problem.

The benefits of executable semantics have long been recognized in the PL community. The new thing here is a tool that actually works, for the C language. Hopefully, as Chucky’s tool matures people will find more uses for it, and perhaps it can even evolve into a sort of de facto litmus test for ascertaining the meaning — or lack thereof — of difficult C programs.

April 26, 2011

regehr

Compilers, Computer Science, Software Correctness

11 responses to “An Executable Semantics For C Is Useful”

Pascal Cuoq says:

April 27, 2011 at 12:15 am

* both pointers are non-NULL
* neither addition overflows (if the promoted types are signed)
* p1 and p2 are not aliases
* *p2 is not 0
* either *p1 is not INT_MIN or *p2 is not -1

Since the standard allows to compute t+n for an array t of size n, which it then forbids to dereference, I would argue that the first condition should be “both pointers are valid”. But this is only one way a pointer can be invalid. Another is to point to a local variable that has gone out of scope. In Frama-C’s value analysis, this check used to require in the first implementation a pass on the entire state, to see which variable was pointing to variables that had gone out of scope. The current implementation spends some time tracking who contains addresses of locals to save time when a block is exited. Do you know how KCC does this?

Note that Frama-C’s value analysis does check all the conditions above and emit an alarm each time one of them seems not to be verified (it may stop propagation at the first one, like a KCC-compiled program does, but reserves the right not to do so). For historical reasons, you need to activate signed arithmetic overflow detection with -val-signed-overflow-alarms in order to get a warning for the last condition (there is no option to activate the detection only for division). Option -unspecified-access must be set to detect interfering side-effects caused by pre/post-decrement operators. Side effects of function calls are only taken into account in a separate, undistributed plug-in.
Chucky Ellison says:

April 27, 2011 at 7:34 am

Hi Pascal. Here is an example of your first question.

int main(void){
int* p;
{
int x;
p = &x;
}
*p;
}

[celliso2@fsl3 c-semantics]$ kcc cuoq.c
[celliso2@fsl3 c-semantics]$ ./a.out
=============================================================
ERROR! KCC encountered an error while executing this program.
=============================================================
Error: 00007
Description: Referring to an object outside of its lifetime.
=============================================================
File: /home/grosu/celliso2/c-semantics/cuoq.c
Function: main
Line: 7
=============================================================

The way that this is handled is pretty straightforward. The lifetime of variables is enforced in the semantics by deleting (marking the block dead) that memory as soon as its life is over. Then, if in the future the program tries to dereference this memory, the semantics sees that the memory no longer exists.

Actually, the standard is even more strict than that. It says “The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime” (n1548 6.2.4:2). This means it’s possible just reading the pointer (p; instead of *p; above) is undefined. My semantics can do this, but I have it turned off because it’s annoying. It should be an option available to the user in the future. The “or just past” clause is because you are actually allowed to create a pointer that points to t+n+1 for array t an length 1, in order to make loops over arrays easy to write. Of course, you still can’t dereference t+n+1. Incidentally, you are not allowed to create a pointer t-1, even though you see decrementing loops over arrays a lot.

Finally, the memory itself is being handled symbolically, so that this program emits an error:
int main(void){
int x, y;
return &x < &y;
}
and this program succeeds:
int main(void){
int a[3];
return &a[0] < &a[2];
}
As John said, we don't want to allow users to write code that relies on unspecified behavior.
Chucky Ellison says:

April 27, 2011 at 8:48 am

Ah, obviously I can’t count. You’re right, it’s just t+n for array t of length n. E.g, for int a[3]; a[3] is a valid pointer, but not dereferencable. My point was that it’s one past what you’re allowed to dereference, although I messed up the arithmetic. It’s a good thing we have computers to do this for us!
regehr says:

April 27, 2011 at 11:05 am

Pascal, of course you’re right about the valid pointers vs. non-null.

Is it possible to invoke Frama-C in interpreter mode? One of the things that’s really nice about Chucky’s tool is its complete lack of false positives.
Pascal Cuoq says:

April 28, 2011 at 1:21 am

Hello John,

we have just discussed this a bit with Chucky by e-mail. I do not think that you will get either interesting new features or speed from doing so, but Csmith-generated programs can be “interpreted” by the value analysis with options -no-results -slevel 999999 -val. I have been doing this in the background for a while. As you say, there aren’t supposed to be false positives in this mode, so this is good for looking for “loss of precision” bugs, whereas the first technique we had discussed is only good for finding correctness bugs.

In the development version you have access to, there are no known issues that would cause a completely unrolled analysis of a Csmith-generated program not to remain precise until the end, with both -machdep x86_64 and the default -machdep x86_32. The only alarms that very rarely pop up are related to addresses of locals being used out of scope, considering that the value analysis considers “passing as an argument to a function” as “using” (which it wouldn’t have to do, but when the called function’s code is unavailable, you can’t hope to postpone the alarm until the dangling pointer is actually used because the actual use may be in the missing code, and then for consistency it was easier to warn when a dangling pointer is passed to a function even when that function’s code is available). Do you want me to send in a bug report against the official Csmith 2.0.0 version?
Derek Jones says:

April 28, 2011 at 7:29 am

John,

The Model Implementation C Checker was written over 20 years ago and detects all undefined, implementation defined and unspecified behaviors. It is essentially a compiler, linker and interpreter. Some description of what it does here, http://www.knosof.co.uk/whoguard.html. There was never a commercially significant market for the runtime checks (it did all the uninitialised variable and pointer must point at an object {it even checked that two compared pointers pointed at the same object}). The front end morphed in OSPC, http://www.knosof.co.uk/sales.html, and also got licensed to a static analysis vendor.
Chucky Ellison says:

April 28, 2011 at 8:01 am

Derek,

Is it possible to obtain a copy of the Model Implementation C Checker, or any documentation, TR, whitepaper, or paper about it? I’m interested in knowing how that tool works. I’ve been unable to find essentially anything except for your link you posted above.

From your description, it sounds like my tool is entirely redundant!
regehr says:

April 28, 2011 at 10:36 am

Derek this is very interesting. I echo Chucky’s request for more information.

I’m curious why you say there was never a market for the runtime checks: Purify and related tools have enjoyed a reasonable degree of success, right? Valgrind, while not a commercial product, is heavily used.
regehr says:

April 28, 2011 at 10:40 am

Pascal, thanks for the details.

Yes, if you have found a bug in Csmith 2.0.0 we’d certainly like to hear about it.

Of course, in the meantime, Xuejun is busily hacking new features into Csmith. We’re working on getting Csmith into github, at which point you’ll be able to run the latest version, which of course is where we’re most interested in bug reports.
Derek Jones says:

April 28, 2011 at 10:55 am

Chucky,

Your tool is not redundant there are all sorts of things that could be done with it; also it is C99 while the Model implementation is C90 and you could look for inconsistencies in the specification, ie bugs in the C Standard).

I was rather discouraged by the response to the Model Implementation and never wrote anything up for publication. Very few people understand the benefits of runtime checking, even Boundschecker, http://sourceforge.net/projects/boundschecking/, whose executables run an order of magnitude faster than the Model Implementation has few users.

The Model Implementation was written by three guys, it used the compiler/linker/interpreter approach, who tried real hard to follow the exact wording of the C Standard. The runtime checking worked at the byte level and so if two bit-fields occupied the same byte and initialization of one of them would cause the other to be treated as initialized, also it would not detect unspecified behavior caused by pointer aliasing (considered to have too high an overhead for the likely benefit; we were aiming for real world use).

A good test of the quality of a ‘checking’ implementation is how many bugs it finds in the language standard. The US members of WG14 got rather tired of lists, http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_017.html, of what they considered nit picking problems in the C Standard and some suggested improvements have only just made it into the latest standard, http://shape-of-code.coding-guidelines.com/2011/03/17/a-change-of-guard-in-the-c-standards-world/ You ought to try and find at least one ambiguity in the C Standard.

I was interested to see that your tool will print out a profile of the appropriate C semantics as it processes code. We cross referenced if statements in the Model Implementation to requirements in the C Standard as method of checking that all requirements were covered and somebody suggested that there ought to be an option to print these out as code was processed. One possible use for this profiling ability is to be able to warn developers when they are making use of obscure corners of the language; obscure might be measured in terms of frequency of use, which as this experiment suggests http://www.knosof.co.uk/cbook/accu06.html, is a possible cause of developer misunderstanding.

Have yo tried some of the more obscure examples from my C book?

You might be interested in this paper on forms of language specification:
http://aitc.aitcnet.org/isai/DocLog/000-099/060-thru-079/22-OWGV-N-0060/n0060.pdf
Derek Jones says:

April 28, 2011 at 11:02 am

John,

You might like to look at Coccinelle as a very useful tool for matching fragments of C source:
http://shape-of-code.coding-guidelines.com/2009/01/08/semantic-pattern-matching-coccinelle/
http://shape-of-code.coding-guidelines.com/2009/08/31/using-coccinelle-to-match-if-sequences/

Grrr, you are still referring to CompCert as a verified compiler. I saw somebody repeat your claim on a mailing list the other day. CompCert is a great achievement and including Chucky’s tool as a front end it would be a step closer to it becoming a verified compiler.