Category: Software Correctness

  • Undefined Behavior != Unsafe Programming

    Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in…

  • Detecting Strict Aliasing Violations in the Wild

    Type-based alias analysis, where pointers to different types are assumed to point to distinct objects, gives compilers a simple and effective way to disambiguate memory references in order to generate better code. Unfortunately, C and C++ make it easy for programmers to violate the assumptions upon which type-based alias analysis is built. “Strict aliasing” refers…

  • Testing LLVM

    [This piece is loosely a follow-up to this one.] Background: Once a piece of software reaches a certain size, it is guaranteed to be loosely specified and not completely understood by any individual. It gets committed to many times per day by people who are only loosely aware of each others’ work. It has many…

  • Undefined Behavior: Not Just for Programming Languages

    This is an oldie but goodie. Start with this premise: a = b. Multiply both sides by a: a² = ab. Subtract b² from both sides: a² – b² = ab – b². Factor the left side: (a + b)(a – b) = ab – b². Factor the right side: (a + b)(a – b)…

  • Principles for Undefined Behavior in Programming Language Design

    I’ve had a post with this title on the back burner for years but I was never quite convinced that it would say anything I haven’t said before. Last night I watched Chandler Carruth’s talk about undefined behavior at CppCon 2016 and it is good material and he says it better than I think I…

  • Solutions to Integer Overflow

    Humans are typically not very good at reasoning about integers with limited range, whereas computers fundamentally work with limited-range numbers. This impedance mismatch has been the source of a lot of bugs over the last 50 years. The solution comes in multiple parts. In most programming languages, the default integer type should be a bignum:…

  • Isolating a Free-Range Miscompilation

    If we say that a compiler is buggy, we need to be able to back up that claim with reproducible, compelling, and understandable evidence. Usually, this evidence centers on a test case that triggers the buggy behavior; we’ll say something like “given this test case, compiler A produces an executable that prints 0 whereas compiler…

  • A Month of Invalid GCC Bug Reports, and How to Eliminate Some of Them

    During July 2016 the GCC developers marked 38 bug reports as INVALID. Here’s the full list. They fall into these (subjective) categories: 8 bug reports stemmed from undefined behavior in the test case (71753, 71780, 71803, 71813, 71885, 71955, 71957, 71746) 1 bug report was complaining about UB exploitation in general (71892) 15 bug reports…

  • C-Reduce 2.5

    In May we released C-Reduce 2.5 which builds against LLVM/Clang 3.8. New in this release: Improved reduction of non-preprocessed C/C++ code. C-Reduce now includes #included files that are below a certain size and also uses unifdef to remove #ifdef/#endif pairs. Specialization of #define directives is not implemented yet. Support for reducing multiple files at once.…

  • Pointer Overflow Checking

    Most programming languages have a lot of restrictions on the kinds of pointers that programs can create. C and C++ are unusually permissive in this respect: pointers to arbitrary objects and subobjects, usually all the way down to bytes, can be constructed. Consequently, most address computations can be expressed either in terms of integer arithmetic…