This Is Not a Defect


In several previous blog entries I’ve mentioned that in some recent versions of C and C++, left-shifting a 1 bit into the high-order bit of a signed integer is an undefined behavior. In other words, if you have code that computes INT_MIN by evaluating 1<<31 (or 1<<(sizeof(int)*CHAR_BIT-1) if you want to be that way), or code that does a byte swap using a subexpression like c<<24, then your program most likely has no meaning according to the standards. And in fact, Clang's integer sanitizer confirms that most non-trivial codes, including several crypto libraries, execute this undefined behavior.
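
For concreteness, here is a minimal sketch of the byte-assembly pattern written so that it avoids this undefined behavior. It assumes that the c in c<<24 is a byte of type unsigned char (the text doesn't say), and load_be32 is a hypothetical name. The trap is that a byte is promoted to (signed) int before it is shifted. The fix is to convert each byte to an unsigned 32-bit type first. For INT_MIN itself, the macro from <limits.h> avoids the shift entirely.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a big-endian 32-bit value from four bytes.
 * Writing p[0] << 24 directly is the trap: an unsigned char operand is
 * promoted to (signed) int before the shift, so with a 32-bit int the shift
 * puts a 1 into the sign bit whenever p[0] >= 0x80, which is undefined
 * behavior in C99/C11. Converting each byte to uint32_t first keeps every
 * shift in unsigned arithmetic, where it is well-defined. */
static uint32_t load_be32(const unsigned char p[4])
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) |
           (uint32_t)p[3];
}
```

With the bytes 0xDE, 0xAD, 0xBE, 0xEF this yields 0xDEADBEEF, and no intermediate value ever has a signed type.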

An obvious fix is to tighten up the standard a bit and specify that shifting a 1 into the sign bit has the expected effect, which is what all compilers that I am aware of already do. This is what the C++ standards committee is doing, though as far as I can tell the fix doesn't officially take effect until a technical corrigendum (TC) is issued. Even that isn't quite the final word, but it seems near enough in practice.
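
To make "the expected effect" concrete: what compilers already do is compute the shift as if in the corresponding unsigned type and then convert the result back. As I understand it, this is roughly what the C++ change specifies for a non-negative left operand. Below is a minimal C sketch of that idea. It assumes a two's complement target, and shl_expected and int_min_by_shift are hypothetical names, not standard functions. The final unsigned-to-int conversion is implementation-defined in C rather than undefined, and GCC, for example, documents it as reducing the value modulo 2^N.

```c
#include <limits.h>

/* Hypothetical helper: shift a non-negative int left by n bits
 * (n must be less than the width of int), producing the "expected"
 * two's complement result even when a 1 lands in the sign bit.
 * The shift itself is done on unsigned int, where it is well-defined;
 * converting the result back to int is implementation-defined in C,
 * and mainstream compilers reduce it modulo 2^N. */
static int shl_expected(int x, unsigned n)
{
    return (int)((unsigned)x << n);
}

/* The INT_MIN computation from the text, 1<<(sizeof(int)*CHAR_BIT-1),
 * routed through the unsigned shift instead of a signed one. */
static int int_min_by_shift(void)
{
    return shl_expected(1, (unsigned)(sizeof(int) * CHAR_BIT - 1));
}
```

On a typical platform, int_min_by_shift() compares equal to INT_MIN, and for shifts that stay in range, such as shl_expected(3, 4), the result is the ordinary value (48).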

Anyhow, today Nevin Liber pointed out that there's a bit of news here: last month the C standards committee decided that this same issue is not a defect in C and that they'll reconsider it later on. I guess that's fine, since compiler implementers are ignoring this particular undefined behavior, but it seems like a bit of a missed opportunity to (1) make the language slightly saner and (2) bring it into line with existing practice. Also, you might want to peruse the full set of defect reports if you want to be thankful that you did something other than attend a standards meeting last month.


10 responses to “This Is Not a Defect”

  1. Hi bcs, that’s crazy talk! There’s still at least one ones’ complement platform out there with a C compiler (or so I hear).

  2. “There are good reasons I have to do A (namely B) and also good reasons I have to do B (i.e. C) … Z, on the other hand, was a small mistake.”

  3. Such a shift being undefined obviously originates in the non-requirement of two’s complement integers, and arguably this reasoning no longer applies. Even so, shifting a one into the sign bit is frequently an error, and allowing sanitisers to check for this is probably helpful. The obvious case of byte-reversing a signed value can (and should) be done safely using a macro or (inline) function. Almost all other cases are either very real bugs or benign but should have been using unsigned types to begin with.

  4. Hi Mans, compilers and other tools are still free to issue a warning if this behavior becomes defined, but they are no longer free to destroy code that executes 1<<31. The change, when it happens, will be a good one.

  5. If left-shifting a 1 into the sign bit becomes defined, should then signed integer overflow also become defined behaviour? I think that would be much more controversial since compilers actually exploit it for optimization. Even if it does become defined, it would still be dangerous as long as older compilers are around.

    If only left-shifting becomes defined, then a left shift is no longer exactly equivalent to a multiplication by a power of two. Possibly that bit of complication is worth it to avoid some undefined behaviour and to gain compatibility with C++. I cannot say myself.

  6. John, when you write “it seems like a bit of a missed opportunity” you seem to ignore the way the C standardization committee works. There really is no identified defect in the C standard concerning left-shifting a 1 bit into the high-order bit of a signed integer: as far as we know the standard is reasonably clear, complete and consistent in defining that aspect of the language. The standard can be changed so as to define a different language (e.g., one where such left-shifting is not UB), but this cannot be done via the process of defect reports, which serves a different purpose.

    (Although I am a member of WG14, the C standardization committee, the views expressed above are solely mine and not necessarily those of WG14.)

  7. Hi Roberto, if the standards body wants to say that a clear and consistent definition of the wrong programming language is not a defect, of course that is their prerogative. But it seems like a somewhat narrow definition of “defect.”

  8. For these standards committees, “defect” is just another way of saying “not internally consistent”. Otherwise, any non-contradictory statement is a good statement.

    Everything else is just a “change”.

  9. Hi Mike, fair enough, though in C++ this same issue does seem to be considered a defect.

    Anyway, I’m probably going to poke fun at this kind of thing regardless of the terminology.
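
The third comment suggests byte-reversing a signed value with a small inline function instead of shifting the signed value directly. A minimal sketch of that suggestion follows. The names bswap_u32 and bswap_s32 are hypothetical, and the code assumes the fixed-width types from <stdint.h> are available. All of the shifting happens on uint32_t. Only the final conversion back to int32_t touches the signed type, and that conversion is implementation-defined in C (not undefined), with mainstream compilers wrapping modulo 2^32.

```c
#include <stdint.h>

/* Reverse the bytes of a 32-bit value using only unsigned arithmetic,
 * so no shift can move a 1 into a sign bit. */
static inline uint32_t bswap_u32(uint32_t x)
{
    return (x << 24) |
           ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) |
           (x >> 24);
}

/* Signed wrapper: convert to unsigned, swap, convert back.
 * int32_t -> uint32_t is well-defined (modulo 2^32); converting a value
 * above INT32_MAX back to int32_t is implementation-defined in C, and
 * mainstream compilers wrap it modulo 2^32. */
static inline int32_t bswap_s32(int32_t x)
{
    return (int32_t)bswap_u32((uint32_t)x);
}
```

GCC and Clang also provide __builtin_bswap32 as a non-portable alternative. The portable version above keeps the signed type out of every shift.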