Do Expressive Programming Languages Always Have Undefined Behavior?


In the Hacker News comments on one of my previous posts about undefined behavior, someone said this:

AFAIK Gödel’s incompleteness theorems imply that _any_ language will have at least some undefined behaviour.

Let’s take a quick look at this statement, keeping in mind that incompleteness and undecidability can be remarkably tricky topics. Some years ago I read and enjoyed Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse (and no doubt need to reread it).

First off, it is clear that there exist programming languages that are free of UB, such as one where the semantics of every program is to print “7”. Whatever it is that UB means (we have not formally defined it), it seems clear that the language that always prints “7” does not have it.
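
To be fully concrete, here is a minimal sketch of an interpreter for this language in C: the program text is read and ignored, and every program has the same, fully defined behavior.

#include <stdio.h>

// Interpreter for the language in which every program prints "7":
// the program text on stdin is consumed and ignored, and the
// semantics of every program is identical and total.
int main(void) {
  while (getchar() != EOF)
    ; // discard the program text
  puts("7");
  return 0;
}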

There are also useful languages that are obviously UB-free, such as an expression language that evaluates elementary functions over IEEE floats. These languages are particularly easy to reason about because they are not Turing-complete: all computations terminate and we simply have to ensure that they terminate with a defined result.
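
As a minimal sketch, an evaluator for such a language might look like this (the AST encoding is hypothetical, and I am assuming IEEE 754 / Annex F semantics, under which even floating-point division by zero is defined, yielding an infinity or a NaN):

// Toy expression language over IEEE doubles: constants plus the four
// arithmetic operators. The AST encoding here is hypothetical.
typedef enum { CONST, ADD, SUB, MUL, DIV } Op;

typedef struct Expr {
  Op op;
  double value;           // used when op == CONST
  const struct Expr *lhs; // used by the binary operators
  const struct Expr *rhs;
} Expr;

// Total and terminating: the recursion is bounded by the finite tree,
// and under IEEE semantics every arithmetic case is defined for all
// inputs: 1.0/0.0 is +infinity and 0.0/0.0 is NaN, not UB.
double eval(const Expr *e) {
  switch (e->op) {
  case CONST: return e->value;
  case ADD:   return eval(e->lhs) + eval(e->rhs);
  case SUB:   return eval(e->lhs) - eval(e->rhs);
  case MUL:   return eval(e->lhs) * eval(e->rhs);
  default:    return eval(e->lhs) / eval(e->rhs); // DIV
  }
}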

Perhaps the HN commenter instead intended to invoke Rice’s Theorem: “Any nontrivial property about the language recognized by a Turing machine is undecidable.” A consequence is that, when f() is some arbitrary computation, we cannot in general hope to decide whether a program like this invokes undefined behavior:

void f(void); // some arbitrary computation

int main(void) {
  f();
  return 1 / 0; // UB!
}

But this is a red herring. Rice’s Theorem only applies to non-trivial properties: “properties that apply neither to no programs nor all programs in the language.” To sidestep it, we only need to define a programming language where UB-freedom is a trivial property. This is done by ensuring that every operation that a program can perform is a total function: it is defined in all circumstances. Thus, programs in this language will either terminate in some defined state or else fail to terminate. This kind of extremely tight specification is not typically attempted for realistic programming languages because it is a lot of work, particularly if the language has open-ended interactions with other levels of the system, such as inline assembly. But the problem is only a practical one; there is no problem in principle. It is not too difficult to write a UB-free specification for a (Turing-complete) toy language or for a subset of a real language.
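
To make the idea concrete, here is a minimal sketch of totalizing a single operation. Integer division is partial in C (division by zero, and INT_MIN / -1, are UB), but a toy language’s specification can simply assign a defined result to every case. The particular results below are arbitrary choices; all that matters is that no case is left unspecified.

#include <limits.h>

// A toy language's division, specified as a total function: every
// input combination gets a defined result. Mapping x/0 to 0 (a
// convention some proof assistants also use) and INT_MIN / -1 to
// INT_MIN are arbitrary choices; the point is that no case is left
// undefined.
int toy_div(int x, int y) {
  if (y == 0)
    return 0;             // defined by the spec, not UB
  if (x == INT_MIN && y == -1)
    return INT_MIN;       // would overflow in C; defined here
  return x / y;           // remaining cases are well-defined C
}

Extend this treatment to every operation in the language and UB-freedom holds for every program, which is exactly the kind of trivial property that Rice’s Theorem does not reach.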

Now let’s return to the original HN comment: it was about the incompleteness theorems, not about the halting problem. I’m not sure what to do with that, as I don’t see that Gödel’s theorems have any direct bearing on undefined behavior in programming languages.


9 responses to “Do Expressive Programming Languages Always Have Undefined Behavior?”

  1. A real-world example: Gallina, the expression language in the Coq theorem prover, is expressive enough (though not Turing-complete) but does not have undefined behaviors.

  2. I believe Assembly/machine languages for older processors (Z80-era? maybe earlier?) usually had no “undefined behavior.” (Somewhere around the 8086 era, you start getting, effectively, UB around executing undefined instructions, since those are not guaranteed to have any particular behavior in the future.)

    If I understand correctly, tis-interpreter seeks to make a C implementation with no “undefined behavior;” instead, the tis-interpreter language defines C-standard UB as “throw an error.”

  3. > Gallina, the expression language in the Coq theorem prover, is expressive enough

    Expressive enough *for which task*?

  4. I suspect the implication of Gödel is that for any non-trivial language there are questions about programs in it where the only practicable way to answer them is to run the program (and hope it terminates).

    Where UB comes in is in the cases where even running the program won’t answer the question, because different conforming implementations can give different results (even ignoring implementation-defined behavior).

  5. Gödel’s theorem (or theorems, really, but only one is ever mentioned in these contexts) is abused in discussions as much as the term “quantum” is in bad science fiction (or if you want to get more specific, the uncertainty principle). In fact, now that I think about it, doesn’t the uncertainty principle imply that, essentially, all of our observations must necessarily be incomplete, thereby connecting it to Gödel? (Please don’t answer that.)

  6. It’s worth noting that the standard definition of a Turing machine has no undefined behavior. The motion of the tape head and the changes to the tape are given by an explicit total function of the current state and tape contents.
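
    A minimal sketch in C of why that holds (the encoding is hypothetical, with the alphabet fixed at two symbols): the transition table has an entry for every (state, symbol) pair, so a single step is a total function of the machine’s configuration.

    typedef struct {
      int next_state;   // state to enter
      int write_symbol; // symbol to write under the head
      int move;         // -1 = move left, +1 = move right
    } Action;

    // delta has an entry for every (state, symbol) pair, so one step
    // of the machine is a total function of its configuration:
    // nothing the machine can encounter is left unspecified.
    Action step(const Action delta[][2], int state, int symbol) {
      return delta[state][symbol];
    }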

  7. ISTM there’s considerable UB in “elementary functions over floating-point numbers”:
    * the precisions and under/overflow thresholds of the extended floating-point formats are not specified in the standard
    * ISTR that there is enough ambiguity in when rounding should be done in the presence of gradual underflow, that Intel interpreted the spec one way and Motorola the other back in the 8087/68881 days
    * my (slightly dim) recollection is that the result of NAN1 op NAN2 is not fully specified — if they are propagating NANs the result will be one of {NAN1,NAN2} but the spec doesn’t say which one
    * and if you mean elementary *transcendental* functions (cosine, logarithm, etc.), then the IEEE specs are completely silent; there are lots of libraries but few rigorous specs

  8. What the incompleteness theorem tells you is that you can write programs that are correct but that you can’t prove correct (in some given reasonable formal system). I assume here that the program is not subject to finite resource constraints, such as limits on the address space or on dynamically allocated memory.

  9. > I believe Assembly/machine languages for older processors (Z80-era? maybe earlier?) usually had no “undefined behavior.”

    They didn’t have undefined opcode traps, but they had “undefined behavior” in that some opcodes were not defined in the reference documentation and their behavior was not necessarily consistent from one generation to the next (even if they were, officially, 100% binary compatible) or even from one chip to the next, because minor manufacturing defects which only affected undefined opcodes wouldn’t cause a chip to be binned, even if the chipmaker knew they existed.

    (What happened was that those chips didn’t have the transistors to spend on trying to detect undefined opcodes, so whatever the decode logic did with arbitrary bit patterns was allowed to happen. Sometimes it was useful, sometimes not, and some undefined opcodes were common enough that they got documented in unofficial publications.)