The C language's rules for integer operations have some quirks that can make even small programs behave in confusing ways. This post is a review of these rules in the form of a quiz containing 20 questions. I recommend going through the questions in order. If you are a beginning C programmer, you should consult a C book as you go through these questions since there are a lot of little things (such as what "1U" means) that I have not bothered to explain. If you are a serious C programmer, I expect you'll do well; this quiz is not intended to be extremely difficult.
You should assume C99. Also assume that x86 or x86-64 is the target. In other words, please answer each question in the context of a C compiler whose implementation-defined characteristics include two's complement signed integers, 8-bit chars, 16-bit shorts, and 32-bit ints. The long type is 32 bits on x86, but 64 bits on x86-64 (this is LP64, for those who care about such things).
Summary: Assume implementation-defined behaviors that Clang / GCC / Intel CC would make when targeting LP64. Make no assumptions about undefined behaviors.
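The platform assumptions above can be checked mechanically. The following is a minimal sketch, not part of the original quiz; the function names are made up for illustration, and the checks cover only the properties listed in the introduction.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when the compiler matches the quiz's
 * stated assumptions (8-bit chars, 16-bit shorts, 32-bit ints, and a
 * two's complement int that uses its full range), 0 otherwise. */
int matches_quiz_assumptions(void)
{
    return CHAR_BIT == 8
        && SHRT_MAX == 32767
        && INT_MAX == 2147483647
        && INT_MIN == -INT_MAX - 1;
}

/* Hypothetical helper: returns 1 for the LP64 model (64-bit long and
 * 64-bit pointers), 0 otherwise, for example on 32-bit x86. */
int is_lp64(void)
{
    return sizeof(long) == 8 && sizeof(void *) == 8;
}
```

If matches_quiz_assumptions() returns 0, the quiz answers do not necessarily apply to that compiler.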
Note: Sometimes scores are reported incorrectly -- sorry about that. I think it's a bug in the quiz plugin for WP that I'm using.
I hope you found this quiz to be useful and/or entertaining, and please write a comment or mail me if you find a mistake. As I said in the introduction, it was intended to be easy for experienced C programmers. In reality, integer bugs are hard to avoid not so much because the individual issues are extremely complicated, but rather because integer operations are everywhere and their corner case bugs get mixed in with algorithmic difficulties and other programming problems. I have another integer quiz in the works that will be more difficult.
{ 77 } Comments
Thank you. I learned quite a bit in this quiz.
Nice quiz!
I think in Q4 you should mention that the x64 model is LP64 (Linux), not LLP64 (Windows).
Err nevermind, I missed the description at the start of the post…
In Question 18, wouldn’t the expression be implementation-defined if the value of x does not fit in a short?
Hi gergo, in the quiz intro I specify fixed values for integer sizes. This quiz isn’t about portable C, which is a far more difficult topic.
Hi igorsk, I’ll add a bit more clarification!
fyi, Q7 lists INT_MAX twice. probably one should be INT_MIN.
The answer to Q9 seems wrong to me:
C99 6.5.7p4:
If E1 has a signed type and nonnegative value, and E1 * 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
I’d say that if E1 is negative the behavior is undefined also for E2 equal to 0.
Your question 9 is:
> Assume x has type int. Is the expression x<<0…
The answer you give as correct is “always defined” and the rationale you give is “any value of type int can be safely shifted by zero bit positions.”
I beg to differ. C99's 6.5.7:4 is:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
(-1) << 0 is undefined according to the above paragraph.
Thanks Pascal and abramo, I agree and have fixed the answer to #9.
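The rule quoted from C99 6.5.7 can be turned into a precondition check. This is a rough sketch under the quiz's assumptions (int has no padding bits); the helper name is invented for illustration.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if x << n has defined behavior for an
 * int x under C99 6.5.7, and 0 otherwise. */
int shl_is_defined(int x, int n)
{
    /* 32 under the quiz's assumptions; assumes int has no padding bits */
    int width = (int)(sizeof(int) * CHAR_BIT);

    if (n < 0 || n >= width)
        return 0;                /* shift count out of range */
    if (x < 0)
        return 0;                /* negative left operand, even when n == 0 */
    return x <= (INT_MAX >> n);  /* x * 2^n must be representable in int */
}
```

Note that shl_is_defined(-1, 0) is 0, which is the case Pascal and abramo point out.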
Regarding question 20, this compiler bug has been reverse-engineered into the standard, which says as of C11:
“If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a; otherwise, the behavior of both a/b and a%b is undefined.”
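Following the C11 wording quoted above, a program can test for the problematic operands before using %. A minimal sketch, with a made-up helper name:

```c
#include <limits.h>

/* Hypothetical helper: stores a % b in *out and returns 1 when the
 * operation is defined; returns 0 and leaves *out untouched otherwise. */
int checked_mod(int a, int b, int *out)
{
    if (b == 0)
        return 0;                /* division by zero */
    if (a == INT_MIN && b == -1)
        return 0;                /* a / b is not representable in int */
    *out = a % b;
    return 1;
}
```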
U have just made another programmer less ignorant in the world. Congratulations.
That was an excellent quiz. Bookmarked for directing people to in the future.
Looking at question 4; I don’t believe the spec guarantees long having greater length than int on 64 bit architectures. Only that it is at least as long. I certainly wouldn’t count on it being longer. Though if I’m wrong about this I’d like to know.
..By the way. great quiz. Learned a few things
Hi Jesse, I guess I need to state it more prominently but the quiz intro does specify LP64!
I got question 5 wrong (clicked on the right answer last), and partial credit on some other questions, but the Get Results button says I scored “20 out of 20”. That doesn’t look like an accurate result.
Great quiz, I learned a lot
In Question 5, “each C implementation is permitted to make its own choice” seems to imply that the answer is undefined.
After not doing very well, I was heartened to see the “You scored 20 out of 20” message. Software is hard!
I didn’t expect that it would be this tricky
thanks
> Assume x has type int. Is the expression x<<32…
32? You're using magic numbers on a quiz about UB? int32_t is probably what you intended.
gergo is correct about Q18. Even given x86 or x86-64, which tells us that we have two’s complement, 16-bit short and 32-bit int, casting an int value >= 2^15 still gives an implementation-defined result – it does not necessarily “truncate”.
If you changed the question to use unsigned short instead, then it would be right.
I think Gergo is right to question Q18. short is 16 bits wide in your model right? Then the standard says 6.3.1.3: “Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.” Your explanation of your model doesn’t obviously preclude the signal being raised. (If you do get back an implementation-defined result, then that means that the cast is well-defined, and then the cast back to an integer will give you the same value in that type, and that value + 1 will be OK.)
The real question is: “Which evil compiler faction insisted on that horrible ‘implementation-defined signal’ language?”
I think you’ve pulled a fast one with a couple things.
First and most strongly, your scoring of Q5 I think is bogus. Since you didn’t say whether char was signed or not, I answered it’s undefined, but you marked it wrong saying “I didn’t give you enough information.” But the lack of information was enough information. Singling out ‘0’ as particularly incorrect might have made some sense, but marking ‘1’ as correct and ‘undefined’ as incorrect is wrong. (Also note that the answer is “undefined”, not “invokes undefined behavior,” which would have actually been wrong.)
Second, you specify a lot of particulars about the architecture (ints are 32 bit two’s complement, etc.) and it wasn’t clear whether you considered overflow as undefined behavior or not. (There are good reasons not to — even on a two’s complement machine the optimizer can do some stuff like optimize ‘x + 1 > x’ to ‘true’ — I’m just saying it wasn’t clear what you intended.)
Your quiz told me I scored 20 out of 20. In fact, for one question I selected the “wrong” answer, and for another, with 4 options, I selected all three wrong answers before selecting the correct one.
And your explanation for question 18 is definitely incorrect. Given:
int x;
(short)x + 1
if x is outside the representable range of type short, the result of the conversion is implementation-defined *or* it raises an implementation-defined signal. Assuming specific ranges for int and short doesn’t avoid this issue.
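Code that wants to stay clear of the implementation-defined conversion discussed here can either test the range first or do the wrapping through an unsigned type. The sketch below assumes the quiz's 16-bit short and 32-bit int; both function names are invented for illustration.

```c
#include <limits.h>

/* Returns 1 when converting x to short preserves the value, so that
 * (short)x involves no implementation-defined behavior. */
int fits_in_short(int x)
{
    return x >= SHRT_MIN && x <= SHRT_MAX;
}

/* Hypothetical helper: wraps x to 16 bits without relying on the
 * implementation-defined signed conversion. Assumes 16-bit short and
 * 32-bit int, as in the quiz. */
short wrap_to_short(int x)
{
    int v = (int)(unsigned short)x;  /* well defined: reduces modulo 2^16 */
    if (v > SHRT_MAX)
        v -= (int)USHRT_MAX + 1;     /* map 32768..65535 to -32768..-1 */
    return (short)v;                 /* v is in range, so the value is kept */
}
```

On compilers that define the narrowing conversion as truncation, wrap_to_short(x) and (short)x give the same value; the difference is only in what the standard guarantees.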
I got a few questions wrong but got “You scored 20 out of 20.” after clicking Get Results.
Even avoiding the implementation-defined signal, if you get an implementation-defined value back then the result of the entire expression is implementation-defined – all we know is that it must be somewhere in the range 1..2^15. If the answer is one of 32768 different possible values then I struggle to call that “well-defined”
You need another option for many of these,
“Don’t do it”
I agree with the other commentators: Q18 is wrong. “(short)x” has undefined behavior if x does not fit in a short int.
that’s why that simple code didn’t work as expected. nice quiz!
Hi Marius (and others): Sorry about the scoring bug! I’m guessing it’s a flaw in the quiz plugin for WordPress that I’m using. It’s also possible that I misconfigured something, but I don’t think so…
Michael and others, regarding Question 18: I’ll stand by my answer since I’ve never heard of an LP64 compiler that doesn’t just truncate. But going by just the standard, you folks are of course correct.
Evan, I was using undefined as shorthand for “undefined behavior.” I find that to be reasonably clear even after reading your comment, but agree that I could have worded it better.
Regarding signed overflow being defined or not, compiler developers generally draw a sharp distinction between undefined behavior and implementation-defined behavior. 32-bit ints, two’s complement, etc. are examples of the latter and signed overflow is an example of the former. A lot of developers do not draw such a sharp distinction, which is why I made a point of asking questions about this issue.
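Because signed overflow is undefined rather than implementation-defined, the check has to happen before the addition, not after it. A minimal sketch with a made-up helper name:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a + b can be computed in int
 * without overflow, 0 otherwise. The test itself never overflows. */
int add_is_defined(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;  /* INT_MAX - b cannot overflow when b > 0 */
    return a >= INT_MIN - b;      /* INT_MIN - b cannot overflow when b <= 0 */
}
```

A check written after the fact, such as testing whether a + b < a, is exactly the kind of expression an optimizer is allowed to simplify away.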
Steve, I agree, people shouldn’t do most of this stuff. But experience has not shown that to be useful advice, because people still do it.
Yeah, I concur with the other commenters re Question 5 and Question 18: for Question 5, the correct answer is “implementation-defined”, not “1”; and for Question 18, the correct answer is indeed “defined for all values of x”, but that’s a really sneaky trick, because you’re overloading the word “defined” to mean “sometimes well-defined and sometimes implementation-defined, depending on the value of x”!
Errors in the quiz:
3. (unsigned short)1 > -1: This will evaluate to 0 on systems where sizeof(short)<sizeof(int), and 1 on systems where sizeof(short)==sizeof(int). Yes, such systems exist. Old Cray supercomputers and various DSP processors don't have byte-addressable memory, only word-addressable, so they make all primitive types a word long.
5. SCHAR_MAX == CHAR_MAX: The person who wrote the quiz even ACKNOWLEDGES that the quiz is incorrect here and apologizes. This is implementation-dependent, it is 1 when char is signed and 0 when char is unsigned. Both types of systems exist. You can use -funsigned-char on GCC, for example.
11: int x; x << 31: This is only defined for some values on platforms where int has at least 32 bits. There exist systems where int has 16 bits, in which case this is undefined for all values. Old DOS PCs often used 16-bit ints, and I believe that ints were 16 bits when C was invented.
12: int x; x << 32: This is only undefined on systems where int has no more than 32 bits. There exist systems where int has more bits. Do a search for ILP64 if you wish to hear about such systems.
14: unsigned x; x << 31: Again, this is only defined on systems where int has at least 32 bits. See #11.
15: unsigned short x; x << 31: This one is tricky. There are four different cases, depending on the size of int and whether sizeof(short)==sizeof(int).
15a: sizeof(short) == sizeof(int): defined for all x, since x gets promoted to unsigned.
15b: sizeof(short) < sizeof(int), int has less than 32 bits: defined for no x. I don't think any such systems exist.
15c: sizeof(short) < sizeof(int), int has at least 32 bits but less than 32 more than a short: defined for some x. This is the most common, with 16-bit short and 32-bit int.
15d: sizeof(short) < sizeof(int), int has at least 32 more bits than a short: defined for all x. This is the uncommon ILP64 system.
18: int x; (short)x + 1: This is outright incorrect. Casting int to short results in undefined behavior if the value cannot be represented as a short. Truncation is only guaranteed to occur for unsigned types.
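Item 3 in the list above turns on integer promotion. The sketch below contrasts the two conversions involved; the function names are made up, and the stated results assume the quiz's 16-bit short and 32-bit int.

```c
/* Hypothetical demo functions; each returns the value of one comparison. */

int ushort_one_gt_minus_one(void)
{
    /* When int can hold every unsigned short value, the left operand is
     * promoted to (signed) int, so this compares 1 > -1. */
    return (unsigned short)1 > -1;
}

int uint_one_gt_minus_one(void)
{
    /* Here -1 is converted to unsigned int and becomes UINT_MAX,
     * so this compares 1 > UINT_MAX. */
    return 1U > -1;
}
```

Under the quiz's assumptions the first function returns 1 and the second returns 0; on a platform where short and int have the same size, the first would behave like the second.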
78%… to be honest, I’m surprised I did that well.
In the answer to Q17 you write “If these operators were right-associative, the expression would be defined for all values of x.” I disagree. In addition to surprising results (x - 1 + 1 == 0 for x being 2) this would be UB for x being INT_MIN or INT_MIN+1.
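The two groupings discussed in this comment have different preconditions, which can be written down directly. A small sketch with invented names:

```c
#include <limits.h>

/* Actual C grouping, (x - 1) + 1: only the subtraction can overflow. */
int left_grouping_defined(int x)
{
    return x != INT_MIN;
}

/* Hypothetical right-associative grouping, x - (1 + 1): this computes
 * x - 2, which overflows for the two smallest int values. */
int right_grouping_defined(int x)
{
    return x >= INT_MIN + 2;
}
```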
Arthur and others: implementation-defined still is defined, no sneaky trick there
If you like, read it as “not undefined”. Well-defined means you exactly know what you get – defined (without the well part) “only” means you are guaranteed to get something reproducible on a given architecture, i.e. no crashes, no burning hard drives, no imploding universe.
Hi anonymous #37, I’m kind of depressed that you spent so much time writing about errors in the quiz instead of spending just a little more time reading the information at the top of the post.
Answers 16+17: typo (iff)
The information at the top disappears when you click “start”. Lots of people probably clicked ‘start’ right away and never got a chance to see it. So they didn’t know the C99 and x86/x64 constraints (I didn’t until I started hitting answers that referenced them).
And even WITH the information at the top, you’re not consistent about your treatment of implementation-dependent vs “we know what this is for all compilers for x86/x64”. Some you go ahead and say ‘we know what this is’, and some you say ‘well, since the spec says it’s undefined, it’s undefined’, even though every compiler on those platforms produces the same consistent result.
This is very very bad. This quiz mixes implementation-defined behavior with a few correct questions.
ints can be 64 bits long, for instance. And the signedness of character is implementation-dependent, even on common-use architectures like ppc. So this is hugely misleading in the direction of “all the world is a PC”.
Heck, such a quiz should start by mentioning the rules. It does not even say what C language it’s talking about, whether it’s any C, C89, C99, or C2011!
Oh actually, now I see the frontpage of the quiz. Well, as usual, since it’s possible to bookmark any page, I got redirected there by a link to the quiz start, not the front material.
Anyhow, some of my comments still stand. I very much frown upon *anything* that says “assume x86 or x86-64”, since I will be the one fixing that pile of shit written by “knowing programmers” to run on, say, sparc64 or macppc.
Utterly humiliating! But should I be upset that I got so many wrong or that programmers allow themselves to use such rubbish languages?
I hadn’t realized how dangerous the implicit conversions (esp. unsigned to signed) were – or that shifting a signed integer is so prone to being undefined.
Question 12: GCC 4.4.5 on x86_64 only gives warning and the result is 0:
test.c: In function ‘main’:
test.c:16: warning: left shift count >= width of type
Question 13: Not even a warning. Compiles and works even for x=-1.
Thanks, learned a lot from the quiz.
Question 5 contains two identifiers whose values are implementation defined (i.e., SCHAR_MAX and CHAR_MAX) and an equality comparison. The result is either 0 or 1, i.e., the result is unspecified.
Question 17: the status of x - 1 + 1 depends on the order of evaluation and can be undefined if 1 is subtracted from x rather than 1 being added to 1. It is common practice to consider the undefined to dominate and call the result undefined.
Question 18: if the quiz is about what the C Standard says then (short)x+1 is only defined for some values of x if it has type int. Re: comment #33 if we need to take the author’s experience into account the answer can be almost anything and spoils the point of the quiz.
Question 20: Something more sensible would be:
What is the value of: sizeof (unsigned short)-1
with possible answers:
a: 1
b: 2
c: 4
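The proposed question depends on how sizeof binds. The sketch below spells out the two readings; the function names are made up for illustration.

```c
#include <stddef.h>

/* sizeof followed by a parenthesized type name is a complete operand,
 * so this is (sizeof (unsigned short)) - 1, not sizeof of a cast. */
size_t sizeof_then_subtract(void)
{
    return sizeof (unsigned short)-1;
}

/* With an extra pair of parentheses the operand is the cast expression,
 * whose type is unsigned short. */
size_t sizeof_of_cast(void)
{
    return sizeof ((unsigned short)-1);
}
```

With the quiz's 16-bit short, the first function returns 1 and the second returns 2.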
Question 18 is wrong: if x > SHORT_MAX or x < SHORT_MIN, then (short) x is undefined and so is (short) x + 1. So (short) x + 1 is defined for some values of x, not all, when x is int.
Question 5 is implementation-defined, and saying it's x86 or x86-64 is not sufficient. I got it wrong since I answered "undefined".
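Whether plain char is signed can be read off <limits.h>, and the comparison from Question 5 follows from it. A minimal sketch with invented function names:

```c
#include <limits.h>

/* Returns 1 when plain char is a signed type on this implementation. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The comparison from Question 5: 1 when char is signed, 0 when char is
 * unsigned (for example with GCC's -funsigned-char). */
int schar_max_equals_char_max(void)
{
    return SCHAR_MAX == CHAR_MAX;
}
```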
Re #18, conversion of an out-of-range value from int to short is *not* undefined behavior. It yields an implementation-defined result or raises an implementation-defined signal. (The latter permission was added in C99; I know of no compiler that takes advantage of it.) But the front page mentions several implementation-specific assumptions; the result of this conversion is not among them.
Question 4 not accurate. For example in Win64 sizeof(long) == sizeof(unsigned).
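The data-model dependence raised here can be illustrated with a comparison between a long and an unsigned int. The quiz's own wording of Question 4 is not quoted in this thread, so the expression below is an assumption chosen for illustration, as is the function name.

```c
/* Hypothetical demo: compares a negative long with an unsigned int.
 * If long can represent every unsigned int value (LP64), 1U is converted
 * to long and the comparison is -1 > 1. Otherwise (32-bit x86, or LLP64
 * on Win64) both operands are converted to unsigned long and -1L
 * becomes ULONG_MAX. */
int minus_one_long_gt_one_unsigned(void)
{
    return -1L > 1U;
}
```

So the result is 0 under LP64 and 1 where long and unsigned int have the same width.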
In Q.17, won’t the compiler after the abstract syntax tree creation, during optimization, evaluate +1-1=0 and replace it with 0 before evaluating the entire expression?
I want to call foul on a couple of these!
In #5, the explanation basically says that “undefined” is the right answer but you’re assuming a standard x86 PC.
In #11-15, we don’t know the size of any of these types. For #12, for example, sizeof(int) could well be 64. UNICOS comes to mind.
(If we’re to assume these questions only apply to a particular platform, then it should say that; if we’re to assume only the C language, then these answers aren’t right.)
All of this leads me to believe that the few simple rules I have, when I need to write in C, have treated me very well over the years: (1) never assume booleans are anything other than zero and nonzero, (2) never use unsigned types, and (3) if you need integers greater than 10,000, get a bignum library.
Hi Jesse (and others), I’ll just repeat that the platform information is specified at the top of the quiz…
“I’ll just repeat that the platform information is specified at the top of the quiz…”
I’ll repeat: the information is NOT at the top of the quiz. The information is on the page when you first land, but DISAPPEARS when you open the quiz.
Obviously you approved the previous comment, but maybe you didn’t actually read it since:
(a) you didn’t fix it
and (b) you’re still using false language (AFAICT it’s impossible to have it on the page at the same time as the quiz, so it is NOT “at the top of the quiz”).
Hi anonymous, you’re right, it’s at the top of the post containing the quiz.
As my colleague Matt Might likes to say: There’s always someone in the world who will take exception to anything you do. The Internet helps you find that person.
I’m not pointing this out idly.
I’m pointing this out because the comments here are filled with people who don’t seem to have noticed it, and I’m pointing out a particular, FIXABLE reason they might be doing so.
Your job as a writer is to communicate with your readers. If you fail to communicate to a large number of your readers, you need to stop blaming them and consider that maybe your UX is actually flawed.
I’m a bit surprised about Question 18. Question 5 mentions that overflowing a signed integer is undefined behaviour, and others have pointed out that conversion to a signed integer type that cannot hold the value gives implementation-defined behaviour. We’re given some assumptions about architecture but no assumptions about any particular implementation (of which there are many for the given architecture).
Hi Dave G, every modern C implementation that I am aware of chooses to define the integer down-casting behavior to be truncation, so I figured this could safely be considered to be a side effect of “assume LP64”. But yes, it would have been better to state this explicitly.
You should not equate this implementation defined behavior with an undefined behavior such as signed overflow. That is a totally different beast.
I’m slightly amused at the people who are angry that THEY didn’t read the instructions
I got 78% but 20/20 yay! The quiz gives the correct result if you don’t pick the right answer after having picked the wrong one.
I understood immediately the questions I got wrong. This makes me want even more that C/C++ compilers catch undefined behaviour whenever they can when compiled with debug settings.
Congratulations on killing platforms where sizeof(int) < 4.
The signed char question was likewise mean.
Everyone using ARM (esp. Thumb) now must hate you.
This is very interesting. I’ve learned a lot about C integers. This will be very helpful in programming either in C or in any other language.
Very instructive.
I didn’t get all the answers right, because my last C program is some 20 years ago and probably C99 is the 1999 implementation of ANSI C, which is far away from my conception of C. My knowledge is basically derived from K&R C with some book knowledge of ANSI C.
An excellent quiz. Made me realize why I tended toward the use of unsigned 64-bit integers and avoided the corner cases as needed. Math and C are a dangerous combination.
About question 9 (x << 0): I bet common compilers would simplify it into a no-op even at the lowest optimisation level… But if the standard says it is undefined behaviour, people say something really odd is likely going to happen. (If the "likely" means on 1% of compilers on 1% of hardware, no matter: it is still undefined behaviour, so compilers should make it behave badly, just to be aligned – ok, undefined behaviour states a "contract" the programmer can trust, and in this case he can't… 1% of the time, at least). This also suggests that we are dealing with logical shift rather than arithmetic shift (which should be always well defined). So, always do your bit games with unsigned…
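The advice to do bit manipulation on unsigned values can be sketched as follows. For unsigned operands the only remaining hazard is a shift count that is too large; returning 0 in that case is a design choice of this sketch, not something the standard prescribes, and the helper name is invented.

```c
#include <limits.h>

/* Hypothetical helper: left shift on unsigned int with no undefined
 * behavior for any inputs. Bits shifted out of the top are discarded;
 * counts at or beyond the width yield 0 by this sketch's convention. */
unsigned shl_unsigned(unsigned x, unsigned n)
{
    /* assumes unsigned int has no padding bits */
    unsigned width = (unsigned)(sizeof(unsigned) * CHAR_BIT);

    if (n >= width)
        return 0u;
    return x << n;
}
```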
I graduated from the University of Utah with a computer science degree back in the 70’s and I do not recall any detailed instruction on integers. But then, the C programming language did not exist or was too new.
A lot of times, when confronted with such situations, I would look at the assembly language the compiler generated to see if it was doing what I wanted.
Hi RD, I do that too.
But you have to be really careful with undefined behavior since the next time you upgrade your compiler, change optimization options, or even recompile after changing some unrelated code, the assembly language corresponding to the undefined code may do something totally different. For an example, see the comments starting at #3 here:
http://blog.regehr.org/archives/722
I’ve been programming and using C for 40 years.
None of these questions came up in my professional experience.
Never had a bug in my programs related to these issues.
If one of these questions comes up and disqualifies me in a job interview, I do not want that job
Hi paca, actually this quiz is only for programmers who make mistakes. You go!
Q18: (short)x + 1 is implementation-defined (and may raise a signal), because (short)x is implementation-defined and may raise a signal if x is outside the range of short — see 6.3.1.3.3
Q20: Since INT_MIN/-1 (which would be one greater than INT_MAX assuming twos-complement) is not representable as an int, INT_MIN % -1 is undefined (6.5.5.6)
Hi Matthew, I’ve read the “multiplicative operators” part of the C99 standard very carefully and do not believe that your interpretation is the most reasonable one. They fixed the language in C11.
Nice quiz and learning experience. Thanks!
INTERESTING!!!! LEARN FEW THINGS
As stated, scoring doesn’t take into account which attempt the correct answer was arrived at.
Great quiz though, learnt a few things.
For #4, actual program output in my quick test was 1 for both x86 and x64 — so I guess either the answer is wrong or gcc is buggy.
Hi Al, the answer to #4 is correct, as are a few versions of GCC that I tried. If you provide more detail I can try to repro your result.