The other day Neel Krishnaswami mentioned that he’s going to be teaching the C class at Cambridge in the fall, and asked if I had any advice about that topic. Of course I do! In fact the response got so long that it ended up being a blog post.
My main idea is that we need to teach C in a way that helps students understand why a very large fraction of critical software infrastructure, including almost all operating systems and embedded systems, is written in C, while also acknowledging the disastrously central role that it has played in our ongoing computer security nightmare.
There’s a lot of reading material out there. For the basics, I still recommend that students purchase K&R. People say good things about C Programming: A Modern Approach; I’ve only skimmed it. For advanced C I’ve not read a better book than Expert C Programming, though like K&R it is fairly old. The Practice of Programming is a really great book though it’s not completely specific to C. I haven’t read all of it but from what I’ve seen Modern C is a very good resource, with AFAIK the best treatment of undefined behavior of any C book. The C FAQ contains lots of good material.
For supplemental reading, of course the students need to look at all three parts of Chris Lattner’s writeup about undefined behavior, and mine as well.
What version of C should we teach? Probably a common subset of C99 and C11. In a first C class there’s no need to go into advanced C11 features such as the concurrent memory model.
We’d like students to be able to answer the question: Is C an appropriate choice for solving this problem? We’ll want some lecture material about C’s place in the modern world and we also need to spend time reading some high-quality C code, perhaps starting with Redis, Musl, or Xv6. Musl, in particular, is a good match for teaching since it contains lots of cute little functions that can be understood in isolation. From any such function we can launch a discussion about tradeoffs between portability, efficiency, maintainability, testability, etc. If Rich Felker (the Musl author) did something a certain way, there’s probably a good reason for it and we should be able to puzzle it out. We can also use Matt Godbolt’s super awesome compiler explorer to look at the code generated by various compilers. C’s lightweight-to-nonexistent runtime support is one of its key advantages for real-world system building, and it also means that generated code can be understood without thinking about something like a garbage collector.
We probably also need to spend a bit of time looking at bad old C, the kind that makes the world work even though we’re not proud of it. We can find files in OpenSSL and in the PHP interpreter that would singe your brain despite getting run billions of times a day, or we can always pick on an old standby like glibc — worth looking at just for the preprocessor abuse. But perhaps I am being uncharitable: Pascal Cuoq (reading a draft of this piece) correctly points out that “even what seems like plain stupidity often stems from engineering trade-offs. Does the project try to remain compilable under MS-DOS with DJGPP, with C90 compilers, under VMS, or all three at the same time?” And it is true that we would do well to help students understand that real-world engineering constraints do not often resemble the circumstances that we lead them to expect in school.
The second big thing each student should learn is: How can I avoid being burned by C’s numerous and severe shortcomings? This is a development environment where only the paranoid can survive; we want to emphasize a modern C programming style and heavy reliance on the (thankfully excellent) collection of tools that is available for helping us develop good C.
Static analysis is the first line of defense; the students need to use a good selection of -W flags and then get used to making things compile without warnings. A stronger tool such as the Clang static analyzer should also be used. On the dynamic side, all code handed in by students must be clean as far as ASan, UBSan, and MSan are concerned. tis-interpreter holds code to an even higher standard; I haven’t had students use this tool yet but I think it’s a great thing to try. Since dynamic testing is limited by the quality of the test cases, the students need to get used to using the output of a code coverage tool to find gaps in test coverage. Lots of coverage tools for C are available but I usually just use gcov since it is ubiquitous and hassle-free.
Teaching undefined behavior using sanitizers is a piece of cake: the tool gives students exactly the feedback that they need. The other way of teaching undefined behavior, by looking at its consequences, is something that we should spend a bit of time on, but it requires a different kind of thinking and we probably won’t expect the majority of students to pick up on all the subtleties — even seasoned professional C programmers are often unaware of these.
Detecting errors and doing something about them is a really important part of programming that we typically don’t teach much about in school. Since C is designed to avoid sweeping these problems under the rug, a C class is a great place to get students started on the right track. They should have to implement a goto chain.
Something I’m leaving out of this post is the content of the assignments that we give the students — this mostly depends on the specific goals of the course and how it fits into the broader curriculum (In what year are students expected to take the class? What kind of background do they have in math and science? What languages do they already know?). I’ve always taught C as a side effect of teaching operating systems, embedded systems, or something along those lines. In a course where the primary goal is C we have more freedom, and could look at more domains. Image processing and cryptographic algorithms would be really fun, for example, and even the old standby, data structures, can be used to good effect in class.
I’m also leaving out build systems and version control. They should use these.
In some courses I will give students access to the test infrastructure that will be used to grade their code. This makes assignments a lot more fun, and makes students a lot more happy. Other times I will give them a few test cases and keep the good tests (and the fuzzers) for myself. The idea is to make the assignment not only about implementation but also about testing. This stresses students out but it’s far more realistic.
Pascal remarks that “C is mostly taught very badly, and a student who aims at becoming good at maintaining C code will need to unlearn much that they have (typically) been told in class.” This is regrettably true — a lot of instructors learned C in previous decades and then they teach an outdated language, for example failing to discourage preprocessor abuse. The most serious common failing is to leave students unaware of their side of the bargain when the deal with a C compiler. I am talking of course about undefined behavior (and, to a lesser extent, unspecified and implementation-defined behavior). As a concrete example, I have taught numerous classes based on Computer Systems: A Programmer’s Perspective. In most respects this is an excellent book, but (even in the 3rd edition) it not only ignores undefined behavior but, worse, explicitly teaches students that signed integers in C have two’s complement behavior on overflow:
This claim that positive signed overflow wraps around is neither correct by the C standard nor consistent with the observed behavior of either GCC or LLVM. This isn’t an acceptable claim to make in a popular C-based textbook published in 2015. While I can patch problems in the book during lecture, that isn’t very satisfying, and not all instructors have the time and expertise.
One might argue that we shouldn’t be teaching C any longer, and I would certainly agree that C is probably a poor first or second language. On the other hand, even if we were in a position where no new projects should be written in C (that day is coming, but slowly — probably at least a decade off), we’re still going to be stuck maintaining C for many decades. A random CS graduate has pretty good odds of running into C during her career. But beyond that, even after we replace C, the systems programming niche will remain. A lot of what we learn when we think we’re learning C is low-level programming and that stuff is important.
Thanks to Pascal Cuoq and Robby Findler for commenting on drafts of this piece.