This semester I’m teaching a new undergraduate course called Writing Solid Code. The idea is to take a lot of lessons I’ve learned on this subject — testing, debugging, defensive coding, code reviews, etc. — and try to teach them. Since I don’t have any slides or even any lecture notes that are good enough to put on the web, I thought it would be useful to summarize each week in a blog post.
One thing that I wanted to do was create an environment where cheating is impossible. To accomplish this, all student code is going into a public GitHub repo, and everyone is free to look at everyone else’s code if they wish to. In other words, cheating is impossible because the actions that would normally be considered cheating aren’t. Of course, I am encouraging students to write their own code. In a self-selected group of students I don’t expect problems. If people want to exchange ideas or test cases or even subroutines, then great.
The other thing I wanted to do was get away from the kind of grading where a failing test case detracts from a student’s grade. So I am not going to do that. Rather, everyone starts with a default grade of “A” and they only lose this grade if they fail to attend class, fail to participate in activities such as code reviews, or fail to submit assignments. I am counting on people’s competitive spirit to drive them towards creating code that is at least as solid as their peers’ code. This has worked well in the past and I expect it to work again.
The class is going to be structured around a set of small coding exercises, each designed to reinforce some aspect of writing solid code. Each exercise will be iterated until all students’ code (or as much of it as possible) passes all test cases that they — and I — can think of. If this takes too long I’ll start making the projects easier; if (as I expect) we can converge quickly, I’ll make the projects more complicated. This structure is designed to go against the grain of the typical fire-and-forget coding style that CS students are trained to adopt; I’m convinced that it builds very poor habits. The idea that I want the students to get out of these exercises is that in most cases creating a piece of code is the easy part; the hard part is figuring out what the code is supposed to do in the first place and also making sure that the code actually works as intended.
The first exercise is the venerable triangle classifier. The students, with a bit of guidance from me — their program manager — developed the spec in class on Tuesday. Their code is due next Tuesday, along with a collection of test cases that achieves 100% partition coverage as well as 100% coverage of feasible branches. This exercise is in C. The language isn’t very relevant, but C does enjoy good tool support. We may switch languages later on.
Since we don’t yet have any code to look at, I lectured on testing today. Basically I focused on the three hard problems in testing:
- Generating test inputs
- Determining if the result of a test is acceptable
- Determining what a collection of passing test cases means
This was more or less an abbreviated version of the material I tried to cover in the early parts of my Udacity course. I wish that I had a nice collection of notes about this but I don’t (yet).
51 responses to “Writing Solid Code Week 1”
This sounds like an awesome idea.
Is it an introduction to programming course? In other words is the goal to unlearn bad habits, or are you trying to teach good habits from the start?
Will you cover fuzzing? Compiler warning options and static analysis?
Are you planning any adversarial exercises, where students will try to find bugs (or missed code coverage) in their classmates’ code?
Jesse, yes and yes and yes and yes! Unless we run out of time that is. I don’t yet have a sense for how much I can get done in a semester.
The triangle exercise sounds hard! A straightforward implementation would use floats, which I wouldn’t trust for comparisons.
I think it’s possible to do the entire exercise using bigint math. In comparing the lengths of the sides, I’d just skip the square-root step. I can find out enough about the angles (zero / acute / right / obtuse) by examining the dot and cross products. But I wouldn’t expect all computer science students to know that!
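For concreteness, here is a minimal sketch of the integer approach just described. The names are mine, and it assumes the spec’s 31-bit non-negative coordinates so that every intermediate value fits in a 64-bit signed integer.

#include <stdint.h>

/* Classify the angle at vertex A of triangle ABC using only integer math.
   With 31-bit coordinates, every product and sum below fits in int64_t. */
typedef enum { DEGENERATE, ACUTE, RIGHT, OBTUSE } angle_kind;

static angle_kind angle_at(int64_t ax, int64_t ay, int64_t bx, int64_t by,
                           int64_t cx, int64_t cy) {
    int64_t ux = bx - ax, uy = by - ay;   /* vector A->B */
    int64_t vx = cx - ax, vy = cy - ay;   /* vector A->C */
    int64_t cross = ux * vy - uy * vx;    /* zero iff A, B, C are collinear */
    int64_t dot   = ux * vx + uy * vy;    /* sign classifies the angle */
    if (cross == 0) return DEGENERATE;    /* not a triangle at all */
    if (dot > 0)    return ACUTE;
    if (dot == 0)   return RIGHT;
    return OBTUSE;
}

Classifying the whole triangle is then a matter of calling this at each vertex and comparing the three squared side lengths (for example ux*ux + uy*uy for side AB), with no square roots anywhere.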
Do unit tests count for the branch coverage requirement, or only full triangle tests? By “partition coverage” do you mean hitting all of the 8 (?) possible outputs, or are you referring to some partitioning of the possible (valid) inputs?
One way to fuzz the triangle programs would be to generate random triangles and feed the six permutations of {A, B, C} to the program. If the program says ABC is acute while ACB is obtuse, then something is wrong.
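That permutation check is easy to automate. Here is a rough sketch; the in-process classify() wrapper around the program under test is hypothetical, as are its name and signature.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around the classifier under test: writes a result
   string such as "scalene acute" into out. */
void classify(long long ax, long long ay, long long bx, long long by,
              long long cx, long long cy, char *out, size_t outsize);

/* The answer must not depend on the order of the vertices, so all six
   permutations of the three points must produce identical output. */
static int consistent_under_permutation(long long p[3][2]) {
    static const int perm[6][3] = { {0,1,2}, {0,2,1}, {1,0,2},
                                    {1,2,0}, {2,0,1}, {2,1,0} };
    char first[64], cur[64];
    for (int i = 0; i < 6; i++) {
        classify(p[perm[i][0]][0], p[perm[i][0]][1],
                 p[perm[i][1]][0], p[perm[i][1]][1],
                 p[perm[i][2]][0], p[perm[i][2]][1], cur, sizeof cur);
        if (i == 0)
            strcpy(first, cur);
        else if (strcmp(first, cur) != 0)
            return 0;   /* inconsistent answers: the classifier has a bug */
    }
    return 1;
}

int main(void) {
    srand(1);
    for (int trial = 0; trial < 100000; trial++) {
        long long p[3][2];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 2; j++)   /* crude 31-bit random coordinates */
                p[i][j] = (((long long)rand() << 16) ^ rand()) & 0x7fffffff;
        if (!consistent_under_permutation(p)) {
            printf("inconsistent answers for a permuted triangle\n");
            return 1;
        }
    }
    return 0;
}

The same harness extends to the rotate/scale/translate checks mentioned further down in the thread.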
FWIW, I believe Coverity provides free access to students and open-source projects. I have very limited experience with static analysis tools but Coverity’s C analysis has always amazed me.
After thinking about the triangle problem and attempting to reason out a solution mentally, I’ve come to the conclusion that if your solution requires floating point arithmetic, the solution is wrong (all you need is the sign of the cosine, which just requires some additions and multiplications, not even division). I also think you don’t even need bigint.
I wonder if the triangle problem was chosen specifically to see if students can adapt common algorithms to looser constraints?
Hopefully your students aren’t reading your blog, because, at this rate, we’re going to be just telling them the solution! 🙂
On further thought, choosing the maximum values to be 55 bits or so makes the problem much harder: 64-bit integers can’t do the requisite multiplications in places, and doubles don’t have enough precision to be able to precisely hit the sweet spot needed for a right triangle. The fun part there is trying to craft a testcase which illustrates the failure, or working out how likely randomized testing is to find examples where the code is wrong.
I know that some of the students are using doubles and some are using GMP and some are using long ints.
My preliminary thought was that since every 31-bit integer can be represented exactly in a double with plenty of bits to spare, FP would be likely to provide exact answers here if a bit of care were taken in the code. But I haven’t checked to make sure that works.
Re. testing by permuting the points, we were actually discussing that exact thing in class today! And there are plenty of analogous tricks that also should not change the answer: rotate a triangle, scale it, translate it, etc.
Devdatta, good idea, we should try Coverity.
Joshua, I have pointed the class to this post. But no problem — if they can go from your notes to a working solution then that is ok with me.
I’m pretty sure that the 31 bits in the specification makes this problem fairly easy, or at least I’m hoping so, since a truly difficult problem would tend to move the spotlight away from the testing issues that I hope to highlight here.
Nthing that 31 bits is a very convenient limit — the calculations for both determinants and Dijkstra’s version of the Pythagoras theorem (EWD 975) just barely fit into 64-bit integer arithmetic.
I’m thinking that you are inevitably going to come upon cases where the subtle details of FP are going to rear their ugly heads. Not that that’s entirely a bad thing, but it might be distracting for both you and the students.
That said, I thoroughly approve of this course and wish it existed at my school all those years ago…
Mike, if the inputs were FP then I 100% agree with you. With integer inputs, I think we at least have a chance of being lucky about FP errors. But we’ll see.
Ideally there will be some FP problems so we can hash out these issues in class. That said, I’m planning at most one more exercise that involves FP. That stuff gets old fast.
David, I’ll have to take your word for it — unfortunately I use linear algebra way too seldom these days and I’ve lost most of what I once knew. Thanks for the EWD pointer, I’m reading it now!
Great! This post is the email I was hoping to get from you. 🙂
Not the most important thing, but I have found that the public repository for all class code does seem to deter cheating. I see it once in a while, but it seems less frequent when it is “world visible” — and in my class, the public view is almost required since testing the other students’ code is the whole point. Perhaps it is a corollary of that line from that great Cordwainer Smith story, “Mother Hitton’s Littul Kittons”:
Poor communications deter theft;
good communications promote theft;
perfect communications stop theft.
—Van Braam
I hate the triangle classifier as an example in research papers, but it does do a nice job of showing the differences/similarities between some partition and code coverage approaches.
Also, not to be pedantic, but shouldn’t the spec also require that in addition to matching that regexp, the statement also describes the actual properties of the triangle given?
It looks as though at least one student approached the problem the way that I would have, so you won’t have to take my word for 31-bit coordinates being convenient (at least, once he fixes that one integer overflow bug :)).
Another factor in writing solid code is the art of determining if gaps in the requirements are don’t cares (allowing you to make simplifying assumptions) or accidental omissions from the customer (where you must fill in the missing complications). Less complex tends to correlate with more solid.
Another thing that will likely be useful in this problem is looking at the various exact algorithms one sees in computational geometry. One of the building blocks of those that is useful here is to look at the cross product of the points. The sign of the result can be used to classify the third point in relation to the others. It is easy to talk about turns in this context, ie when traveling on a line segment from point A to point B, when one gets to B does one turn left, right, or continue straight to get to point C? This can be computed exactly without resorting to something like GMP or floating point with just integral types. Think of this as the right-hand rule one learns in a basic physics course formalized a bit.
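A sketch of that primitive, under the same 31-bit assumption (the name orient2d is borrowed from the computational geometry literature):

#include <stdint.h>

/* Turn test: walking from A toward B, do we turn left, go straight, or turn
   right to reach C?  Exact in int64_t for 31-bit input coordinates. */
static int orient2d(int64_t ax, int64_t ay, int64_t bx, int64_t by,
                    int64_t cx, int64_t cy) {
    int64_t det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    return (det > 0) - (det < 0);   /* +1 = left turn, 0 = collinear, -1 = right turn */
}

A result of 0 is exactly the “not a triangle” case from the spec.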
Here’s a test that’s likely to break the floating-point solutions (they will print “isosceles” instead of “scalene”). Even with a hypot function correctly rounded in IEEE double precision, the two shorter sides of this triangle appear to have the same length, as the exact (nonzero) difference is much less than 0.5 ulp.
triangle 0 0 1073741822 1073741824 2147483645 2147483647
Hi Alex, yes, the spec is taking a lazy approach to defining the correct output — hopefully not a problem in practice.
David, I’ll be interested to see how these different approaches fare when we start doing differential testing.
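The core of a differential tester for this exercise can be tiny. A sketch, assuming two hypothetical binaries that follow the command-line format used in the test cases elsewhere in this thread:

#include <stdio.h>
#include <string.h>

/* Run two classifiers on the same triangle and report whether their output
   differs (1), matches (0), or could not be obtained (-1). */
static int outputs_differ(const char *prog_a, const char *prog_b,
                          long long x1, long long y1, long long x2,
                          long long y2, long long x3, long long y3) {
    const char *prog[2] = { prog_a, prog_b };
    char out[2][128];
    for (int i = 0; i < 2; i++) {
        char cmd[256];
        snprintf(cmd, sizeof cmd, "%s %lld %lld %lld %lld %lld %lld",
                 prog[i], x1, y1, x2, y2, x3, y3);
        FILE *p = popen(cmd, "r");                        /* POSIX */
        if (!p) return -1;
        if (!fgets(out[i], sizeof out[i], p)) { pclose(p); return -1; }
        pclose(p);
    }
    return strcmp(out[0], out[1]) != 0;
}

For example, outputs_differ("./triangle_a", "./triangle_b", 0, 0, 3, 0, 0, 4) should return 0; any disagreement means at least one of the two programs is wrong, or the spec is ambiguous.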
Don, we’ve been talking about that kind of thing in class! Definitely important stuff.
Will, again my crap linear algebra skills are doing me a disservice. I’ll have to brush up. I believe the approach you suggest is the one suggested by other commenters, right? One of the things I’m very interested to see here is which approach leads to the clearest and smallest (correct) code.
David, I’ll add this special case to my tester.
One interesting thing is that a random tester is very unlikely to generate a right triangle.
Also, I’m surprised that nobody here has mentioned that with the current spec, it’s impossible to specify an equilateral triangle :). At least I believe so.
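On the right-triangle point above: rather than waiting for a random tester to stumble onto one, a tester can construct them directly. A sketch (my own construction, nothing from the class):

#include <stdlib.h>

/* Build a triangle with an exact right angle at vertex A: B - A = (p, q) and
   C - A = t * (-q, p) are perpendicular integer vectors, so the dot product of
   the two sides meeting at A is exactly zero.  A is kept away from the origin
   and the offsets are kept small, so all six coordinates stay in [0, 2^31-1]. */
static void random_right_triangle(long long v[6]) {
    long long ax = 1000000 + rand() % 1000000;
    long long ay = 1000000 + rand() % 1000000;
    long long p = 1 + rand() % 1000, q = 1 + rand() % 1000, t = 1 + rand() % 1000;
    v[0] = ax;          v[1] = ay;            /* A                 */
    v[2] = ax + p;      v[3] = ay + q;        /* B = A + (p, q)    */
    v[4] = ax - t * q;  v[5] = ay + t * p;    /* C = A + t*(-q, p) */
}

Similar constructions produce isosceles and collinear (“not a triangle”) inputs, which a purely random tester will essentially never generate either.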
Out of curiosity, what is the computing and math background of the students? Judging from the submissions, it looks like most of them are using floating point arithmetic for the first time in their lives. (I’m using the lack of the 1e-6-ish notation as an indication that most of them are new).
Unfortunately, the acos function as implemented is likely to be monotonic, which means that crafting a floating point example that exposes the operator misuses may not be possible.
I’ve noticed two basic patterns here. One is that almost all of the students seem to have the same basic way of finding the angle (law of cosines), while I approached it via a different rule (angle between two vectors). This uniformity strikes me as odd, although there are different ways of testing for collinearity. The other note is that the 4 commenters on this blog appear to have come up with the non-floating point solution before trying the floating point one, while the students all alighted on a floating point solution immediately.
One student claims, with a test case, that the solution needs GMP, but the worst-case analysis shows that’s not true… if you do the algorithm very carefully. It’s also telling that doing the algorithm a different way doesn’t require as much care: the peak intermediate values are going to be slightly under 2^64 and 2^62 for the two algorithms I’m thinking of.
Hi Joshua, most of the students in this class are in their third year of a CS degree. You know, I’m not sure what FP coding they have run into in their curriculum so far. Thinking back 20 years, I had to take a numerical analysis course as part of either a CS or math degree (can’t remember which degree it was required for) but I never actually had to write a single line of FP code for either degree. I’m guessing that might still be the case, in which case we wouldn’t expect a very high degree of sophistication.
Anyway I’m glad you are bringing this up. We are going to be spending a lot of time doing code reviews in this class and these are the kinds of issues that should be discussed!
> the 4 commenters on this blog appear to have come up with the non-floating point solution before trying the floating point one
I learned the hard way why robust geometric primitives for computational geometry algorithms are a Good Thing. I’m not sure what I would have done when I was a third-year undergraduate.
> 2^62
For determining acute/right/obtuse, yes, at the cost of more multiplications, but what about comparing lengths? The obvious bound is 2^63 exclusive, which does suffice to avoid unsigned long long.
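For the record, that arithmetic can be written down as a compile-time check. A sketch (the exact peak value depends on which formulation is used):

#include <stdint.h>

/* With coordinates in [0, 2^31 - 1], a coordinate difference has magnitude at
   most 2^31 - 1, so a squared side length is at most
   2*(2^31 - 1)^2 = 2^63 - 2^33 + 2, which still fits in a signed 64-bit int. */
_Static_assert(2 * (INT64_C(2147483647) * 2147483647) <= INT64_MAX,
               "squared side lengths fit in int64_t");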
Proof that the equilateral triangle is not possible: http://math.stackexchange.com/questions/105330/equilateral-triangle-with-integer-coordinates
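For readers who don’t want to follow the link, one standard way to see it (possibly not the exact argument given there): the shoelace formula makes the area of any lattice triangle half an integer, while an equilateral triangle’s area involves √3.

\[
  \mathrm{Area} \;=\; \tfrac{1}{2}\,\bigl|(x_B-x_A)(y_C-y_A)-(y_B-y_A)(x_C-x_A)\bigr| \;\in\; \tfrac{1}{2}\mathbb{Z},
  \qquad
  \mathrm{Area} \;=\; \tfrac{\sqrt{3}}{4}\,s^2 \ \text{ with }\ s^2=(x_B-x_A)^2+(y_B-y_A)^2 \in \mathbb{Z}_{>0},
\]

so the same area would have to be both rational and irrational, a contradiction.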
Here are some tricky test cases. The latter two are designed to foil solutions based on double-precision acos — given that acos(0) is exactly pi/2, thresholding acos(x) on pi/2 is problematic when x is (roughly) on the order of 1e-16 or below.
./triangle 0 0 1073741823 1073741823 2147483646 2147483646
./triangle 0 0 1073741822 1073741824 2147483645 2147483647
./triangle 0 0 1073741822 1073741824 2147483646 2147483646
./triangle 1 0 0 2147483647 2147483647 1
./triangle 1 0 0 2147483646 2147483647 1
I believe that the correct outputs are
not a triangle
scalene obtuse
isosceles obtuse
scalene acute
isosceles right
but don’t take my word for it :).
I’m pretty sure that the 31 bits in the specification makes this problem fairly easy …
Where do you see this in the specification? Personally, I’m having a hard time with this exercise under the assumption that int is the largest integer type on my system.
(And no, I’m not a student in the class, just an observer interested in the subtle difficulties here.)
The spec now indicates 31-bit integers. I think it used to specify INT_MAX on a 32- or 64-bit Linux machine, which amounts to the same thing.
I finally broke down and wrote my own solution. I’ll post it in due course.
Right, I thought it would be clearer to just specify 31 bits directly.
David can I ask you to wait to post your solution? I don’t want to short-circuit the testing games that I have planned for the class!
Think how fun a quadrilateral analyzer would be. I love the lattice at the bottom of this page:
http://www.mathsisfun.com/quadrilaterals.html
Maybe I’m missing something, but why couldn’t INT_MAX be, say, 2^63 – 1 on a 64-bit Linux machine?
Hi Joshua, isn’t Linux guaranteed to follow LP64 on x86-64 and ILP32 on x86?
http://www.unix.org/version2/whatsnew/lp64_wp.html
Interesting. I had always thought that issues like sizeof(int) were left up to compiler-writers. I’ve also seen way too many StackOverflow posts that make a big deal about the lack of consistency implied by the C standard. It’s refreshing to see people trying to change this, at least where practical. Is this effort something that third-year computer science (admittedly not my field) students would be expected to be aware of?
Joshua my take is that this particular issue can be understood at three levels:
1. since int is 32 bits on the specified platforms, the spec mandates 31 bits
2. since the size of an int is implementation-defined, the spec does not mandate 31 bits
3. due to LP64 and ILP32, the spec mandates 31 bits
There may be even more to this, but this is what comes to mind. Now we might ask: at what level would a typical third-year CS student be operating? Actually there’s a lot of variation in how sophisticated people are at appreciating these issues. Many CS students these days are at best minimally literate in C, but on the other hand we have a number of people who have real development experience and they would just know this stuff.
But none of this is really to the point. The point is that in this course I’m not trying very hard to create bulletproof specs. The students need practice recognizing and coping with ambiguity.
Personally, I didn’t even realize that level 1 was guaranteed to be true — as far as I can tell, that claim follows from level 3. I was solidly fixed at level 2, so you’ve clearly (and unsurprisingly) thought about this at a deeper level than I have. Certainly the exercise becomes much simpler if one has a larger type than int to perform computations in, and if that’s the intended exercise then so be it. I just originally thought that you had something harder in mind.
The floating point solution is wrong, since it requires comparing floats to each other, which is always invalid. The integer solution is fairly simple and will fit in a 63-bit unsigned integer.
I find it very interesting that all the students went for a floating point solution. Does anyone have an idea why? For me the integer solution feels much more natural, maybe because I learned the hard way about not using floating point when possible.
David, I love your corner cases. 🙂
Hi Radosław, I am open to the possibility that FP solutions might work if the math is done carefully, using either 64-bit or 128-bit floats. I am very interested to start testing. I’m writing a test program based on David’s ideas and other ideas, but I do not want to make it available to the students until they have done some testing work on their own.
Hi Joshua, you’re right that level 1 isn’t guaranteed to be true. I was trying to say that it is an incorrect level at which one might approach this problem, which is OK in class because it leads to good discussion.
> requires floats to compare to each other, which is always invalid
I’m not sure I would put it that strongly. As part of some code in which I could have avoided floating-point arithmetic completely, I wrote a subroutine that changed the rounding mode and, in principle, correctly lowerbounded the value of a particular linear program. Getting a *good* lowerbound was merely an optimization.
(On the other hand, I left out the part where the compiler tried to eliminate a common subexpression before and after a call to fesetenv().)
> all the students went for a floating point solution. Does anyone have an idea why?
To be fair, it wasn’t everybody. Look at the spec, though:
> A triangle is right if one of its angles is 90 degrees
Surely it doesn’t surprise you that someone with less experience than you might interpret this as a non-functional requirement, does it?
I hardly ever write FP code “for real” but what I learned is that it’s OK to test floats for equality in very specific cases. For example, if a variable was initialized to some FP constant, later it would be fine to test for equality with that constant to see if the variable has been changed.
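A tiny illustration of that narrow case (the names and the sentinel value are made up for the example):

/* -1.0 is used purely as a "not yet assigned" sentinel; no arithmetic is ever
   done on it, so comparing against the exact same constant later is safe.
   (This assumes -1.0 can never be a legitimate stored value.) */
#define UNSET (-1.0)

struct reading { double value; };

static void init(struct reading *r)            { r->value = UNSET; }
static int  has_value(const struct reading *r) { return r->value != UNSET; }
static void store(struct reading *r, double v) { r->value = v; }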
I believe I inadvertently directed the class towards an FP solution just due to how the spec was discussed in class, though we also discussed non-FP approaches (we didn’t discuss any approach in detail, though).
> I am open to the possibility that FP solutions might work if the math is done carefully, using either 64-bit or 128-bit floats.
I don’t know if long doubles on your target implementation have quad precision or Intel’s 80-bit format, but either way, the significand is at least 64 bits, and the integer solutions port over just fine.
I think that the more satisfyingly floating-pointy solutions could work in quad precision as long as the standard math library is good (I saw one recently that implemented fma(x,y,z) literally as x*y+z with no attempt to round correctly, so who knows what other buried treasure could be there) and great care is taken to compute appropriate tolerances.
When I read your posting describing this course my first thought was: how can I enroll all my developers? It is too bad all students can’t have the benefit of this. I have been following their progress. As an embedded engineer I of course have all the same comments about floating point and why they even bother to calculate the angles at all, but that’s not the point of the class. I looked at several of the test cases and noticed that they pretty much covered the branches but none of the edge cases. I look forward to seeing this improve as the course progresses and their skills improve. Part of our testing of developer candidates is for them to read the specification for a function and then generate a minimal set of data for a unit test.
I have been following your discussion on compiler testing for a while. I have always thought compiler bugs (embedded micro compilers) were just a given. I waste at least 2 weeks a year chasing what turns out to be compiler bugs. Perhaps with better testing tools this will become a thing of the past.
Writing solid code is more art than science and I am really looking forward to your course; I would be glad if you can keep sharing things here on this blog.
> Surely it doesn’t surprise you that someone with less experience than you might interpret this as a non-functional requirement, does it?
If I were to solve such a beastly task I would first check all the required math (on paper). Then I’d try to simplify it, and once that’s done the decision to go integer is, for me, quite plain. Honestly I don’t know if it’s because of my experience or just my style of thinking, which usually tells me: authorities my ass.
I’ve used FP maths several times in the past and it always led to weird cases and strange debugging failures, because suddenly I’d get a negative almost-zero value instead of a positive almost-zero one, because somewhere else, just beforehand, I’d used a square root and it rounded a tiny bit off in the wrong direction. Hence my stance: if you use anything in FP beyond addition and multiplication by powers of 2, you’re going to have a bad time. This is perfectly visible when you try to debug-check some values and generate them in another (probably slower) way: suddenly they’re always different. So which one is true? I like to think of FP values as if there were some random factor or some uncertainty about them. If your program would still work when you substitute x + rand(x * 0.000001) for every x, then go ahead, you’ll probably be fine.
Even as someone who writes (often exact) floating-point library code professionally, my first inclination would be to do this entirely in integer. As others have noted, the required primitives remain (comfortably, though not by much) inside the range representable by 64-bit ints.
That said, much of the fear-mongering about FP rounding in these comments is misguided. All of these tests can be performed exactly in FP too, with a bit of understanding of FP arithmetic and attention to detail (and don’t we want students to take the time to actually understand the primitives they are working with?). If the format of the input data were FP, I would probably stay in FP, doing a fast approximate test followed by a slower exact computation if the result of the fast test was smaller than the error bound.
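Here is a sketch of that fast-then-exact pattern applied to the orientation determinant. The error bound is deliberately crude and conservative rather than the tight constant a careful implementation (Shewchuk-style predicates, say) would derive, and the exact fallback leans on the inputs being 31-bit integers:

#include <float.h>
#include <math.h>
#include <stdint.h>

/* Sign of (B-A) x (C-A).  Try fast double arithmetic first; if the result is
   within a conservative rounding-error bound of zero, redo it exactly in
   64-bit integers (which the 31-bit coordinates make possible). */
static int orient_filtered(int32_t ax, int32_t ay, int32_t bx, int32_t by,
                           int32_t cx, int32_t cy) {
    double l = ((double)bx - ax) * ((double)cy - ay);
    double r = ((double)by - ay) * ((double)cx - ax);
    double det = l - r;
    double errbound = 4.0 * DBL_EPSILON * (fabs(l) + fabs(r));   /* crude */
    if (det >  errbound) return  1;
    if (det < -errbound) return -1;
    /* Too close to call in double precision: compute it exactly. */
    int64_t e = ((int64_t)bx - ax) * ((int64_t)cy - ay)
              - ((int64_t)by - ay) * ((int64_t)cx - ax);
    return (e > 0) - (e < 0);
}

If the inputs were themselves doubles, the exact branch would instead need extended- or arbitrary-precision arithmetic, which is where the real work would be.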
The conversation seems to have mostly broken down into discussion of a specific problem… that’s less interesting from a strategy standpoint. One of the biggest problems I see when talking to people about code and testing is that people, when asked to write tests, often shoot for code coverage rather than case coverage. This is especially true of recent graduates, or people who’ve been working but not necessarily testing. Sometimes I’ll ask people to solve a basic problem with some constraints. Then I’ll ask how they’d test it and they’ll walk through their code trying to come up with cases that hit every branch. The problem is, I already know of cases where they don’t handle the input, or handle it improperly. They end up with 100% code coverage and 25% coverage of valid classes of input.
It may be worthwhile to focus on writing tests without looking at or thinking about the implementation. Possibly covering TDD, though maybe just because if they haven’t written the code yet, they won’t be constructing tests based on the code they’ve written. You could even do some adversarial work – split the class up and give them different problems, but along with code for their problem, they need to write tests for the other group. (the smart ones will also write tests for their own code to help know that it’s working 🙂 )
Re “code coverage” vs. “case coverage”:
#include <stdbool.h>
float reference_sin(float);       /* trusted oracle, defined elsewhere */
float sin(float x) { return x; }  /* deliberately bogus "implementation" */
bool test_sin(void) { return sin(0.f) != reference_sin(0.f); }  /* true iff a mismatch is detected; sin(0) == 0, so none is */
It passes all tests with 100% code coverage. It must be correct.
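And the flip side: a test chosen for case coverage rather than code coverage catches the bug immediately (reference_sin is the same hypothetical trusted oracle).

/* Same convention as above: returns true iff a mismatch is detected.
   sin(1) is about 0.8415, but the fake sin() above returns 1. */
bool test_sin_case(void) {
    float d = sin(1.0f) - reference_sin(1.0f);
    return (d < 0 ? -d : d) > 1e-6f;
}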
Stephen: I’ll use that example in class today.