A Conversation about Teaching Software Engineering

For better or worse, my impressions of software engineering as a field were shaped by a course I took as an undergrad that I thought was mostly not very interesting or useful. We spent a lot of time on waterfalls and stuff, while not covering testing in any detail. For the final project in the class we had to develop an application using a CASE tool (very hip at the time) where we described the class hierarchy using a GUI and then the tool generated skeletal C++ for us to fill in. Since we knew nothing about designing class hierarchies and also the tool was weird and buggy this all went about as disastrously as you would expect. In the end I learned quite a lot, but the lessons were probably not those intended by the instructor.

24 years later I’m teaching a software engineering class — this probably wouldn’t even happen if my department had any real software engineering faculty! Even so, I’m a true believer: I love the material and feel more strongly about its importance than I do about my more usual subjects like compilers and operating systems. I ignore software process and focus entirely on building skills and habits that I feel will come in handy in any software engineer’s career. If you like it, put a test on it. Read code. Review code. Refactor code. Write assertions. Adhere to coding standards. Design an API. Use a coverage tool, a bug-finding tool, a version control tool, a fuzz tool, a CI tool, to good effect. Repeat until end of semester. No doubt there’s room for improvement but the material seems solid.

Over Christmas break I had a beer with Daniel Dunbar, who I should have met long before, but somehow hadn’t. Daniel has done super impressive stuff: he was one of the original KLEE authors and also was an early Clang implementer. I told him about my approach to teaching software engineering and picked his brain a bit about the sorts of things he wished people with CS degrees were better at doing. Of course I wasn’t taking notes and forgot most of it. So I mailed:

As I mentioned the other week, I’m teaching a software engineering course this semester, and rather than focusing on any kind of academic approach to this subject, I’ll try to teach them a lot of the real world business of making software that works, starting with all the basics like testing, coverage, assertions, and code reviews.

But you also mentioned some things that I agreed with but that I might not have thought of, or prioritized — the things that you wish people were already good at when you interview them, but they probably aren’t. Is there any chance I could get you to very briefly summarize those things, or point me to a resource you like on this subject? I want to make sure to cover, at least quickly, all of the main high points in this class.

Daniel graciously sent a long reply and also allowed me to reproduce it here:

To start with, I don’t think I know of any resources on the subject. It’s amazing how much obvious little stuff one has to know, but forgets about. Git is a huge source of “things I use every day but forgot I had to learn”, for better or worse.

I guess if I had to come up with a list off the cuff:

The experience of maintaining software over time. I think we spend most of our time working with existing code bases and figuring out how to integrate changes into them. This area has a lot of related topics:
- How do you figure out where to make a change? Tools here include debugging existing workflows to find where something happens, code search, git blame, git grep, git log -G, etc.
- How do you manage making incremental changes? I am a huge believer in always doing incremental work. How do you build a feature while always keeping the code working? Tools here include feature flags, adaptors and stub implementations, forwarding implementations, A/B testing before/after change.
- How do you find the source of regressions? Tools here are basically bisection, git log -G.
- How do you handle technical debt? What counts as technical debt? What kinds of debt are painful versus not?
I feel like I probably have read things I liked on these topics, but none of the links are coming to mind now.
The experience of making technical decisions. This is a *huge* part of development. Topics here:
- How do you evaluate choices for a dependency? Tools here include benchmarking, analysis of the code, analysis of the software maintainability, etc.
- How do you evaluate when to adopt a dependency versus write your own? Topics here are NIH versus opportunity cost on innovation.
- How do you convince people to follow a particular choice? Presenting or writing coherent write-ups on engineering tradeoffs is a really underappreciated skill which can have a big impact (the cost of bad decisions is high).
How do you debug things? Another big part of development. I would emphasize debug here not just in the “how do I get this working” but even deeper in the “how do I understand what is really happening?”
- I find that many people tend to give up on really understanding what is going on. “Oh XYZ didn’t work, so I did PDQ”. The people who don’t give up usually end up understanding a lot more about computers, and then do a better job maturing over their career.
- Maybe it would be good to teach people (a) don’t give up, and (b) here are all the tools you can use when you might want to give up. A lot of the time the one tool people know is Stack Overflow, but past that they are lost.
- Things here include hardware watchpoints, reading the source code, disassembly and reverse engineering.
How do you estimate the time to develop software? This is a huge part of a business, people will always want you to do this. Even just getting students to start to think about the process would be good, asking them to make estimates and compare results to them.
- I have no advice on how to teach this because I am still learning a lot here.
How do you review code? What makes good review?
- When is coding style important versus not? What are the pros/cons?
- Does review catch bugs, or not? Are certain review styles more effective?
How do people do release management? This is such an amazingly huge part of what we spend time on, and one that receives very little attention.
- Do you release from trunk? If so, how do you ensure quality?
- Do you have stable release branches? If so, how do you ensure bugs are actually fixed? Are people cherry picking fixes? What can go wrong? (I remember once cherry picking a fix that happened to merge incorrectly, but the patch applied in a way that was still valid C++ code (an if {} clause ended up inside another one). The result was a clang that miscompiled itself.)
- How do you deal with complicated merge conflicts?
- People like Nicole Forsgren have done research here that it would at least be nice for people to be exposed to.
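
To make the bisection idea from the regression bullet above concrete, here is a minimal sketch in Python. It is not how git bisect is implemented, just the underlying binary search; the revision names and the check function are hypothetical, and it assumes the history is linear, the oldest revision is good, the newest is bad, and the bug stays once it appears.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first one is
    good, the last one is bad, and the bug persists once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return revisions[hi]


# Hypothetical history in which the regression was introduced in "r6".
history = [f"r{i}" for i in range(10)]
tested = []


def check(rev: str) -> bool:
    """Stand-in for 'build this revision and run the failing test'."""
    tested.append(rev)
    return int(rev[1:]) >= 6


culprit = first_bad(history, check)
print(culprit, "found after testing", len(tested), "revisions")
# prints: r6 found after testing 3 revisions
```

The point for students is the cost: ten revisions need three builds here, and a thousand would need about ten, which is why bisection stays practical on long histories.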

If I had to come up with the kinds of potential exercises I would love it if someone was trying (no idea if they actually would work):

Take a bug in a complex code base at some revision, and ask people to find it and fix it. Compare answers to the one the project actually adopted. A bug where there were several obvious fixes with tradeoffs would be a good point for discussion.
Take a new feature which was added to some project, and study how it was done. Not to toot my own horn (it’s just an example I am familiar with), but we migrated Clang from producing .s files to doing the assembly in memory. This involved lots of incremental refactoring, a clear switch over at one point (feature flag), and A/B testing to compare old to new (i.e. we chose to shoot for binary equivalence of .o files, simply so we could easily test it; this made for lots of extra engineering, but made correctness easier to guarantee). One could dig up how this went from a theoretical concept on a mailing list to an implemented feature.
The best thing here would be to create a hypothetical project which is a mirror of a real project (but don’t make this clear at first) and ask people to design some extension of it. Then, compare the results to what the project actually decided to do, and the discussion around it. For example, analyze what things the project owners agonized over that the students didn’t think of, and try and figure out why not.
Do some systematic study of PRs as a “literature” exercise. How does tone impact response, how do people handle criticism, etc.
Do an exercise where teams are forced to produce a project over the lifetime of the course. The exercises should be small and easier, but they have to be built into the same code base. Something that forces people to make software releases would be nice (dunno if there is a way to do this in such a way that people can choose whether or not to use “release branches”, but in a way that has tradeoffs in either direction).
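
The feature-flag and A/B-testing approach mentioned in the Clang example above can be sketched at toy scale. This is only an illustration under assumptions: emit_old, emit_new, and the use_new_path flag are hypothetical stand-ins, not anything from Clang. The idea is that both code paths stay alive behind a flag, and a harness checks that they produce byte-identical output on many inputs before the default is switched.

```python
import random


def emit_old(values):
    """Stand-in for the existing path: build text first, then encode it."""
    text = "".join(f"{v}\n" for v in values)
    return text.encode("ascii")


def emit_new(values):
    """Stand-in for the replacement path: append bytes directly."""
    out = bytearray()
    for v in values:
        out += str(v).encode("ascii") + b"\n"
    return bytes(out)


def emit(values, use_new_path=False):
    """Feature flag: callers switch paths without any other code change."""
    return emit_new(values) if use_new_path else emit_old(values)


def ab_mismatches(inputs):
    """Run both paths on every input and collect inputs whose outputs differ."""
    return [x for x in inputs if emit(x, False) != emit(x, True)]


rng = random.Random(0)  # fixed seed so any failure is reproducible
inputs = [[rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
          for _ in range(200)]
mismatches = ab_mismatches(inputs)
print(len(mismatches), "mismatches out of", len(inputs), "inputs")
# prints: 0 mismatches out of 200 inputs
```

Demanding exact byte equality is a strict target, but it makes the comparison trivial to automate, which is the tradeoff described above.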

Wow, this is a lot of material, way more than we cover in depth in a semester. Even so, I find it super useful as a high-level vision for the kinds of things students should end up at least having been exposed to.

January 23, 2018

regehr

Computer Science, Education, Software Correctness

16 responses to “A Conversation about Teaching Software Engineering”

Wyatt Epp says:

January 23, 2018 at 5:28 pm

I’ve said this a lot over the last decade, but the major skills CS grads tend to lack are just so very basic, it’s astounding: using source control, using other people’s libraries, having ever heard of a debugger at all… it looks like you’re doing right by the subject by covering those things.

Another one that I’ve since realised is probably just as important is… how do you call it… basic code fluency? Basically, the idea that you shouldn’t worry too much about languages because languages are cheap and most code looks pretty similar in most of them anyway. At the same time, you need to be able to engage with “foreign” languages and learn how to read code and find your bearings in a project relatively quickly.
regehr says:

January 23, 2018 at 8:10 pm

Totally agree re. language fluency! I don’t try to force this in class but rather sort of mention repeatedly to students that this is important in the hope that they start to buy in.
Geoff Wozniak says:

January 23, 2018 at 8:48 pm

I heartily endorse the notion of making students find bugs and fix them in a sizable project. It’s a surefire way to get some experience with tools and code navigation. More importantly, it’s an opportunity for students to get acquainted with basic problem solving skills in a practical setting.

I, too, don’t remember much of anything from my software engineering courses, except for one key aspect: learning to work in groups. Any formalisms (SDLC, UML) and tools at the time (Rational, Tivoli *shudder*) I’ve long since forgotten and never applied outside of the course. Except for learning to navigate group dynamics. Boy howdy is that important, because you’re always working with other people on a software project.

And because tools change over time, I would try to decouple the tool used from the task performed. Languages fall into the same pattern. (I like the phrasing of “language fluency”.) It bothers the hell out of me when people see “version control” and assume “Git”. This is how unnecessary rewrites take place: devs love their tools too much.
regehr says:

January 23, 2018 at 9:25 pm

I’m worried about the students getting kind of lost in a big code base — the ones I have this semester aren’t super experienced. Any ideas how to bring this sort of exercise under control without losing the good parts? I agree it’d be super useful.

I feel like tool specificity is just hard to avoid at first. A tool-independent perspective is something that we develop over time…
Alex D Groce says:

January 23, 2018 at 11:21 pm

When I taught the first SE course at OSU (two term sequence, where term two was “my” course on testing and maintenance) I hammered that schedules are probability distributions, not dates, and had them do a lot of actual reading — a lot of code, but also things like Butler Lampson, Parnas, Brooks, some of the early architecture stuff, Knight/Leveson on N-version, and so forth.

I think SE probably IS big enough to make a good two-term sequence.
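
One way to show students what “schedules are probability distributions, not dates” means is a small Monte Carlo simulation. The sketch below is only an illustration: the three tasks and their best/likely/worst estimates are made up, and the triangular distribution is an assumed stand-in for however task durations really behave.

```python
import random


def simulate_total_days(tasks, trials=10_000, seed=1):
    """Sample total project duration; each task is (best, likely, worst) days."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    totals = []
    for _ in range(trials):
        # random.triangular takes its arguments as (low, high, mode)
        totals.append(sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks))
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Simple rank-based percentile of an already sorted list, p in [0, 100]."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# Hypothetical tasks: (best case, most likely, worst case) in days.
tasks = [(2, 3, 8), (1, 2, 6), (3, 5, 15)]
totals = simulate_total_days(tasks)
p50, p90 = percentile(totals, 50), percentile(totals, 90)
print(f"50% chance of finishing within {p50:.1f} days, 90% within {p90:.1f} days")
```

Adding up the “most likely” values gives 10 days, yet the simulated median lands later than that because every task’s worst case is much further from its likely value than its best case is. That gap is the lesson: a single date hides the long tail.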
Alex D Groce says:

January 23, 2018 at 11:26 pm

My undergrad SE course was also a horrid “here’s some boring powerpoint, waterfall blah blah blah” where the main project was just to write some fairly lame program in Java (which none of us knew, so that was at least fun) in teams. Teams of two, where you chose your partner (so I worked with my apartment-mate). Did anyone have a GOOD undergrad SE experience?
Alaric Snell-Pym says:

January 24, 2018 at 6:46 am

Technical debt is an interesting topic. I came across a group who are trying to quantify (some kinds of) technical debt, with some interesting success: https://technical-debt.net/home/ – computing their MacCormack complexity metrics on a large codebase I was involved with, well, confirmed my suspicions, which is a good sign when you’re comparing a metric to an intuitive hunch!
Geoff Wozniak says:

January 24, 2018 at 7:08 am

I wouldn’t recommend a big code base either. Something on the order of a couple thousand lines, maybe even pushing 10000 lines of C. Perhaps working on an existing testing framework is worth considering. Students can bootstrap their tests. 😉

And yes, tool specificity is hard to avoid: you have to start somewhere! I’d just remind them that there’s more than one way to do it. (I feel bad invoking a Perl-ism, but it’s true…)
regehr says:

January 24, 2018 at 9:19 am

Alex, yeah, I’d agree that this sort of course should definitely be a full year, especially since it can be used to reinforce ideas from many other parts of the curriculum.

Thanks for the link, Alaric, I hadn’t seen this.
regehr says:

January 24, 2018 at 9:19 am

Geoff, let me know if you have specific suggested code bases that might be fun.
Alex D Groce says:

January 24, 2018 at 12:01 pm

Do you guys (Utah) have a separate capstone where students build a real system for some customer? I tend towards SE as:

1) design/architecture/scheduling/basic principles/light testing (a class with some lighter coding and such, not a total project class)

2) maintenance/debugging/heavy testing (a class with tons of actual maintaining a not-yours ugly codebase, and reading/fixing/loathing code)

3) a capstone where you build something large and complex

I differ from many in that I’d incline to have 1 and 2 be much less team oriented than the usual CS program. Yeah, everyone builds stuff in teams and you should be able to talk on slack, but teams still often involve “you, go make this part” work, esp. maintenance and testing.
regehr says:

January 24, 2018 at 3:01 pm

Alex, we have a two-semester capstone where they do a largish senior project. That’s about all the SE we have unless you count a verification course, or the times I teach a one-off “writing solid code” class, which fits your #2. Like I said in the post, we lack mainline SE faculty.

My SE class this spring is for a new professional MS program, it’s pretty fun so far!
Geoff Wozniak says:

January 24, 2018 at 6:32 pm

A couple of ideas off the top of my head.

AceUnit is a C testing framework that I took a look at and didn’t go with, but it’s reasonably small and might be interesting to add some small enhancements to. Maybe something like colourized formatting. This would require a new option, and maybe some internal restructuring. There are also a few “TODO”s and “XXX: hack”s in the source, so maybe they could try to remove them. It’s on SourceForge, though, so that could be a turn-off. (Catch is a good C++ unit testing framework I wanted to recommend, but it’s mature and doesn’t need much done to it that’s reasonable for student projects.)

I hesitate to mention this one, but for something advanced, maybe some of the GNU binutils. Yeah, the code is pretty nasty, but if you limit yourself to readelf or objdump, it’s digestible (just stay away from the assembler and linker). I don’t know what kind of changes they could make though. (Port it to CMake? 😉 )

Now that I look at those I’m not terribly happy with them, but I think they represent the ends of the spectrum of projects I’m getting at.
regehr says:

January 24, 2018 at 10:22 pm

Thanks Geoff!
Dirkjan Ochtman says:

January 28, 2018 at 1:35 pm

In the Mozilla world, you might be interested in looking up the work of or talking to Dave Humphrey, who has had his students at Seneca College work on Mozilla projects. Because Dave has a good connection to Mozilla Corporation, he is able to get support from the Mozilla community to support his students in their projects.

https://cs.senecac.on.ca/~david.humphrey/
regehr says:

February 21, 2018 at 9:57 am

Thanks Dirkjan!