Trust Boundaries in Software Systems

One of the big things that has changed in computer science education over the last 20 years is that it is now mandatory to prepare students to write software that lives in a hostile environment. This content can’t be limited to a computer security course; it has to be spread throughout the curriculum. My experience, based on talking to people, looking through textbooks, and reading lecture material on the web, is that overall we’re not yet doing a great job of this.

When teaching this subject, I’ve started using trust boundaries as an organizing principle. A trust boundary exists any time we (the system designers or system owners) trust code, data, or human actors on one side of an interface more than we trust the other side of the interface. Students need to be able to recognize, understand, fortify, and stress-test the trust boundaries in any system they have a stake in.

Trust boundaries aren’t hard to find: we just need to ask questions like “What would the consequences be if this code or data became horribly malicious? Is that likely? Can we defend against it? Do we want to defend against it?” It is easy to conclude, for example, that a demonic garbage collector or OS kernel might not be something we wish to defend against, but that we had better fortify our systems against toxic PNG files loaded from random web sites.
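
To make the PNG example concrete, here is a minimal sketch of validating untrusted input at such a boundary. The file format, field layout, and size limit below are invented for illustration; a real decoder needs many more checks, but the core discipline is the same: never trust lengths and sizes that an attacker controls.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 8-byte header: a 4-byte magic string followed by width and
       height as 16-bit big-endian values. Everything in buf is untrusted. */
    #define MAX_DIM 16384u

    bool parse_header(const uint8_t *buf, size_t len,
                      uint32_t *width, uint32_t *height)
    {
        if (len < 8)                        /* never read past the input we were given */
            return false;
        if (memcmp(buf, "IMG1", 4) != 0)    /* reject anything that isn't our format */
            return false;
        uint32_t w = ((uint32_t)buf[4] << 8) | buf[5];
        uint32_t h = ((uint32_t)buf[6] << 8) | buf[7];
        if (w == 0 || h == 0 || w > MAX_DIM || h > MAX_DIM)
            return false;                   /* caps later allocations; w * h cannot overflow */
        *width = w;
        *height = h;
        return true;
    }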

Some basic observations about trust boundaries:

  1. They’re everywhere, even inside code written by a single person. Any time I put an assertion into my code, it’s a tacit acknowledgment that I don’t have complete trust that the property being asserted actually holds (a short sketch follows this list).
  2. The seriousness of trust boundaries varies greatly, from mild mistrust within a software library all the way to major safety issues where a power plant connects to the internet.
  3. They change over time: a lot of our security woes stem from trust boundaries becoming more serious than they had been in the past. Email was not designed for security. The NSA wasn’t ready for Snowden. Embedded control systems weren’t intended to be networked. Libraries for decoding images, movies, and other compressed file formats that were developed in the 90s were not ready for the kinds of creative exploits that they faced later on.
  4. If you fail to recognize and properly fortify an important trust boundary, it is very likely that someone else will recognize it and then exploit it.
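
Here is what I mean in the first observation, as a minimal sketch with a made-up helper function; the assertion is a tiny, internal trust boundary between the function and its callers (including future me):

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: the assertion records my incomplete trust that
       every caller really passes a non-empty array. */
    double average(const int *values, size_t count)
    {
        assert(values != NULL && count > 0);   /* tacit mistrust, checked in debug builds */
        long sum = 0;
        for (size_t i = 0; i < count; i++)
            sum += values[i];
        return (double)sum / (double)count;
    }

    int main(void)
    {
        int data[] = {3, 5, 7};
        printf("%f\n", average(data, 3));
        return 0;
    }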

To deal with trust boundaries, we have all the usual techniques and organizing principles: input sanitization, defense in depth, sandboxing, secure authentication, least privilege, etc. The issue that I’m trying to respond to with this post is that, in my experience, it doesn’t really work to hand students these tools without some sort of framework they can use to help figure out where and when to deploy the different defenses. I’d be interested to hear how other CS instructors are dealing with these issues.

Stories Behind Papers: Integer Overflow

A couple of months ago Jean Yang and Vijay Chidambaram had a Twitter discussion about the stories behind research efforts that you might hear over coffee, but that usually don’t get written up. Vijay started a series of posts containing these stories. I thought I’d write up a couple of them myself. Alas, none will be particularly dramatic. This seems like a good one to start with.

Around the mid-to-late 2000s — perhaps starting with “Nearly All Binary Searches and Mergesorts are Broken” — I got interested in integer overflow bugs. At that point the security aspect of integer bugs in C and C++ was receiving plenty of attention, but I didn’t feel that very many people were looking at the broader issue of logic errors stemming from integer overflows. Even in functional languages with super-serious type systems and a focus on correctness, integer overflow was (and is) an often-neglected issue. These problems are fundamentally difficult to deal with at compile time.
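
For readers who haven’t run into it, the bug behind that title is worth spelling out. The sketch below is mine, written in C rather than the Java of the original report, but the mistake is the same: once the indices are large enough, the intermediate sum low + high overflows (in Java it wraps to a negative index; in C signed overflow is itself undefined behavior).

    /* Binary search over a sorted array, using int indices as the original
       Java library code did. Returns the index of key, or -1 if absent. */
    int binary_search(const int *a, int n, int key)
    {
        int low = 0, high = n - 1;
        while (low <= high) {
            /* The famous bug is: int mid = (low + high) / 2;
               For arrays of more than about 2^30 elements, low + high exceeds
               INT_MAX. The standard fix computes the midpoint without forming
               the large sum: */
            int mid = low + (high - low) / 2;
            if (a[mid] < key)
                low = mid + 1;
            else if (a[mid] > key)
                high = mid - 1;
            else
                return mid;
        }
        return -1;
    }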

Anyhow, the thing that really got me motivated was the very limited understanding of undefined behavior that seemed to be par for the course in those days. Additionally, most of the existing research tools for detecting or mitigating integer overflows were operating on compiled code, while I believed this problem needed to be attacked near the source level.

By summer 2010 my student Peng Li (now at Baidu USA) had a modified version of Clang that instrumented compiled C or C++ programs with dynamic checks for integer overflow, divide by zero, value-losing typecasts, shifts past the bitwidth, and the like. We used this to test open source software, and it turned out that basically all programs were executing a constant stream of undefined behavior. I fired off a few dozen bug reports. Since UB wasn’t widely understood at that time, many developers still had the attitude “that is OK since we did it intentionally” or “I am allowed to expect signed overflow in C/C++ to have two’s complement behavior because that is what the hardware does.” See, for example, the discussions that happened at PostgreSQL and PHP.
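
To give a flavor of what the instrumentation flagged, here is a small made-up program containing the kinds of operations it checked; each marked line is undefined behavior (or, for the narrowing conversion, silently value-losing), even though code like this often appears to work:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        int shift = 32;

        int a = x + 1;            /* signed overflow: undefined, though most hardware wraps */
        int b = 1 << shift;       /* shift by >= the width of a 32-bit int: undefined */
        short c = (short)100000;  /* value-losing conversion: 100000 does not fit in short */

        printf("%d %d %d\n", a, b, (int)c);
        return 0;
    }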

In early 2011 I visited Grigore Rosu’s group at UIUC to learn about their awesome new KCC tool. We needed something like this to filter out undefined programs that C-Reduce was creating while making minimal versions of bug-triggering programs from Csmith. During this visit I happened to grab a few minutes with Vikram Adve and learned that he and his student Will Dietz were also working on LLVM-based integer overflow detection, and they also had a working prototype. Yikes! This wasn’t even close to the worst-case scenario — which would have been learning about their work only after they had a paper accepted somewhere — but it’s never fun to be scooped. Competition in research may occasionally be useful as a forcing function, but I am personally uninterested in competing. If smart people are working on a problem, I would like to leave them to it; they’ll most likely come up with a good solution. There are way too many other fun and important problems to work on for competing on any single problem to be attractive. Anyhow, luckily, Vikram and Will were happy to collaborate, so we started working together. I’m still happy with the resulting paper, and I’m confident that it is better than what either group would have produced working on its own.

One of our goals all along was to get integer overflow checks into Clang. This took a while longer, and Will did most of the legwork. The job was made easier by the fact that by this time there was plenty of momentum toward dynamic undefined behavior detection in LLVM. For example, ASan was already part of the tree. There was an existing -fcatch-undefined-behavior flag that we fit our checks into, but this was pretty rapidly (in time for LLVM 3.3, I believe) phased out in favor of the -fsanitize=undefined syntax that Clang still uses.
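
Anyone curious can reproduce the experience with a current Clang. Here is a tiny, hypothetical demo (the file name in the comment is made up); building it with -fsanitize=undefined makes UBSan report the signed overflow at runtime instead of letting it silently wrap or be optimized away:

    /* Compile and run with something like:
         clang -fsanitize=undefined demo.c && ./a.out
       The instrumented binary reports the signed overflow at runtime. */
    #include <limits.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        (void)argv;
        int x = INT_MAX - 1 + argc;   /* argc is 1 for a plain run, so x == INT_MAX */
        int y = x + 1;                /* signed overflow: undefined behavior */
        printf("%d\n", y);
        return 0;
    }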

Overall, dynamic detection of integer-related undefined behaviors in C/C++ is not difficult, but convincing people to take these bugs seriously was a long struggle and the broader issue of how integer overflows relate to program bugs is interesting and deep. Fundamentally, people usually do not want, and are not good at reasoning about, fixed-width integers, but on the other hand we don’t want to just put bignums everywhere in our low-level programming languages. The other thing I take away from this effort is how a lucky break and a willingness to collaborate were really important in making the work successful, and in fact my group continues to collaborate with Vikram’s.

The Real Problem with the US News Rankings

The latest list of Best Global Universities for Computer Science from US News has not been well received. For example, the Computing Research Association issued a statement saying that “Anyone with knowledge of CS research will see these rankings for what they are — nonsense — and ignore them. But others may be seriously misled.” The CRA statement identifies these problems with the US News rankings:

  • They ignore conference publications (many areas of CS publish primarily in conferences).
  • US News doesn’t even say which venues are used to compute the publication-based part of the ranking function.
  • The reputation-based part of the rankings doesn’t make much sense given the diverse, global nature of the computer science research community.

An additional problem is that it seems to be pretty easy to game this ranking system using money. For example, King Abdulaziz University (Jeddah, Saudi Arabia) has adopted hiring practices that appear to be designed to do this. Their CS department is ranked #13, compared for example to CMU at #22 and Illinois at #46. I’m trying to avoid being USA-centric and elitist here, but based on some web searches, it is just not possible to objectively rate the CS department at KAU as being better than the one at CMU. US News explains their ranking methodology here.

To summarize, US News is designed to make money, not to do the CS community any favors. Universities are going to try to maximize their rankings. It’s a pretty banal situation all around.

What I wanted to talk about today is the function of rankings. What are we supposed to do with them? The conclusion I’ve come to is that a closed, opaque ranking such as the one from US News is only good for one thing: codifying and reinforcing a pecking order so that it can be used by people who don’t need or want any more information than a total ordering. This might include, for example, university administrators who would like to know whether sending additional resources to a department resulted in a measurable and externally visible improvement.

The reason everyone’s annoyed with US News is that they’ve upended the established pecking order. But here’s the thing: they could fix this tomorrow and their opaque rankings would still be worthless for people who care about what’s behind the rankings, as opposed to being interested in ranking for its own sake. There has to be a better way.

In contrast, let’s take a look at CSRankings, a site that Emery Berger put together using publicly available data from DBLP. This ranking assigns credit to departments based on the number of top-tier papers published by their full-time faculty, with credit for each paper split among its authors. There’s a FAQ giving additional details. (There’s a lot of quibbling that could be done about how this all works; I’m not too interested in that.)

The thing that makes this ranking different for practical purposes isn’t the openness of the algorithm and the data set, but rather the way the web site allows us to explore the data. Let’s say that I’m a prospective graduate student interested in operating systems and formal verification. The first thing I can do is select only those areas — now the site shows me the departments that have people who tend to publish heavily in conferences such as SOSP and CAV. Second, I can click on an individual department and see who the key players are in those areas. Third, I can go to these people’s home pages, Google Scholar pages, etc. in order to see what they are specifically doing, and finally I can read their code and papers. I would argue that this is a fundamentally different use of a ranking system: the purpose is to guide me towards details that matter, not to hide everything behind a number.

In summary, I find the complaints about the US News rankings to be a bit off the mark, since even a fixed version of them will provide no insights and no information beyond an opaque ordering. It would just be confirming the status quo instead of refuting it, as their current rankings do. That is what some people want, but it is of little use to faculty and students in the field. A better use of rankings is to serve as a guide for further exploration — for this to happen, the rankings need to be open and connected to more detailed sources of information. CSRankings accomplishes this and it is the tool you should use to explore the productivity of computer science departments. If you don’t like it, you can try to convince Emery to do things differently or else create your own ranking.

The Dreaded Practice Talk

[I wrote a post with the same title in 2010; this is an updated version.]

In a week you’ll be giving a talk about your work to 600 people at a conference, or perhaps to five people who will sign off (or not) on your thesis. Depending on your area and the type of talk, the questions following the talk may not be very friendly. What should you do? Practice, practice, practice.

A practice talk is usually given to a small audience anywhere between a few weeks and a few hours before an important talk. It is followed by a feedback session that can easily last five times longer than the talk itself did. Often, multiple practice talks are necessary before the presentation becomes really polished and good.

This post is about getting maximum benefit from a practice talk — this is important because practice talks are very time-consuming.

The speaker needs to:

  • Have a legible slide number on every slide. If these aren’t there, people taking notes can’t easily refer back to specific slides later on.
  • Reserve a room, acquire a projector, and have everything set up and ready to go at the arranged time. Have all of the adapter dongles that you need on hand. If anyone is calling in remotely, the speaker (or someone who has agreed to help the speaker) should take care of this too, and it needs to be done before the talk is scheduled to start.
  • Have practiced the talk alone first. It helps to have memorized what to say when transitioning between slides. Memorizing an entire talk is usually overkill. Focus on transitions and on getting the talk started smoothly; most of us have a much easier time continuing to talk about a topic than getting started.
  • Have an appropriate number of slides. Speakers vary widely in terms of delivery speed and amount of content per slide, but 1.5 to 2 minutes per slide is probably about right. In realistic situations you will be cut off if you exceed your time budget. At proposals and defenses there is usually not a strict time budget, but going over time is strongly frowned upon.
  • Have a pen and paper available to take notes after the talk. You cannot remember 150 detailed suggestions about things to change.
  • Arrange for someone to time the talk. Sometimes it is helpful to get timings on individual slides.
  • Act on the feedback that is given.

Each member of the audience must:

  • Listen to the talk as if it were being given for real. Interruptions should be handled according to whatever protocol will be in force during the real talk; generally this means few or no interruptions.
  • Arrive with a pen and paper, or equivalent note-taking gear.
  • Provide detailed feedback in a constructive and respectful fashion.

In my group this is usually the procedure:

  1. I give a bit of context, reminding everyone what the speaker needs to accomplish, what kind of background and temperament the audience is likely to have, etc.
  2. I introduce the speaker.
  3. The talk is given, minimizing interruptions to get a good timing estimate.
  4. Starting with students, the audience asks questions as if they had just heard the real version of the talk. The speaker responds accordingly.
  5. Starting with students, the audience makes general comments about the delivery of the talk.
  6. We go through the talk slide by slide, giving feedback and trying to figure out what to add, delete, change around, etc.

Finally, a bit of advice on making slides:

  • Don’t put text too close to the edges of slides; some projection systems crop a bit.
  • Colors often look different when they go through a projector, and low-contrast colors can be completely invisible on a screen. Use a small number of very high-contrast colors. I typically use black on white for almost everything with some bright red or blue for emphasis.
  • Minimize the number of animations.

How Getting Tenure Is Supposed to Work

The other day Geoff Challen posted a blog entry about his negative tenure vote. Having spent roughly equal time on the getting-tenure and having-tenure sides of the table, I wanted to comment on the process a little. Before going any further I want to clarify that:

  • I know Geoff, but not well
  • I wasn’t involved in his case in any capacity, for example by writing a letter of support
  • I have no knowledge of his tenure case beyond what was written up in the post

Speaking very roughly, we can divide tenure cases into four quadrants. First, the professor is doing well and the tenure case is successful — obviously this is what everybody wants, and in general both sides work hard to make it happen. Second, the professor is not doing well (not publishing at all, for example) and the tenure case is unsuccessful. While this is hugely undesirable, at least the system is working as designed. Third, the professor is not doing well and the tenure case is successful — this happens, but very rarely and usually in bizarre circumstances, for example where the university administration overrules a department’s decision. Finally, we can have a candidate who is doing well and then is denied tenure. This represents a serious failure of the system. Is this what happened to Geoff? It’s hard to be sure but his academic record looks to me like a strong one for someone at his career stage. But keep in mind that it is (legally) impossible for the people directly involved in Geoff’s case to comment on it, so we are never going to hear the other side of this particular story.

So now let’s talk about how tenure is supposed to work. There are a few basic principles (I suspect they apply perfectly well to performance evaluations in industry too). First, the expectations must be made clear. Generally, every institution has a written document stating the requirements for tenure, and if a department deviates from them, decisions they make can probably be successfully appealed. Here are the rules at my university. Junior faculty need to look up the equivalent rules at their institution and read them, but of course the university-level regulations miss out on department-specific details such as what exactly constitutes good progress. It is the senior faculty’s job to make this clear to junior faculty via mentoring and via informal faculty evaluations that lead up to the formal ones.

If you look at the rules for tenure at Utah, you can see that we’re not allowed to deny tenure just because we think someone is a jerk. On the other hand, there is perhaps some wiggle room implied in this wording: “In carrying out their duties in teaching, research/other creative activity and service, faculty members are expected to demonstrate the ability and willingness to perform as responsible members of the faculty.” I’m not sure what else to say about this aspect of the process: tenure isn’t a club for people we like, but on the other hand the faculty has to operate together as an effective team over an extended period of time.

The second principle is that the tenure decision should not be a surprise. There has to be ongoing feedback and dialog between the senior faculty and the untenured faculty. At my institution, for example, we review every tenure track professor every year, and each such evaluation results in a written report. These reports discuss the candidate’s academic record and provide frank evaluations of strengths and weaknesses in the areas of research, teaching, and service (internal and external). The chair discusses the report with each tenure-track faculty member each year. The candidate has the opportunity to correct factual errors in the report. In the third and sixth years of a candidate’s faculty career, instead of producing an informal report (that stays within the department), we produce a formal report that goes up to the university administration, along with copies of all previous reports. The sixth-year formal evaluation is the one that includes our recommendation to tenure (or not) the candidate.

A useful thing about these annual evaluations is that they provide continuity: the reports don’t just go from saying glowing things about someone in the fifth year to throwing them under the bus in the sixth. If there are problems with a case, this is made clear to the candidate as early as possible, allowing the candidate, the candidate’s mentor(s), and the department chair to try to figure out what is going wrong and fix it. For example, a struggling candidate might be given a teaching break.

Another thing to keep in mind is that there is quite a bit of scrutiny and oversight in the tenure process. If a department does make a recommendation that looks bad, a different level of the university can overrule it. I’ve heard of cases where a department (not mine!) tried to tenure a research star who was a very poor teacher, but the dean shot down the case.

If you read the Hacker News comments, you would probably come to the conclusion that tenure decisions are made capriciously in dimly lit rooms by people smoking cigars. And it is true that, looking from the outside, the process has very little transparency. The point of this piece is that internally, there is (or should be) quite a bit of transparency and also a sane, well-regulated process with plenty of checks and balances. Mistakes and abuses happen, but they are the exception and not the rule.

Phil Guo, Sam Tobin-Hochstadt, and Suresh Venkatasubramanian gave me a bit of feedback on this piece but as always any blunders are mine. Sam pointed me to The Veil, a good piece about tenure.

Vigorous Public Debates in Academic Computer Science

The other day a non-CS friend remarked to me that since computer science is a quantitative, technical discipline, most issues probably have an obvious objective truth. Of course this is not at all the case, and it is not uncommon to find major disagreements even when all parties are apparently reasonable and acting in good faith. Sometimes these disagreements spill over into the public space.

The purpose of this post is to list a collection of public debates in academic computer science where there is genuine and heartfelt disagreement among intelligent and accomplished researchers. I sometimes assign these as reading in class: they are a valuable resource for a couple of reasons. First, they show an important part of science that often gets swept under the rug. Second, they put discussions out into the open where they are widely accessible. In contrast, I’ve heard of papers that are known to be worthless by all of the experts in the area, but only privately — and this private knowledge is of no help to outsiders who might be led astray by the bad research. For whatever reason (see this tweet by Brendan Dolan-Gavitt), the culture in CS does not seem to encourage retracting papers.

I’d like to fill any holes in this list; please leave a comment if you know of a debate that I’ve left out!

Here are some more debates pointed out by readers:

Latency Numbers Every Professor Should Know

    Email from student ............................ 20 sec
    Person at office door  ......................... 8 min
    Other interruption ............................ 20 min
    Twitter or something seems really important ... 45 min
    Anxiety about deadlines ........................ 1 hr
    A meeting ...................................... 2 hrs
    A meeting you forgot about ..................... 1 day
    A class to teach ............................... 2 days
    Request to review a paper ...................... 3 days
    Request to write evaluation letter ............. 6 days
    Stuff to grade ................................. 1 wk
    Unsolicited time management advice arrives ..... 2 wks
    Fire alarm clears building ..................... 3 wks
    Travel to conference ........................... 5 wks
    Paper deadline ................................. 6 wks
    Grades due .................................... 16 wks
    Grant proposals due ........................... 26 wks
    Summer ......................................... 1 yr
    Sabbatical ..................................... 7 yrs = 2.208e+17 ns

With apologies to the folks who published latency numbers every programmer should know.