How Getting Tenure Is Supposed to Work

The other day Geoff Challen posted a blog entry about his negative tenure vote. Having spent roughly equal time on the getting-tenure and having-tenure sides of the table, I wanted to comment on the process a little. Before going any further I want to clarify that:

  • I know Geoff, but not well
  • I wasn’t involved in his case in any capacity, for example by writing a letter of support
  • I have no knowledge of his tenure case beyond what was written up in the post

Speaking very roughly, we can divide tenure cases into four quadrants. First, the professor is doing well and the tenure case is successful — obviously this is what everybody wants, and in general both sides work hard to make it happen. Second, the professor is not doing well (not publishing at all, for example) and the tenure case is unsuccessful. While this is hugely undesirable, at least the system is working as designed. Third, the professor is not doing well and the tenure case is successful — this happens, but very rarely and usually in bizarre circumstances, for example where the university administration overrules a department’s decision. Finally, we can have a candidate who is doing well and then is denied tenure. This represents a serious failure of the system. Is this what happened to Geoff? It’s hard to be sure but his academic record looks to me like a strong one for someone at his career stage. But keep in mind that it is (legally) impossible for the people directly involved in Geoff’s case to comment on it, so we are never going to hear the other side of this particular story.

So now let’s talk about how tenure is supposed to work. There are a few basic principles (I suspect they apply perfectly well to performance evaluations in industry too). First, the expectations must be made clear. Generally, every institution has a written document stating the requirements for tenure, and if a department deviates from them, decisions they make can probably be successfully appealed. Here are the rules at my university. Junior faculty need to look up the equivalent rules at their institution and read them, but of course the university-level regulations miss out on department-specific details such as what exactly constitutes good progress. It is the senior faculty’s job to make this clear to junior faculty via mentoring and via informal faculty evaluations that lead up to the formal ones.

If you look at the rules for tenure at Utah, you can see that we’re not allowed to deny tenure just because we think someone is a jerk. On the other hand, there is perhaps some wiggle room implied in this wording: “In carrying out their duties in teaching, research/other creative activity and service, faculty members are expected to demonstrate the ability and willingness to perform as responsible members of the faculty.” I’m not sure what else to say about this aspect of the process: tenure isn’t a club for people we like, but on the other hand the faculty has to operate together as an effective team over an extended period of time.

The second principle is that the tenure decision should not be a surprise. There has to be ongoing feedback and dialog between the senior faculty and the untenured faculty. At my institution, for example, we review every tenure-track professor every year, and each such evaluation results in a written report. These reports discuss the candidate’s academic record and provide frank evaluations of strengths and weaknesses in the areas of research, teaching, and service (internal and external). The chair discusses the report with each tenure-track faculty member. The candidate has the opportunity to correct factual errors in the report. In the third and sixth years of a candidate’s faculty career, instead of producing an informal report (that stays within the department), we produce a formal report that goes up to the university administration, along with copies of all previous reports. The sixth-year formal evaluation is the one that includes our recommendation to tenure (or not) the candidate.

A useful thing about these annual evaluations is that they provide continuity: the reports don’t just go from saying glowing things about someone in the fifth year to throwing them under the bus in the sixth. If there are problems with a case, this is made clear to the candidate as early as possible, allowing the candidate, the candidate’s mentor(s), and the department chair to try to figure out what is going wrong and fix it. For example, a struggling candidate might be given a teaching break.

Another thing to keep in mind is that there is quite a bit of scrutiny and oversight in the tenure process. If a department does make a recommendation that looks bad, a different level of the university can overrule it. I’ve heard of cases where a department (not mine!) tried to tenure a research star who was a very poor teacher, but the dean shot down the case.

If you read the Hacker News comments, you would probably come to the conclusion that tenure decisions are made capriciously in dimly lit rooms by people smoking cigars. And it is true that, looking from the outside, the process has very little transparency. The point of this piece is that internally, there is (or should be) quite a bit of transparency and also a sane, well-regulated process with plenty of checks and balances. Mistakes and abuses happen, but they are the exception and not the rule.

Phil Guo, Sam Tobin-Hochstadt, and Suresh Venkatasubramanian gave me a bit of feedback on this piece but as always any blunders are mine. Sam pointed me to The Veil, a good piece about tenure.

Vigorous Public Debates in Academic Computer Science

The other day a non-CS friend remarked to me that since computer science is a quantitative, technical discipline, most issues probably have an obvious objective truth. Of course this is not at all the case, and it is not uncommon to find major disagreements even when all parties are apparently reasonable and acting in good faith. Sometimes these disagreements spill over into the public space.

The purpose of this post is to list a collection of public debates in academic computer science where there is genuine and heartfelt disagreement among intelligent and accomplished researchers. I sometimes assign these as reading in class: they are a valuable resource for a couple of reasons. First, they show an important part of science that often gets swept under the rug. Second, they put discussions out into the open where they are widely accessible. In contrast, I’ve heard of papers that are known to be worthless by all of the experts in the area, but only privately — and this private knowledge is of no help to outsiders who might be led astray by the bad research. For whatever reason (see this tweet by Brendan Dolan-Gavitt), the culture in CS does not seem to encourage retracting papers.

I’d like to fill any holes in this list, so please leave a comment if you know of a debate that I’ve left out!

Here are some more debates pointed out by readers:

Latency Numbers Every Professor Should Know

    Email from student ............................ 20 sec
    Person at office door  ......................... 8 min
    Other interruption ............................ 20 min
    Twitter or something seems really important ... 45 min
    Anxiety about deadlines ........................ 1 hr
    A meeting ...................................... 2 hrs
    A meeting you forgot about ..................... 1 day
    A class to teach ............................... 2 days
    Request to review a paper ...................... 3 days
    Request to write evaluation letter ............. 6 days
    Stuff to grade ................................. 1 wk
    Unsolicited time management advice arrives ..... 2 wks
    Fire alarm clears building ..................... 3 wks
    Travel to conference ........................... 5 wks
    Paper deadline ................................. 6 wks
    Grades due .................................... 16 wks
    Grant proposals due ........................... 26 wks
    Summer ......................................... 1 yr
    Sabbatical ..................................... 7 yrs = 2.208e+17 ns
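
(For anyone inclined to check that last conversion, a quick back-of-the-envelope in Python, assuming 365-day years:)

    # 7 years expressed in nanoseconds, using 365-day years
    SECONDS_PER_YEAR = 365 * 24 * 60 * 60
    print(f"{7 * SECONDS_PER_YEAR * 10**9:.3e} ns")  # 2.208e+17 ns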

With apologies to the folks who published latency numbers every programmer should know.

Sabbatical at TrustInSoft

At the beginning of September I started at TrustInSoft, a Paris-based startup where I’ll be working for the next 10 months. I’ll post later about what I’m doing here; for now, a bit about the company. TrustInSoft was founded by Pascal Cuoq, Fabrice Derepas, and Benjamin Monate: computer science researchers who (among others) created the Frama-C static analyzer for C code while working at CEA, the Atomic Energy Commission in France. They spun off a company whose business is guaranteeing the absence of undefined behavior bugs in C code, often deeply embedded software that needs to just work. Of course this is a mission that I believe in, and I already know Pascal quite well, having worked with him before.

The logistics of moving my family overseas for a year turned out to be more painful than I’d anticipated, but since we arrived it has been great. I’m super happy to be without a car, for example. Paris is amazing due to the density of bakeries and other shops, and a street right outside our apartment has a big open-air market three times a week. I’ve long enjoyed spending time in France and it has been fun to start brushing the dust off of my bad-but-sometimes-passable French, which was actually pretty good when I lived in Morocco in the 1980s. One of the funny things I’ve noticed is that even now, I have a much easier time understanding someone from North Africa than I do a random French person.

We spent August getting settled and being tourists, a few pics follow.

I know this bridge.

Hemingway talks about kids sailing these little boats in the Jardin du Luxembourg in the 1920s; I wonder how long that has been going on.

Thinnest building in Paris?

Dinner at one of the Ottolenghi restaurants in London with friends Edye and Alastair. I’ve been dreaming about eating at one of these for years!

Bletchley Park was super great to visit.

Small World and scotch is a good combination.

Back in Paris, one of the coolest art installations I’ve seen out at the Parc de la Villette.

I’m sort of obsessed with these spray-painted spiders that are all over the city.

Inexpensive CPU Monster

Rather than using the commercial cloud, my group tends to run day-to-day jobs on a tiny cluster of machines in my office and then to use Emulab when a serious amount of compute power is required. Recently I upgraded some nodes and thought I’d share the specs for the new machines on the off chance this will save others some time:

    processor      Intel Core i7-5820K                                     $380
    CPU cooler     Cooler Master Hyper 212 EVO                              $35
    mobo           MSI X99S SLI Plus                                       $180
    RAM            CORSAIR Vengeance LPX 16GB (4 x 4GB) DDR4 SDRAM 2400    $195
    SSD            SAMSUNG 850 Pro Series MZ-7KE256BW                      $140
    video          PNY Commercial Series VCG84DMS1D3SXPB-CG GeForce 8400    $49
    case           Corsair Carbide Series 200R                              $60
    power supply   Antec BP550 Plus 550W                                    $60

These machines are well-balanced for the kind of work I do, obviously YMMV. The total cost is about $1100. These have been working very well with Ubuntu 14.04. They do a full build of LLVM in about 18 minutes, as opposed to 28 minutes for my previously-fastest machines that were based on the i7-4770. I’d be interested to hear where other research groups get their compute power — everything in the cloud? A mix of local and cloud resources? This is an area where I always feel like there’s plenty of room for improvement.

Inward vs. Outward Facing Research

One of the things I like to think about while watching research talks is whether the work faces inward or outward. Inward facing research is mostly concerned with itself. A paper that uses most of its length to prove a theorem would be an example, as would a paper about a new operating system that is mainly about the optimizations that permit the system to perform well. Outward facing research is less self-absorbed: it is more about how the piece of work fits into the world. For example, our mathematical paper could be made to face outward by putting the proof into an appendix and instead discussing uses of the new result, or how it relates to previous work. The OS paper could demonstrate how users and applications will benefit from the new abstractions. Computer science tends to produce a mix of outward and inward facing research.

Next let’s turn to the question of whether a given paper or presentation should be inward or outward facing. This is subjective and contextual, so we’ll work through some examples. First, the mathematical paper. If the proof is the central result and it gives us new insights into the problem, then of course all is as it should be. Similarly, if the operating system’s use case is obvious but the optimizations are not, and if performance is the critical concern, then again no problem. On the other hand, researchers have a tendency to face inward even when this is not justified. This is natural: we know more about our research’s internal workings than anyone else, we find it fascinating (or else we wouldn’t be doing it), we invent some new terminology and notation that we like and want to show off, etc. — in short, we get caught up in the internal issues that we spend most of our time thinking about. It becomes easy to lose track of which of these issues other people need to know about and which ones should have stayed in our research notebooks.

Let’s say that we’re working on a new kind of higher-order term conflict analysis (just making this up, no offense to that community if it exists). One way to structure a paper about it would be to discuss the twists and turns we took while doing the work, to perform a detailed comparison of the five variants of the conflict analysis algorithm that we created, and to provide a proof that the analysis is sound. Alternatively, if the running time of the analysis isn’t actually that important, we could instead use some space demonstrating that a first-order analysis is wholly unsuitable for solving modern problems stemming from the big data revolution. Or, it might so happen that the analysis’s soundness is not the main concern, in which case we can put that space to better use.

I hope it is becoming clear that while some work is naturally inward facing and some outward facing, as researchers we can make choices about which direction our work faces. The point of this piece is that we should always at least consider making our work more outward facing. The cost would be that some of our inner research monologue never sees the light of day. The benefit is that perhaps we learn more about the world outside of our own work, helping others to understand its importance and helping ourselves choose more interesting and important problems to work on.

Reviewing Research Papers Efficiently

The conference system that we use in computer science guarantees that several times a year, each of us will need to review a lot of papers, sometimes more than 20, in a fairly short amount of time. In order to focus reviewing energy where it matters most, it helps to review efficiently. Here are some ideas on how to do that.

Significant efficiency can come from recognizing papers that are not deserving of a full review. A paper might fall into this category if it is:

  • way too long
  • obviously outside the scope of the conference
  • significantly incomplete, such as an experimental paper that lacks results
  • a duplicate of a paper submitted or published by the same or different authors
  • an aged resubmission of a many-times-rejected paper that, for example, has not been updated to reference any work done in the last 15 years

These papers can be marked as “reject” and the review then contains a brief, friendly explanation of the problem. If there is controversy about the paper it will be discussed, but the most common outcome is for each reviewer to independently reach the same conclusion, causing the paper to be dropped from consideration early. Certain program committee members actively bid on these papers in order to minimize their amount of reviewing work.

Every paper that passes the quick smoke test has to be read in its entirety. Or perhaps not… I usually skip the abstract of a paper while reviewing it (you would read the abstract when deciding whether or not to read the paper — but here that decision has already been made). Rather, I start out by reading the conclusion. This is helpful for a couple of reasons. First, the conclusion generally lacks the motivational part of the paper which can be superfluous when one is closely familiar with the research area. Second — and there’s no nice way to say this — I’ve found that authors are more truthful when writing conclusions than they are when writing introductions. Perhaps the problem is that the introduction is often written early on, in the hopeful phase of a research project. The conclusion, on the other hand, is generally written during the grim final days — or hours — of paper preparation when the machines have wound down to an idle and the graphs are all plotted. Also, I appreciate the tone of a conclusion, which usually includes some text like: “it has been shown that 41% of hoovulators can be subsumed by frambulators.” This gives us something specific to look for while reading the rest of the paper: evidence supporting that claim. In contrast, the introduction probably spends about a page waxing eloquent on the number of lives that are put at risk every day by the ad hoc and perhaps unsound nature of the hoovulator.

Alas, other than the abstract trick, there aren’t really any good shortcuts during the “reading the paper” phase.

The next place to save time is on writing the review. The first way to do this is to keep good notes while reading, either in ink on the paper or in a text file. Generally, each such comment will turn into a sentence or two in the final review. Therefore, once you finish reading the paper, your main jobs are (1) to make up your mind about the recommendation and (2) to massage the notes into a legible and useful form. The second way to save time is to decide what kind of review you are writing. If the paper is strong, then your review is a persuasive essay with the goal of getting the rest of the committee to accept it. In this case it is also useful to give detailed comments on the presentation: which graphs need to be tweaked, which sentences are awkward, etc. If the paper needs to be rejected, then the purpose of the review is to convince the committee of this and also to help the authors understand where they took a wrong turn. In this case, detailed feedback about the presentation is probably not that useful. Finally, many papers at top conferences seem to be a bit borderline, and in this case the job of the reviewer is to provide as much actionable advice as possible to the authors about how to improve the work — this will be useful regardless of whether the paper is accepted or rejected.

I hope it is clear that I am not trying to help reviewers spend less total time reviewing. Rather, by adopting efficient reviewing practices, we can spend our time where it matters most. My observation is that the amount of time that computer scientists spend writing paper reviews varies tremendously. Some people spend almost no time at all whereas others produce reviews that resemble novellas. The amazing people who produce these reviews should embarrass all of us into doing a better job.

Update: Also see Shriram’s excellent notes about reviewing papers.

A Guide to Better Scripty Code for Academics

[Suresh suggested that I write a piece about unit testing for scripty academic software, but the focus changed somewhat while I was writing it.]

Several kinds of software are produced at universities. At one extreme we have systems like Racket and ACL2 and HotCRP that are higher quality than most commercial software. Also see the ACM Software System Award winners (though not all of them came from academia). I wrote an earlier post about how hard it is to produce that kind of code.

This piece is about a different kind of code: the scripty stuff that supports research projects by running experiments, computing statistics, drawing graphs, and that sort of thing. Here are some common characteristics of this kind of code:

  • It is often written in several different programming languages; for example R for statistics, Matplotlib for pretty pictures, Perl for file processing, C/C++ for high performance, and plenty of shell scripts and makefiles to tie it all together. Code in different languages may interact through the filesystem, or it may interact directly.
  • It seldom has users outside of the research group that produced it, and consequently it usually embeds assumptions about its operating environment: OS and OS version, installed packages, directory structure, GPU model, cluster machine names, etc.
  • It is not usually explicitly tested, but rather it is tested through use.

The problem is that when there aren’t any obvious errors in the output, we tend to believe that this kind of code is correct. This isn’t good, and it causes many of us to have some legitimate anxiety about publishing incorrect results. In fact, I believe that incorrect results are published frequently (though many of the errors are harmless). So what can we do? Here’s a non-orthogonal list.

Never Ignore Problems

Few things in research are worse than discovering a major error way too late and then finding out that someone else had noticed the problem months earlier but didn’t say anything. For example, we’ll be tracking down an issue and will find a comment in the code like this:

  # dude why do I have to mask off the high bits or else this segfaults???

Or, worse, there’s no comment and we have to discover the offending commit the hard way — by understanding it. In any case, at this point we pull out our hair and grind our teeth because if the bug had been tracked down instead of hacked around, there would have been huge savings in terms of time, energy, and maybe face. As a result of this kind of problem, most of us have trained ourselves to be hyper-sensitive to little signs that the code is horked. But this only works if all members of the group are on board.

Go Out of Your Way to Find Problems

Not ignoring problems is a very low bar. We also have to actively look for bugs in the code. The problem is that because human beings don’t like being bothered with little details such as code that does not work, our computing environments tend to hide problems by default. It is not uncommon for dynamically and weakly typed programming languages to (effectively) just make up crap when you do something wrong, and of course these languages are the glue that makes everything work. To some extent this can be worked around by turning on flags such as -Wall in gcc and use strict; use warnings; in Perl. Bugs that occur when crossing layers of the system, such as calling into a different language or invoking a subprocess, can be particularly tricky. My bash scripts became a lot less buggy once I discovered the -e option. Many languages have a lint-like tool, and C/C++ have Valgrind and UBSan.

One really nice thing about scripty research code is that there’s usually no reason to recover from errors. Rather, all dials can be set to “fail early, fail fast” and then we interactively fix any problems that pop up.

The basic rule is that if your programming environment supports optional warnings and errors, turn them all on (and then maybe turn off the most annoying ones). This tends to have a gigantic payoff in terms of code quality relative to effort. Also, internal sanity checks and assertions are worth their weight in gold.
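
To make this concrete, here is a minimal Python sketch of the fail-early, fail-fast style for glue code. The experiment script name and its output format are made up; the point is that warnings become errors, subprocess failures are fatal, and sanity checks catch nonsense before it flows into the results:

    #!/usr/bin/env python3
    # Minimal fail-fast glue script: every dial is set to stop at the first
    # sign of trouble instead of silently producing garbage.
    import subprocess
    import sys
    import warnings

    warnings.simplefilter("error")    # any warning becomes a fatal exception

    def run(cmd):
        # check=True makes a non-zero exit status raise CalledProcessError
        return subprocess.run(cmd, check=True, capture_output=True, text=True)

    def main():
        result = run(["./run_experiment.sh", "--trials", "100"])  # hypothetical experiment driver
        times = [float(tok) for tok in result.stdout.split()]     # ValueError on junk output
        # Internal sanity checks: fail loudly if the data makes no sense.
        assert times, "experiment produced no measurements"
        assert all(t >= 0 for t in times), "negative running time?"
        print(f"mean running time: {sum(times) / len(times):.3f} s")

    if __name__ == "__main__":
        sys.exit(main())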

Fight Confirmation Bias

When doing science, we formulate and test hypotheses. Although we are supposed to be objective, objectivity is difficult, and there’s even a term for this. According to Wikipedia:

Confirmation bias is the tendency of people to favor information that confirms their beliefs or hypotheses.

Why is this such a serious problem? For one thing, academia attracts very smart people who are accustomed to being correct. Academia also attracts people who prefer to work in an environment where bad ideas do not lead to negative economic consequences, if you see what I mean. Also, our careers depend on having good ideas that get good results. So we need our ideas to be good ones — the incentives point to confirmation bias.

How can we fight confirmation bias? Well, most of us who have been working in the field for more than a few years can easily bring to mind a few examples where we felt like fools after making a basic mistake. This is helpful in maintaining a sense of humility and mild research paranoia. Another useful technique is to assume that the people doing previous work were intelligent, reasonable people: if implementing their ideas does not give good results, then maybe we should figure out what we did wrong. In contrast, it is easy to get into the mindset that the previous work wasn’t very good. Evidence of this kind of thinking can be seen in the dismissive related work sections that one often sees.

Write Unit Tests

Modern programming languages come with good unit testing frameworks and I’ve noticed that the better students tend to instinctively write unit tests when they can. In contrast, we old fogies grew up as programmers long before the current testing culture developed, and we have a harder time getting ourselves to do this.

But does unit testing even make sense for scripty code? In many cases it clearly doesn’t. On the other hand, Suresh gives the example where they are comparing various versions of an algorithm; in such a situation we might be able to run various data sets through all versions of the algorithm and make sure their results are consistent with each other. In other situations we’re forced to re-implement a statistical test or some other piece of fairly standard code; these can often be unit tested using easy cases. Mathematical functions often have properties that support straightforward smoke tests. For example, a function that computes the mean or median of a list should return the same value whether it is given a list or that list concatenated with itself.
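
Here’s a minimal sketch of that last idea using Python’s built-in unittest module. I’m testing the standard statistics functions just to keep the example self-contained; in practice the target would be your own implementation:

    # Smoke tests based on a simple mathematical property: the mean and the
    # median of a list are unchanged when the list is concatenated with itself.
    import statistics
    import unittest

    class TestDuplicationProperty(unittest.TestCase):
        DATASETS = [[1.0], [3.0, 1.0, 2.0], [5.0, 5.0, 5.0], [2.5, -1.0, 7.0, 0.0]]

        def test_mean_unchanged_by_duplication(self):
            for xs in self.DATASETS:
                self.assertAlmostEqual(statistics.mean(xs), statistics.mean(xs + xs))

        def test_median_unchanged_by_duplication(self):
            for xs in self.DATASETS:
                self.assertAlmostEqual(statistics.median(xs), statistics.median(xs + xs))

    if __name__ == "__main__":
        unittest.main()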

Write Random Testers

It is often the case that an API that can be unit tested can also be fuzzed. Two things are required: a test-case generator and an oracle. The test-case generator can do something easy like randomly shuffling or subsetting existing data sets or it can make up new data sets from scratch. The oracle decides whether the code being tested is behaving correctly. Oracles can be weak (looking for crashes) or strong (looking for correct behavior). Many modern programming languages have a QuickCheck-like tool which can make it easier to create a fuzzer. This blog post and this one talk about random testing (as do plenty of others, this being one of my favorite subjects).
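
Here’s a small hand-rolled sketch in that spirit, with no QuickCheck-style library: the generator makes up and shuffles random lists, and the oracle is differential, checking a toy median function (standing in for whatever code is actually under test) against the standard library on every input:

    # A tiny fuzzer: a random test-case generator plus an oracle.  The code
    # under test is a toy median function; the oracle is differential, so any
    # disagreement with the standard library (or any crash) is a failure.
    import random
    import statistics

    def my_median(xs):                # the (toy) code under test
        ys = sorted(xs)
        n = len(ys)
        mid = n // 2
        return ys[mid] if n % 2 else (ys[mid - 1] + ys[mid]) / 2

    def random_dataset():             # test-case generator
        xs = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
        random.shuffle(xs)            # order should never matter to a median
        return xs

    if __name__ == "__main__":
        for trial in range(10_000):
            xs = random_dataset()
            got, expected = my_median(xs), statistics.median(xs)
            assert got == expected, f"mismatch on {xs}: {got} != {expected}"
        print("10,000 random trials passed")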

Clean Up and Document After the Deadline

As the deadline gets closer, the code gets crappier, including the 12 special cases that are necessary to produce those weird graphs that reviewer 2 wants. Cleaning this up and also documenting how the graphs for the paper were produced is surely one of the best investments we could make with our time.
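
One concrete way to do the documenting half of this is a single script that regenerates every figure in the paper from the raw results. Here’s a rough sketch; the file names, CSV columns, and directory layout are all made up, but the pattern is one function per figure plus one entry point:

    #!/usr/bin/env python3
    # Regenerate every figure in the paper from the raw results, so that
    # "how was this graph produced?" always has a one-command answer.
    # All paths and column names below are hypothetical.
    import csv
    import pathlib

    import matplotlib
    matplotlib.use("Agg")             # write files; no display needed
    import matplotlib.pyplot as plt

    RESULTS = pathlib.Path("results/experiment1.csv")
    FIGS = pathlib.Path("figs")

    def load_results(path):
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        xs = [float(r["input_size"]) for r in rows]
        ys = [float(r["runtime_sec"]) for r in rows]
        return xs, ys

    def figure_runtime():
        xs, ys = load_results(RESULTS)
        plt.figure()
        plt.plot(xs, ys, marker="o")
        plt.xlabel("input size")
        plt.ylabel("runtime (s)")
        plt.savefig(FIGS / "runtime.pdf")
        plt.close()

    if __name__ == "__main__":
        FIGS.mkdir(exist_ok=True)
        figure_runtime()              # one function per figure in the paper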

Better Tooling

Let’s take it as a given that we’re doing code reviews, using modern revision control, unit testing frameworks, static and dynamic analysis tools, etc. What other tool improvements do we want to see? Phil Guo’s thesis has several examples showing how research programming could be improved by better tool support. There’s a lot of potential for additional good work here.

Summary

There are plenty of easy ways to make scripty research code better. The important thing is that the people who are building the code — usually students — are actually doing this stuff and that they are getting proper encouragement and supervision from their advisors.

Hints for Computer System Design

On the last day of my advanced OS course this spring we discussed one of my all-time favorite computer science papers: Butler Lampson’s Hints for Computer System Design. Why is it so good?

  • It’s hard-won advice. Designing systems is not easy — a person can spend a lifetime learning to do it well — and any head start we can get is useful.

  • The advice is broadly applicable and is basically correct; there aren’t really any places where I need to tell students “Yeah… ignore Section 3.5.”

  • There are many, many examples illustrating the hints. Some of them require a bit of historical knowledge (the paper is 30 years old) but by and large the examples stand the test of time.

  • It seems to me that a lot of Lampson’s hints were routinely ignored by the developers of large codes that we use today. I think the reason is pretty obvious: the massive increases in throughput and storage capacity over the last 30 years have permitted a great deal of sloppy code to be created. It’s nice to read a collection of clear thinking about how things could be done better.

Something that bums me out is that it’s now impossible to publish a paper like this at a top conference such as SOSP.

Research Advice from Alan Adler

Although I am a happy French press user, I enjoyed reading an article about Alan Adler and the AeroPress that showed up recently on Hacker News. In particular, I love Adler’s advice to inventors:

  1. Learn all you can about the science behind your invention.
  2. Scrupulously study the existing state of your idea by looking at current products and patents.
  3. Be willing to try things even if you aren’t too confident they’ll work. Sometimes you’ll get lucky.
  4. Try to be objective about the value of your invention. People get carried away with the thrill of inventing and waste good money pursuing something that doesn’t work any better than what’s already out there.
  5. You don’t need a patent in order to sell an invention. A patent is not a business license; it’s permission to be the sole maker of a product (and even this is limited to 20 years).

Now notice that (disregarding the last suggestion) we can simply replace “invention” with “research project” and Adler’s suggestions become a great set of principles for doing research. I think #4 is particularly important: lacking the feedback that people in the private sector get from product sales (or not), we academics are particularly susceptible to falling in love with pretty ideas that don’t improve anything.