What’s Operating Systems Research About?


The other day at lunch I tried to explain to Suresh what operating systems research is all about, which got me thinking about this subject. As a quick glance at the OSDI 2012 program will confirm, the obvious answer “it’s about building operating systems” no longer applies, if it ever did. In fact, the trend away from working mainly on ring 0 code was noted more than 12 years ago in Rob Pike’s entertaining screed (which I regrettably missed; he visited Utah just a few months before I got here). Pike said “the situation is genuinely bad and requires action” but it’s clear that most of his observations are simply symptoms of a maturing field. For example, iOS and Android are systems that I would consider innovative, but they are respectively based on BSD and Linux kernels that Pike would consider completely boring. The fact is: these kernels work. Significant kernel innovation was not required to make modern tablets and smartphones viable. If we take a strict definition of operating systems research, a lot of the interesting work since 2000 has been in virtualization. In fact, it is curious that Pike’s slide deck does not mention virtualization, since the modern wave of hypervisors (which originated in academia) was well underway by the time he gave his talk.

So there exists this moderately large community (the last SOSP I attended, in 2009, had 565 attendees) of bright people who understand systems, who aren’t afraid to get their hands dirty, and who want results instead of theorems. At the same time, the bar for creating a new OS has gotten higher and higher, both for reasons that Pike describes (there’s a lot of hardware to support and a lot of standards to implement) and for an important reason that he fails to mention (operating systems already work pretty well). Does this community disband? No, they stick together, but the kinds of problems being addressed become more diverse, as the OSDI program illustrates nicely.

It would be sad if a community existed only due to inertia, but that is not the case here. I would claim that the thing holding the OS community (as it exists today) together is a common approach to doing research. I’ll try to characterize it:

  1. The best argument is a working system. The more code, and the more results, the better. Something that is clearly a toy isn’t convincing. It is not necessary to build an abstract model, conduct a user study, prove soundness, prove correctness, or show any kind of asymptotic bound. In fact, if you want to do these things it may be better to do them elsewhere.
  2. The style of exposition is simple and direct; this follows from the previous point. I have seen cases where a paper from the OS community and a paper from the programming languages community are describing almost exactly the same thing (probably a static analyzer) but the former paper is super clear whereas the latter is incredibly difficult to figure out. To some extent I’m just revealing my own biases, but I also believe the direct approach to exposition is objectively better; I’ll try to return to this subject in a later post.
  3. The key to a strong research result is finding the right abstraction. A good abstraction is beautiful; it imposes little performance penalty; it leads to reliable systems; it leaks the right information and blocks things you didn’t want to know. It just feels right. The abstraction is probably for something low-level, but this doesn’t need to be the case. Finding good abstractions may sound easy but it’s super hard, often requiring lots of code to be thrown away multiple times.

And that, friends, is what OS research is about.

UPDATE from 9/15:

In a comment, Suresh says:

It seems to me that you need to be able to list core problems that you want to solve, or things you want to understand. OS as “the study of interfaces” seems overly broad, and characterizes really any system building effort, even if it’s in databases or in a public-key infrastructure.

At the level of an entire subfield I’m not sure you can construct a satisfying list of core problems to be solved. What would this be for software engineering? It would be something extremely vague like “enable predictable, low-cost creation of acceptable software.” Yuck. How about for programming languages? For scientific computing? For theoretical computer science? Obviously we can come up with something, but I think that at this level the approach matters more than the specific problems. The problems tend to come and go over a period of a few years. Some of them (e.g. efficient virtual memory, efficient hypervisors) get solved while others (concurrent programming, secure operating systems) end up being harder than we thought and slowly morph into more tractable versions.

Anyway, I’m bummed that you find this unsatisfying but it’s the best I can do right now. Maybe someone else can do better.

Bhaskar states that “theorems are a kind of result” and of course I agree. However, they are not a kind of result often seen in OS research, which is the only thing I was trying to talk about. He also says:

You first note that the simple and direct style of exposition for systems papers follows from the absence of a need to prove anything rigorous, and then indicate that this style is objectively better. This sounds contradictory to me. The more descriptive style preferred by systems researchers indeed follows from the fact that their core “rigorous/formal” component is code, which is not part of the paper itself; for communities where that component is, say, a proof, it must be presented within the paper itself. The paper consequently needs to be written with more precision, and may consequently be harder to read, particularly to those outside the field. As with any form of literature, the rhetorical style must match the purpose.

First, I like the bit about “their core rigorous/formal component is code” — that’s a great way of putting it.

Second, I shouldn’t have said that the writing style follows from the lack of proofs. Of course there exists wonderfully clear mathematical writing. However, the “OS writing style” does benefit from a relatively baggage-free research framework in which the world is its own best model.

Finally we come to the fun part: “as with any form of literature, the rhetorical style must match the purpose.” Of course this is true, but here we have been given this great gift where through some process of convergent evolution, researchers from different communities have ended up not only attacking the same problem, but also coming up with very similar solutions. If we look at some specific kinds of static analysis and bug finding, we can find papers from software engineering, from formal methods, from programming languages, and from operating systems that are doing essentially the same thing. Thus the purposes are the same. Even so, the rhetorical styles are very different. So we have form not following function, but rather following tradition. I’ve seen this happen a number of times. Reading these papers back to back is kind of like watching Rashomon.


8 responses to “What’s Operating Systems Research About?”

  1. First, Pike’s graph on # of new operating systems would be interesting to 1) carry to the present and 2) produce for PLDI (with the appropriate substitution). Second, I agree with #2, though at least once in a while that is for reasons that aren’t completely bad (fitting in to an existing framework comes to mind).

    Also, the first point seems to be spreading — “we are all (with some major exceptions) systems researchers now,” as Nixon might have put it.

  2. It’s interesting that two of the three bullets are “process” items: how the research is conducted, rather than what it focuses on. It would be like saying that theoretical computer science is the set of people who write papers with theorems in them.

    It seems to me that you need to be able to list core problems that you want to solve, or things you want to understand. OS as “the study of interfaces” seems overly broad, and characterizes really any system building effort, even if it’s in databases or in a public-key infrastructure.

  3. Nice post. It’s always good to think and write about what it is we believe we do in our various academic communities.

    Some thoughts:

    * When you say systems researchers want “results instead of theorems,” I am puzzled. As I see it, theorems are a kind of result; experimental performance measures, or even the demonstration of a certain functionality based on prototype software, are other kinds of results. In fact, one could even argue that market share of industrial-strength production software is an entirely different kind of “result”, though of interest to non-academic practitioners. To each their own.

    * You first note that the simple and direct style of exposition for systems papers follows from the absence of a need to prove anything rigorous, and then indicate that this style is objectively better. This sounds contradictory to me. The more descriptive style preferred by systems researchers indeed follows from the fact that their core “rigorous/formal” component is code, which is not part of the paper itself; for communities where that component is, say, a proof, it must be presented within the paper itself. The paper consequently needs to be written with more precision, and may consequently be harder to read, particularly to those outside the field. As with any form of literature, the rhetorical style must match the purpose.

    * An aside: as someone who works on the intersection of communication theory, queuing theory, computer engineering, networked systems, and algorithms for networks, I have always found it amusing that in EE, “systems” generally refers to mathematically rich topics like signal processing, communication, and control theory (as opposed to hardware topics like devices and circuits); whereas in CS it is generally understood to be diametrically opposed to theory.

  4. @Bhaskar K. #3: “The more descriptive style preferred by systems researchers indeed follows from the fact that their core “rigorous/formal” component is code, which is not part of the paper itself; for communities where that component is, say, a proof, it must be presented within the paper itself.”

    I don’t see how that follows. In fact it seems like you’re starting with premise X and concluding ~X. Here’s X: “Operating systems papers are (merely) descriptions of a result; the formal description (code) must reside outside the paper itself.” And here’s ~X: “Programming language papers are descriptions of a result; the formal description (proof) must reside *within* the paper itself.”

    “Literate programming” forces the informal and formal descriptions of code to cohabit. Couldn’t we agitate for more journals to adopt “illiterate mathematics”, in which results are described in plain English and all the formal bits relegated to a separate repository? 🙂

  5. I don’t think OS researchers will be out of a job anytime soon… sure, we might not be building OSes from scratch all that often (but it still happens, see things like Singularity and Barrelfish), but there is infinite work to be done to make the OS kernels we have work better.

    Common problems that I see in my work are:
    * Running efficiently on top of hypervisors
    * Virtualization in general
    * Security, compartmentalization, and similar
    * Combining real-time performance and complex processors
    * Scaling up to 100s of cores
    * Device drivers – how to make them easier to write, more stable, …
    * Integration of hardware and software – how can an OS expose hardware features and how can hardware be tweaked to help the OS? Notice how Solaris has influenced the design of the UltraSparc T5.
    * Testing and stress-testing operating systems including races and device driver problems
    * Hardware design that helps write efficient operating systems

    So, OS research ought to be a flourishing field.

  6. What is your opinion on how Operating Systems Research was treated in the new ACM Computing Classification System (http://dl.acm.org/ccs.cfm)? Hardware, Computer Systems Organization, and Security seem to cover some relevant areas, but OS Research as I think of it seems to be well and truly fragmented in the proposed scheme. I suppose this supports your thesis that OS Research is now many things to many people.