A New Life Awaits You in the Off-World Colonies

As a recipient of research funding from the US government, I sometimes think about how this resource should be allocated.  Wikipedia claims that for most developed countries, research funding amounts to 1.5% to 3% of GDP, so we are talking about a lot of money here.  Appropriate focusing of this funding can change the world — and it already has, of course, many times.  (Just to be clear: for 9 months out of each year my salary is paid by the state of Utah.  However, my group operates on federal grant money typically exceeding my salary by several times.)

The research funding allocation problem is interesting because it’s not so much a matter of determining the most pressing needs, but of determining which important research areas are not being addressed adequately by the private sector.  Corporations’ responsibility to their shareholders makes them myopic, so there are many such areas.

My premise in this post is that at some point, something is going to get us.  By “us” I mean the human race and by “get” I mean utterly kill, or close enough.  I don’t think this is an unreasonably paranoid view: the solar system has a rich history of delivering very big rocks to us and we seem to be doing a perfectly good job in creating our own threats to continued existence.  Perhaps I should not have watched “Fail Safe” the other night.

What does this have to do with research funding?  It means that some significant fraction of our research money (25%, for example) should be devoted to work on the critical path to creation of sustainable off-Earth colonies.  In the long run, creating these colonies is literally the only thing that matters.  Clearly, we cannot build them right now.  So what are the missing technologies?  Obvious candidates include: surviving or avoiding microgravity, sophisticated and reliable automation, stable closed ecosystem design, and reliable energy sources.  I’m not trying to say anything new about space exploration; the point is just that we’re wasting valuable time.  The current under-focused approach to research funding is not going to give the human race a Plan B anytime soon.

How do we determine the specific topics to work on?  One option is to ask the NAS, NAE, and NRC, who would each spend two or three years forming committees and writing reports.  This is a fine idea, but boring.  We could also run some workshops where various experts from government, industry, and academia get together and talk. Again it’s hard to stifle a yawn; no matter how good the intentions, somehow these kinds of events do not seem conducive to innovative and critical thinking.  A better idea might be for small groups of people to self-organize and start doing the work.  As researchers we get a large amount of autonomy and surely this can be bent towards longer-term, more meaningful goals than we typically aim for.

How would my job change if tomorrow I decided to devote all of my efforts towards minimizing the time remaining until we can build a self-sustaining colony on the Moon or Mars?  At least in broad terms, probably not much!  It doesn’t really matter where these colonies are or what form they take: they’re going to need a huge amount of embedded software to sustain their existence and it had better work really well.  What, did you think I was going to end this post realizing I need to change jobs?

Short Order Dad

Since having kids I’ve stopped treating cooking as entertainment and started treating it more like a job.  I’m short-order Dad and the goal is to get something edible on the table, rapidly and reliably.  A bit of diversity is important too, since people get tired of things quickly.

Most of the “30-minute” cookbooks suck.  First, they lie. Oh how they lie.  Jacques Pepin’s Fast Food My Way and America’s Test Kitchen 30 Minute Suppers are both perfectly good cookbooks containing many tasty recipes. But you cannot, no matter how hard you try, make most of these recipes in half an hour or less.  Clearly, timed testing by actual end-users is not a criterion for putting a recipe into this kind of book.  The second problem with this category of cookbook is the advocacy of unacceptable shortcuts (the two examples above don’t do this, but many others do).  Ketchup is not an ingredient, ever — what’s so hard about that?  Other things that are unacceptable include anything fried and then frozen, and anything that resembles cheese but is not (the kids demand boxed mac and cheese sometimes, so we do make exceptions).

A fantastic example of a short-order cookbook is Nigel Slater’s Real Fast Food; I’ve read every page probably twice.  The difference is that instead of relying on extensive ingredient lists and gimmicky shortcuts, his recipes are based on a small number of basic ingredients.  The fact is, you can rapidly make a large number of good things out of onions, canned tomatoes, eggs, butter/oil, cheese, pasta, rice, basic spices, potatoes, and at most a dozen other staples.  Sometimes advance prep work is needed — this does not need to be a big deal.  To make fried rice, you need cold rice from that morning or the night before.  Many recipes benefit from home-made meat stock or tomato sauce; these can be time consuming but are a perfect thing to make on a bad-weather weekend afternoon.  Tabouli tastes much better if made in advance.  A big batch of polenta gives three meals: one hot and two out of cold, sliced polenta.  In fact it is often possible to work ahead with minimal effort by just making large batches of food, since the extras can be refrigerated or frozen.  All of these kinds of advance planning are pretty easy and pay off well.  Then, when things get hectic and planning fails, we order out pizzas or get Thai food, no big deal.  But I prefer eating out to be a fun thing, not a last resort.

One way to make it easy to cook at home is to have single-dish meals.  Life is too short for thinking about salads or other side dishes, although we do have these occasionally.  A bowl of cut-up cherry tomatoes, olive oil, pepper and a bit of cheese is perfect in my opinion.  Appetizers are never necessary.  Dessert can be fruit, a chocolate bar, cookies out of the freezer, or whatever.

So what do we actually eat on a day to day basis?  Here are some of our common fast meals:

  • baked pasta, covered with home-made tomato sauce (either Nigel Slater’s quick recipe, or frozen), maybe some pork sausage, then grated mozzarella and parmesan cheese
  • fried rice
  • rice pilaf
  • sweet potato and black bean burgers
  • salad nicoise
  • frittata or omelet
  • fried pastrami, egg, and cheese sandwiches
  • white chicken chili
  • chicken noodle soup
  • tuna salad or tuna melts
  • Greek salad
  • fried vegetables and pasta, maybe with tomato sauce, topped with cheese
  • pasta salad with steamed peas, grilled chicken, browned onions and garlic, and crumbled bacon
  • sautéed scallops
  • risotto (really very easy, but lots of stirring required)
  • pizza bagels
  • crab cakes
  • hummus
  • salmon cakes
  • grilled Alaskan salmon
  • grilled sausage
  • baked sausage and potatoes
  • pancakes or crepes
  • tacos, burritos, quesadillas
  • stir fry
  • potato pancakes or potato kugel
  • lentils (Jamie Oliver has an awesome recipe)
  • polenta
  • hamburgers

Some of these don’t quite make a meal, so they’re combined with leftovers, a can of sardines, or whatever.

I’m making it sound like I’m the only cook in my house; that isn’t the case at all.  However, I do enjoy it more than Sarah does and she’s happy to do the dishes if I make the food, so that’s what often happens.

To maintain at least a veneer of civility, we have a rule that you do not criticize Dad’s cooking.  It is fine to turn green, choke, gag, leave the table, make funny faces behind my back, or whatever.  But we do not criticize.  This is a helpful guide to behavior for children whose first impulse, on hearing that dinner is anything besides baked ziti, is to scream “but I HATE that!”  Another simple thing that makes cooking work is to crack open a beer or pour a glass of wine before even starting to figure out what to make.  This just makes the rest of the process go more smoothly, especially when the little food critics learn what is about to be served.  It also helps to have hungry kids.  When I’m running the show, snacking is forbidden under pain of death (or at least serious tickling) starting around 4:30.

Computer Science Fiction

Science fiction explores the effect of technological progress on society.  It is ironic, then, that the majority of SF authors failed miserably to predict the impact of computers and information technology.  Why does Google return no meaningful hits for “computer science fiction”?  Is it not obvious that this term needs to exist, if we wish to understand the next 50 years at all?

Looking back, it is clear that most SF authors made the same mistake: they took whatever computers existed at the time and envisioned larger, more powerful, smarter versions of the same kind of machine.  The genre is filled with this kind of obvious — and, in retrospect, boring — extrapolation. In contrast, the interesting thing that has happened over the past few decades is the development and deployment of new kinds of computer-based systems, which are now pervasively embedded in the world.  Something like 10 billion processors are manufactured each year and only a tiny fraction looks anything like a mainframe.

Of course there were other mis-steps, such as overlooking the impact of wireless communication.  One of my favorite mispredictions comes from Neuromancer, where Gibson makes a point of showing us that pay phones still exist in the middle of the 21st century.  People in Neuromancer have their brains wired directly into the network, but they don’t carry cell phones.

As a professional computer scientist, part of my job is a form of applied science fiction.  Instead of trying to predict the impact of science, we try to make the impact happen.  Doing this job well, however, requires real insight into the future.  Therefore, one of the questions I’m interested in is: Who in SF got it right?  That is, who saw beyond the obvious extrapolations about computer technology and really understood what the future might look like?  And how did they do it?

The best single example of a computer science fiction author is Vernor Vinge.  His story True Names preceded the cyberpunk movement and paved the way for subsequent work like Neuromancer and Snow Crash.  Vinge’s books contain a high density of good ideas, on a par with some of the best earlier idea-driven writers like Asimov and Niven.  A Fire Upon the Deep gets solid mileage out of pack animals that function as distributed systems and also provides a reasonable exploration of what really powerful machines might be like (“applied theology” indeed).  Subsequently, A Deepness in the Sky gave us the term “software archeology” — an ugly but highly plausible window into the future of software maintenance — and is the first and perhaps only work in the sub-sub-genre “sensor network science fiction.”  Of course, Vinge depicts pervasive embedded sensing as invariably leading to totalitarian control and then societal collapse.

There are two major CS-related themes running across Vinge’s work.  The first is the singularity, and probably too much has been written about it elsewhere.  It seems uninteresting to speculate about a point in time that is defined as being immune to speculation.  The second centers on the fickle nature of control in computer systems.  Very early, Vinge foresaw the kinds of battles for control over networked resources that are playing out now in botnets and in corporate and government networks.  Concepts like a trusted computing base and subversion via network attack look like they are probably fundamental.  Our understanding of how these ideas will evolve is flawed at best.  In fact, the entire history of computer security has been adversary driven: we put an insecure system out into the world, wait for it to be exploited, and then react.  This happens over and over, and I have seen very few signs that a more proactive approach is emerging.

How did Vinge do such a good job?  This is tough to analyze.  Like all good extrapolators, he separated the fundamental from the incidental, and pushed the fundamentals forward in useful ways.  He knew enough CS to avoid technical gaffes but somehow failed to let this knowledge interfere with his predictions.  It is no coincidence that few of the now-common top-level uses of the Internet were thought of by computer scientists: we’re too busy dealing with other levels of the system.  We know how computers are supposed to be used and this is a huge handicap.

As a group, the cyberpunk authors did a good job in seeing the impact of information technology, but as far as I can tell their contributions to computer science fiction were largely derived from things that Vinge and others had already predicted.  It’s interesting that almost all of these authors insisted on making the network into a physical space accessed using VR.  This was more than a plot device, I think: people really want cyberspace to be somehow analogous to physical space.  The real-world failure of VR as a metaphor for understanding and navigating networked computer systems has been interesting to watch; I’m not sure that I understand what is going on, but as I see it the network simply does not demand to be understood as a physical space.  Of course, spaces are useful for games and other kinds of interaction, but it’s not at all clear that VR will ever be the primary metaphor for interacting with the network.  The poor state of interface technology (Where are those brain plugs? Why are we still using stupid mice?) is obviously a factor as well.

200 Compiler Bugs

This morning I reported the 200th bug found by our compiler testing tool.  It is a new way to crash GCC.  The failure-inducing input is not pretty so I won’t give it here, but it can be found in GCC’s bugzilla.  Although the testing tool is now entirely developed by some excellent PhD students, I have remained the primary bug reporting person.  It is basically grunt work but it keeps me technically involved with the project.
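
For readers who haven’t seen this kind of tool in action, here is a minimal sketch of the general random differential testing idea (not our actual tester): generate a random C program, compile it with several compilers and optimization levels, and flag any internal compiler error or any disagreement among the resulting executables.  The compiler list, file names, and flags below are purely illustrative, and the random program generator, which is the genuinely hard part, is omitted.

```python
# Minimal random differential-testing loop for C compilers (illustrative sketch,
# not the actual tool described in this post).  Assumes an external generator
# has written a closed, deterministic C program that prints a checksum.
import subprocess

COMPILERS = [
    ("gcc",   ["-O0", "-O1", "-O2", "-O3"]),
    ("clang", ["-O0", "-O2"]),
]

def run(cmd, timeout=60):
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

def test_one(source="random.c"):
    outputs = {}
    for cc, opt_levels in COMPILERS:
        for opt in opt_levels:
            exe = f"./a_{cc}_{opt.lstrip('-')}"
            compile_res = run([cc, opt, "-o", exe, source])
            if compile_res.returncode != 0:
                # Internal compiler error or other compile failure: report it.
                print(f"possible compiler bug: {cc} {opt} failed on {source}")
                continue
            outputs[(cc, opt)] = run([exe]).stdout
    # If the program is free of undefined behavior, every compiler/flag
    # combination must print the same checksum; a mismatch implicates someone.
    if len(set(outputs.values())) > 1:
        print(f"miscompilation suspected on {source}: {outputs}")

if __name__ == "__main__":
    test_one()
```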

It took us about two years of real time and maybe three person-years of effort to find these 200 bugs.  When will we reach 300?  It’s not clear.  On one hand, our testing tool is becoming extremely powerful and we are branching out into more target architectures.  On the other hand, both GCC and LLVM have developed a significant degree of immunity to our tests.  The co-evolution of the testing tool and the tested systems has been interesting to watch and I hope to write more about this at some point.  In any case, at this point our tester is the bottleneck for reporting bugs in LLVM and GCC (for x86/x64 and for the most common compiler flags, at least).  For most other compilers we have looked at, the bottleneck is either in creating good bug reports — this is time consuming — or in the compiler team’s ability (and sometimes, unfortunately, willingness) to fix the bugs that we report.

Our (generally) uncomplaining and silent partners in this effort are the compiler developers who fix the bugs that we report.  I often watch the progress of each bug report as it is confirmed, discussed, and eventually fixed.  Over the course of this project I’ve become very impressed with the kind of hacking talent the LLVM and GCC projects have attracted.  Hopefully their effort, and ours, is well-spent and in the long run we will end up with several highly reliable open-source compilers.

Finally, I should add that we are exceptionally grateful to DARPA for funding this effort.

How to Evaluate a Computer Systems Research Paper

Some excellent resources exist about how to write a good systems paper. This post is about a slightly different topic.

In a typical recent year I review about 100 papers, mostly conference papers 8-14 pages long in 9 or 10 point font. People in similar positions — mid-career computer systems professors — are generally in the same situation and some have it worse. Since a good review takes three or four hours to write, it’s important to develop some shortcuts. Many of mine take the form of “sniff tests”: ways to rapidly discover that a paper contains bogus or useless results. Perhaps one third of papers that I review fall into this category. If I can save time by writing relatively brief reviews of them, then I can spend more time reviewing the marginal-to-good papers: these are the ones that stand to benefit most from detailed feedback. The best papers, like the worst, require relatively little effort to review.

I almost always skip the abstract.  For one thing, I’ve probably already read it when deciding how to rank the paper in my reviewing preferences.  Additionally, the introduction to the paper almost always contains the same information but with more motivation and background.

The first thing I do when evaluating a paper is to read its conclusion. Authors are almost invariably more truthful about their contributions in the conclusion than in the abstract and introduction. Why? First, people often write their papers front-to-back, and usually the introduction gets written before the final results have arrived. The authors are still optimistic. Second, it is simply psychologically harder to oversell the results of a paper in the conclusion, where the authors are aware that the reader has just finished reading a perhaps not totally conclusive evaluation section.

The most important questions to ask when reading a paper are: Does it contain a new idea? A useful idea? Both can be hard to answer.

One complication in evaluating novelty is that the actual contribution of a paper is often different from the one(s) stated in the paper. Most of the time this is not due to any deliberate misrepresentation by the authors. Rather, people usually emphasize what they thought was hard or fun about the work, neglecting the fact that many hard and fun — and often substantially similar — research projects have been done in the past. Also, the authors’ perception of their contributions tends to be heavily colored by their previous work and their backgrounds.  Furthermore, sometimes the actual contributions of a piece of work become clear only years after its initial publication.

Another problem is that evaluating the novelty of an idea requires a huge breadth of knowledge covering many thousands of papers in related, and not-obviously-related, research areas. It is not uncommon for zero or one of the people evaluating a paper submitted to a conference to have a really solid idea about where the paper fits into the literature. Complicating matters, many people submit papers to a venue that they know and like, where the work will be evaluated by people they know and like, even if these are not the most appropriate people to be evaluating the paper. Changing research sub-areas seems to be much easier and more appealing than changing communities.

To show that an idea is useful, it is customary to evaluate it analytically and/or experimentally. Experimental results come in many sub-flavors, including those based on simulation and implementation. Regardless of the actual technique, there are many sniff-tests that should be applied to any computer systems paper.

For analytical results: Does the result make sense? Is it grounded in reality? Does it tell us anything new? Are the theorems and lemmas actually formal or are they “pseudo-formal”: written in the formal style but lacking key definitions and steps in reasoning?  Would a theoretician with the appropriate background find the results useful or interesting?

For simulation results: Did the authors have to use simulation or were they simply too lazy to find a better evaluation method? Are the simulation parameters realistic? Does the simulation tell us anything new? I once reviewed a paper that presented a collection of analytical results and then a collection of simulation results that were based on an exact implementation of the analytical model. Of course, the “experimental” results matched the predicted results nearly perfectly (and trivially). Another paper that I once reviewed performed its evaluation on a simulation of a large multiprocessor computer where the parameters and workload were chosen in such a way that the aggregate throughput of the multiprocessor was around one instruction per cycle. Of course any conclusions reached from this kind of simulation are useless.

For experimental results: Did the authors measure the right quantities? Were appropriate tests of statistical significance used? (In computer systems, confidence intervals are quite avant-garde.) Is the measured effect robust? Is the baseline a sensible one? A commonly used trick is to compare the new work against an obsolete or otherwise obviously defective baseline, rather than the state of the art. Another common trick is to report on the degree of improvement offered by a new technique, but to omit the absolute numbers.
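
Since confidence intervals come up so rarely, it is worth noting how little work they take.  Here is a small sketch with made-up runtimes: it reports the mean and a normal-approximation 95% confidence interval for a baseline and a proposed technique, and if the two intervals overlap heavily, the claimed improvement deserves skepticism.

```python
# Back-of-the-envelope confidence intervals for benchmark runtimes
# (hypothetical numbers; a sketch, not a full statistical treatment).
import math
import statistics

def mean_ci(samples, z=1.96):
    """Mean and ~95% confidence interval, normal approximation.
    For small sample counts, substitute the appropriate t critical value."""
    m = statistics.mean(samples)
    half = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, m - half, m + half

baseline = [10.4, 10.1, 10.6, 10.3, 10.5, 10.2, 10.4, 10.6, 10.3, 10.5]  # seconds
improved = [9.6, 9.9, 9.7, 10.2, 9.8, 10.0, 9.7, 9.9, 10.1, 9.8]

for name, runs in (("baseline", baseline), ("new technique", improved)):
    m, lo, hi = mean_ci(runs)
    print(f"{name}: mean {m:.2f}s, 95% CI [{lo:.2f}, {hi:.2f}]")
# If the two intervals overlap substantially, the reported speedup may not
# be a robust effect -- exactly the kind of thing a reviewer should check.
```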

One sniff test I like to apply to a piece of research is to ask the following questions. First, what percent of actual systems cannot benefit from the proposed technique no matter what? Second, what percent of systems get the benefit of the proposed technique without any special effort? Third, what percent of systems are left? Perhaps surprisingly, a lot of research fails this trivial test. As an example, let’s suppose I’m proposing a new CPU scheduling technique for desktop Linux/Windows/MacOS boxes (this is not a random example: my PhD thesis was basically about this).  First we ask: what class of systems cannot benefit from the proposed technique?  There are several answers, but probably the most obvious one is overloaded machines: those that cannot finish their workloads (displaying video frames, decoding audio chunks, etc.) on time regardless of scheduling discipline.  Second, we ask: what class of systems gets the benefits of good scheduling without a good scheduler?  The answer is: those with low CPU loads.  If the average length of the run queue is not greater than one, then all work-conserving scheduling algorithms are equivalent.  Finally, we ask: which systems are left?  That is, who actually benefits from the smart new scheduler?  The answer is: systems whose load is in a fairly narrow range between too low and too high.  The narrowness of this band, and the presence of techniques for getting into the underload region (for example manually reducing the degree of multiprogramming), were the ultimate reasons why I stopped working on scheduling problems.
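
As a toy illustration of the low-load answer above (a simplified sketch, not the scheduler from my thesis), the simulation below runs a single non-preemptive CPU under two work-conserving disciplines, FIFO and shortest-job-first.  At low offered load the run queue rarely holds more than one job, so the two policies report nearly identical mean response times; only near saturation do they diverge.

```python
# Toy single-CPU simulation: work-conserving schedulers look alike at low load.
# (Illustrative sketch only; the workload model and parameters are simplified.)
import random
import statistics

def simulate(jobs, pick):
    """Non-preemptive, work-conserving scheduler.  jobs: (arrival, service) pairs.
    pick: chooses which ready job to run next."""
    t, i, ready, responses = 0.0, 0, [], []
    pending = sorted(jobs)                      # ordered by arrival time
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][0] <= t:
            ready.append(pending[i]); i += 1    # admit jobs that have arrived
        if not ready:
            t = pending[i][0]                   # idle only when nothing is ready
            continue
        job = pick(ready)
        ready.remove(job)
        t += job[1]                             # run the job to completion
        responses.append(t - job[0])            # response = completion - arrival
    return statistics.mean(responses)

random.seed(0)
for load in (0.2, 0.9):                         # offered load (utilization)
    jobs, t = [], 0.0
    for _ in range(20000):
        t += random.expovariate(load)           # inter-arrival times, mean 1/load
        jobs.append((t, random.expovariate(1.0)))  # service times, mean 1
    fifo = simulate(jobs, lambda q: min(q))                     # earliest arrival
    sjf = simulate(jobs, lambda q: min(q, key=lambda j: j[1]))  # shortest job
    print(f"load {load}: FIFO mean response {fifo:.2f}, SJF {sjf:.2f}")
```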

Other common problems include: A solution to a problem that comes with significant, unavoidable costs that are not discussed. Solutions that cannot scale to the situations they are meant to address. Improvements that are far too small to be significant. Techniques that exploit the same source of benefit as an existing technique, but that are more complicated or are otherwise inferior.

Looking back over this post I see that it could be read as being very cynical: just looking for ways to reject papers.  But the fact is, if the research community is unwilling to self-censor, then the pushback has to come from somewhere else.  It’s also worth noting that the interaction between the relentless pressure to publish and program committees full of people like me has resulted in an incredible proliferation of conferences and workshops.  To a limited extent, this kind of community diversification and evolution is good; it helps the field adapt to change.  On the other hand, most people would agree that things have gone too far.  But this is a subject for another post.

There is plenty of excellent systems research being done. Fantastic new ideas are changing how systems are built and making it possible to build new kinds of computer systems. The problem, on the other hand, is that the research community produces a lot of mediocre and useless work (I have produced some myself).  Lacking a marketplace, we rely on humans to evaluate both the good work and the bad, and those of us who have to evaluate a broad cross-section of research results need to adapt our strategies in order to make effective use of time. In summary, Sturgeon’s Law applies.

50 Vertical Miles

A little over a year ago my family moved to a house near the north edge of Salt Lake City.  Although access to real mountains is not great — it’s about a three-hour walk to the nearest 8000′ peak and a major slog to a 9000′ peak — the foothill access is excellent.  At the same time, after way too much sedentary work, sedentary travel, and time at home with small kids, I found myself with high blood pressure and needing to lose weight, so I started doing a 45-minute hike each day, with a bit over 750′ elevation gain/loss.

After a year of this I ended up in decent shape and around 20 pounds lighter.  The cool part, though, is that 365 days of 750 feet comes out to just over 50 vertical miles hiked.  I was a little disappointed to compute that I’ll never be able to hike to the equivalent of geosynchronous orbit, but low Earth orbit should be attainable this year.  Of course, due to travel and being sick, I missed some days, but there were also plenty of days where I hiked 2000-3000 vertical feet, so the average was probably maintained.  The hardest part is not missing days when the weather is crappy or work and kids make life busy.  The solution, however, turned out to be easy: a good facemask and a powerful headlamp.
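
For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope calculation using rough orbit altitudes (about 100 miles for a low orbit, roughly 22,000 miles for geosynchronous altitude):

```python
# Quick sanity check on the vertical-mileage arithmetic (rough orbit altitudes).
FEET_PER_MILE = 5280
daily_feet = 750
yearly_miles = 365 * daily_feet / FEET_PER_MILE
print(f"{yearly_miles:.1f} vertical miles per year")           # about 52

LEO_MILES = 100          # low Earth orbit, roughly
GEO_MILES = 22_236       # geosynchronous altitude above the surface
print(f"low Earth orbit in about {LEO_MILES / yearly_miles:.1f} years")
print(f"geosynchronous orbit in about {GEO_MILES / yearly_miles:.0f} years")
# Roughly two years of hiking reaches LEO; GEO would take over 400 years.
```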

Although hiking the same set of trails day after day threatens to become boring, there has been a nice unintended benefit.  Since little brain power is required, I get a lot of unstructured time to think.  As far as I can tell, this has improved the quality of my work quite a bit; I usually return from a hike with three or four new ideas for me or my students to try out.  Even if only a few percent of these ideas are useful, the time is still well spent.  Hiking is even better than the shower for generating new ideas — who knew?