Size Matters

Not long ago I served on an NSF panel: a collection of researchers who look at some grant proposals and provide recommendations to NSF program officers who then make decisions about which, if any, of the proposals to fund. After the panel finished, the program manager asked us for feedback about various issues including the question: How large should NSF awards be? To put the question in context, in the programs that fund my kind of research, NSF offers “small” awards of up to $500,000 over up to three years, “medium” awards of up to $1.2M over up to four years, and “large” awards of up to $3M over up to five years. Few large awards are given and the bar for getting these is very high. Most grant proposals in a given size category ask for close to the maximum amount of money. For a given amount of grant money, should NSF (and other agencies) give more small awards or few large ones? I’ll present various arguments and give my opinion at the end.

  • Most grant money in computer science in the US pays PhD students around $20k per year to be “research assistants” working not only towards their degrees, but also towards larger research goals (however, in terms of grant money, the cost can be upwards of $50k per year due to university overhead and other factors). A problem with short-duration grants is that they cannot support PhD students for their entire lifetime in the system, requiring multiple grants to be pieced together.
  • Small grants cause significant administrative overhead at the NSF (running panels, negotiating budgets, etc.). Large grants reduce NSF overheads but push this burden out into the universities who now have to deal with more complicated accounting and reporting.
  • In Europe a lot of the money has moved towards very large grants. From what I hear, this necessitates a lot of jockeying for position, forming coalitions with the correct geographical distribution, etc. Nobody from Europe who I’ve talked to loves this or thinks it is very effective.
  • It can be difficult for small institutions to compete for large grants.
  • Small grants (unless accompanied by higher hit rates) force researchers to spend more time writing proposals, reducing the amount of real work they get done.
  • Large grants favor the status quo: they most often go to groups who have previously received large grants. Often, these groups belong to research centers and institutes that have evolved very sophisticated machinery for acquiring large grants and could not survive without them.
  • Some work is almost impossible to do using small grants. For example, expensive equipment or large software implementation efforts may be required to make a project succeed.
  • Large grants may engender collaboration, which can be beneficial. On the other hand, collaboration has overhead. Furthermore, some fraction of large grants support what I call “false collaboration” which happens when researchers get together and acquire a large grant, but then the individuals just go off and work on whatever they want, separately.

Clearly some sort of balance is needed. My guess is that most funding, maybe 80%, should go towards a large number of small awards, with the rest supporting carefully-chosen larger efforts.

Externally Relevant Open Problems in Computer Science

Most academic fields have some externally relevant problems: problems whose solutions are interesting or useful to people who are totally ignorant of, and uninterested in, the field itself. For example, even if I don’t want to know anything about virology, I would still find a cure for the common cold to be an excellent thing. Even if I failed calculus I will find it interesting when physicists invent a room-temperature superconductor or figure out why the universe exists.

This piece is about computer science’s externally relevant open problems. I have several reasons for exploring this topic. First, it’s part of my ongoing effort to figure out which problems matter most, so I can work on them. Second, I review a lot of proposals and papers and often have a strong gut reaction that a particular piece of work is either useful or useless. An idea I wanted to explore is that a piece of research is useless precisely when it has no transitive bearing on any externally relevant open problem.

A piece of work has “transitive bearing” on a problem if it may plausibly make the problem easier to solve. Thus, an esoteric bit of theoretical physics may be totally irrelevant or it may be the key to room temperature superconductivity. Of course, nobody thinks they’re doing irrelevant work. Nevertheless, it’s hard to escape the conclusion that a lot of CS (and other) research is in fact irrelevant. My third motivation for writing this piece is that I think everyone should do this sort of analysis on their own, rather than just believing the accepted wisdom about which research topics are worth working on. The analysis is important because the accepted wisdom is — at best — a lagging indicator.

Below is my list. I’ll be interested to see what people think I’ve missed (but note that I’m deliberately leaving out problems like P vs. NP which I don’t think are externally relevant).

Giving More People Meaningful Control Over Computation

There are plenty of people — scientists, analysts, managers, etc. — who are not programmers but who would benefit from gaining better control over computers. I’m using the phrase “meaningful control over computation” as a bit of a hedge because it’s clear that most of these people don’t have 2-5 years to spare in which to become competent programmers. The goal is to give people the power that they need to solve their problems without miring them in the Turing tarpit. A good example is the class of spreadsheet programming languages which expose a useful kind of computation without most of the problems traditionally associated with programming. Overall, this problem is maybe 1/3 programming language design, 1/3 user interface design, and 1/3 domain specific.

Trustworthy Automation

Can we create computer systems that do what we want them to do? This problem encompasses both security and reliability. It’s a good problem to work on because solutions have not only have short-term economic benefit but in the long run they directly support getting humans off the rock, which as I’ve said before is something we need to be working very hard on.

The whole problem of specifying what we want systems to do is enormously hard, and in fact we generally have no precise ideas about this. Even highly mathematical objects like compilers are most commonly specified in human language, if at all. Moreover, the programming languages that we use to specify computations contain elements such as floating point computations, fixed-width integers, and excessive use of side effects — all of which seem designed to impede reasoning.

Intelligent Systems

Can computers be created that interface gracefully with humans? How can augmented cognition be used to sidestep limitations of humans and computers? Which sub-problems of AI are achievable and useful? Here I mean “intelligent” literally, not in the sense it is usually used, where it means “not as obviously bad as the previous thing.” Of course, the goal of AI may turn out to conflict with “trustworthy automation” but we’ll have to cross that bridge when we come to it.

Observing, Simulating, Modeling, and Predicting Everything

The universe produces a gigantic amount of data at scales from atoms to galaxies. Luckily, the theoretical bounds on the amount of computation that can be done using the available energy are very generous. The ugly little space heaters with which we currently compute have not even started to scratch the surface of what’s possible.

This one is pretty vague, but I’m not sure right now how to improve it.

Further Reading

Vernor Vinge and Hans Moravec have created some of the best unconstrained thinking about computers. Asimov was pretty good in his day, but Sladek nailed it. The transhumanist material on the web seems pretty bad, as is Kurzweil’s stuff. Dertouzos’s Unfinished Revolution was not inspiring. There must be more (good and bad) writing on this subject, please make suggestions.

Update from evening of 5/10: This list of external problems is of course subjective. I’d be very interested to see other people’s blog posts describing what they think CS has to offer the non-CS world.

What I Want From a Bibliography System

As a professor I spend a fair amount of time wrangling with references. Because it’s free and reasonably simple, I use BibTeX: an add-on tool for LaTeX that automates the construction of a bibliography by pulling references out of a separate text file, assigning them numbers (or other identifiers), and formatting the entries appropriately.

BibTeX entries are widely available on the web. In principle this is great, but in practice the quality of downloadable BibTeX entries is poor: they contain so many typos, omissions, misclassifications, and other problems that I no longer download them, but rather create a new entry from scratch, perhaps using the existing entry as a rough specification. Even BibTeX entries from the ACM’s Digital Library — a for-pay service provided by the premier professional society for computer science — are not directly usable, basically every one of them requires editing. Over time I’ve come to more and more blame BibTeX for these problems, rather than blaming the people creating the entries. Here are the design points I’d like to see in a bibliography system.

Style neutrality: A bibliography entry should be a collection of factual information about a document. The “bibliography style” — a collection of formatting guidelines specific to a particular journal or similar — should be a separate concern. BibTeX does support bibliography styles but the implementation is incomplete, forcing entries to be tweaked when the style is changed.

Minimal redundancy: Any large BibTeX file contains a substantial amount of redundancy, because BibTeX poorly supports the kind of abstraction that would permit, for example, the common parts of two papers appearing at the same conference to be factored out. Duplication is bad because it consumes time and also makes it basically impossible to avoid inconsistencies.

Standalone entries: I should be able to download a bibliography entry from the net and use it right away. This goal is met by BibTeX but only at the expense of massive redundancy. Meeting both goals at the same time may be tricky. One solution would be to follow an entry’s dependency chain when exporting it, in order to rip a minimal, self-sufficient collection of information out of the database. This seems awkward. A better answer is probably to rely on conventions: if there is a globally recognized name for each conference and journal, a self-sufficient entry can simply refer to it.

Plain-text entries: Putting the bibliography in XML or some random binary format isn’t acceptable; the text-only database format is one of the good things about BibTeX.

Network friendliness: It should be easy to grab bibliography entries from well-known sources on the network, such as arXiv. Assuming that the problem of these entries sucking can be solved, I should not even need a local bibliography file at all.

User friendliness: BibTeX suffers from weak error checking (solved somewhat by add-on tools like bibclean) and formatting difficulties. For example, BibTeX’s propensity for taking capitalized letters from the bib entry and making them lowercase in the output causes numerous bibliographies I read to contain proper nouns that are not capitalized. Messing with BibTeX styles is quite painful.

BibTeX is a decent tool, it just hasn’t aged or scaled well. CrossTeX and BibLaTeX sound nice and solve some of the problems I’ve mentioned here. However, since I have about 37,000 lines of BibTeX sitting around, I’m not interested in migrating to a new system until a clear winner emerges.

Publishing for Impact

Choices made in the process of doing research can lead to a 5-10x difference in number of publications for exactly the same underlying work. For example, do I publish my idea in a workshop, then a conference, and then a journal? On the other hand I can just do the conference version. Do I take the trouble to write up a paper about an MS thesis where the student has somehow managed to escape without giving me a paper? (The common tactic for accomplishing this is to get a great job offer with a short fuse, and then utterly disappear.) Do I incrementalize my work into a collection of neat little papers each based on a single idea, or do I glom it all together into monster papers that contain a half-dozen or more research contributions?

Today’s question is:

If my goal is simply to maximize my own research impact over my career, should I aim for the low end (1-3 papers per year) or the high end (ten times more) of this spectrum?

I’ve puzzled through this and have concluded that as long as we keep the goal simple — maximize impact — there are few arguments for publishing more.

The first problem with publishing more is that writing a paper requires significant effort over and above that of doing the research in the first place. I usually estimate the cost of writing a paper at one month, but it could be considerably more or less depending on a number of factors. Since teaching uses up about three months of my effort each year, grant proposals cost 1-3 months, and service and administration another three months, the loss of a month of research time is a serious matter. In a nutshell: since my time is finite, writing more papers means I get less research done. Over a 40-year career, the loss of productivity due to writing 5-10 extra papers per year would be huge.

The other big problem with publishing more is that my target audience consists of extremely busy people. Having impact means that people who matter need to read and understand the material that I produce. If I produce 20 papers per year, how many will be read carefully? Probably not that many. My own reading time is highly constrained, and therefore I have become strongly prejudiced against reading papers by authors who seem like paper machines, because all too often any individual paper is weak and/or it contains a lot of similar thinking to something I’ve already read by that author. On the other hand, there are other researchers who produce less material where I’ll drop everything to read their latest piece of work because it reliably teaches me something new. In a nutshell: producing watered-down content creates a disincentive for people to read my work.

The thing that surprised me, when I sat down and thought through these issues, was how one-sided the arguments against prolific publication seem to be. Of course I’m not unaware of the arguments in the other direction. First, a larger number of papers impresses people who primarily count. Second, publishing more benefits students by giving them practice writing and by building up their vitae. Third, prolific publishing in a wide variety of venues probably exposes a larger number of people to one’s work — though I’m not sure this factor is very important given the social, searchable web as it exists today.

Generalizing and Criticizing Delta Debugging

Delta debugging is a search-based technique for taking an input to a program that triggers a bug and making that input smaller. For example, you might have a sequence of GUI operations that causes Thunderbird to crash. Assuming the crash is deterministic and the input can be replayed automatically, you can iteratively remove UI actions until you end up at a local minimum: a collection of events where removing even one of them makes the crash go away. If Delta is implemented well and properly takes advantage of the structure of the input, the resulting failure-inducing input is not only small, but also most of it is relevant to the failure. Debugging a failure triggered by a small, highly-relevant input is generally pretty easy. The definitive Delta debugging reference is Simplifying and Isolating Failure-Inducing Input by Andreas Zeller and Ralf Hildebrandt (ZH02 from now on).

Delta is fantastically useful, particularly in the context of random testing. As part of my group’s compiler bug-finding project we implemented three new test case minimizers using Delta-like ideas, and we also use an existing line-based Delta implementation. Each of the four reducers has different strengths and weaknesses and in fact local minima can often by escaped by running the implementations one after the other.

Significantly, none of the four test case reducers is a straightforward implementation of an algorithm from the original Delta paper — each of them generalizes it in one or more ways. After spending a lot of time working on test case reduction and thinking about this, I got the idea of writing a paper perhaps called “Generalized Delta Debugging” which weakens many of the assumptions found in the original work. The problem was that the more parts of Delta debugging I generalized, the more the result looked just like a generic greedy search. Thus, it started to look extremely doubtful whether there was any research component to generalizing Delta debugging. This piece explores the consequences of that observation.

Delta == Greedy Search

Just to be clear, by “greedy search” I mean the class of optimization algorithms that are based on a transformation operator and a fitness function. They work by repeatedly transforming the current solution to arrive at a new solution, and replacing the current with the new if the fitness level has increased. No doubt I’m butchering the accepted terminology, but the ideas here are really simple.

The “minimizing delta debugging algorithm” from ZH02 is an instance of greedy search where the fitness of a failure-inducing input is the inverse of its size and the fitness of all non-failure-inducing inputs is zero. The transformation operator removes a contiguous chunk of the input. When the transformation gets stuck — by running out of choices of chunks to remove — it reduces the chunk size. When it gets stuck at the minimum chunk size (a single line, character, or other atom of input) the search is finished.

The “general delta debugging algorithm” from ZH02 is very similar but its goal is to minimize the difference between the current solution and a given input, instead of simply minimizing size. Since I haven’t found many uses for this algorithm in practice, and since it’s not all that different from the minimizing Delta, I won’t discuss it further. Whenever I mention the “Delta algorithm” or similar, it is the minimizing Delta to which I refer.

Which parts of the Delta algorithms from ZH02 can be usefully generalized? As it turns out, pretty much all of them. Let’s look at the different elements in turn.

Generalizing the Transformation Operator

The Delta transformer that deletes contiguous chunks of input at ever-finer levels of granularity is reasonably generic and efficient. However, when attacking highly-structured test cases it often gets stuck at a local maximum long before the test case is fully reduced. (Sorry if I keep switching between minimum and maximum. When discussing size the goal is minimization, when discussing fitness in a generic context, I’ll stick to the convention that the goal is maximization.) Hierarchical delta debugging is a variant that improves performance by operating on sub-trees of tree-structured inputs.

Another generalization is to use a transformer that replaces a chunk of input with something else, instead of simply deleting it. For example, one of our new reducers for C code tries to replace uses of variables with constants. Another replaces function calls with their results, including side effects. These are very effective in practice.

It is also useful to delete parts of the input in a non-local way. For example, to remove an argument to a function in a C program, we must delete it from the definition, declaration, and all uses. Making this transformation work requires a painful degree of friendliness with the C code, but again it’s very useful in practice.

Finally, we sometimes use transformations that don’t even make the test case smaller. For example it may be desirable to replace a small, complex construct (like a call to a trig function in a C program) with a larger but simpler construct (a math expression approximating the trig function’s behavior, perhaps). Similarly, it may be desirable to replace an array with a collection of scalars or a struct assignment with a collection of assignments to members. The scalars or the assignments are then vulnerable to subsequent reduction.

All of these examples point towards a more general idea which is that there is a strong synergy between test case reduction and compiler optimization (which I wrote about earlier).

Generalizing the Fitness Function

ZH02’s minimizing Delta uses 1/size as its fitness function and its general Delta uses the inverse of the string distance between current solution and goal. There are plenty of other useful fitness functions. As I mentioned in the previous paragraph, considering the complexity of different program constructs is useful. We’ve also experimented with using Delta techniques to minimize the number of instructions executed by the test case. The insight is that the complexity of a failing execution depends not only on syntactic characteristics of the failure-inducing input, but also on the dynamic behavior induced by the test case.

A major gap in the ZH02 paper is that it does not address the validity problem: does the transformed test case satisfy the constraints imposed on test inputs? For some uses of Delta no validity test is required because the system under test can detect invalid inputs. On the other hand, the validity problem for C programs is extremely difficult to deal with (in theory, it’s undecidable; in practice, no fun at all) and this has been a major stumbling block in our C compiler bug-finding work — but now solved (thank goodness not by us). Sometimes it is desirable to test software with invalid inputs, but for the C compiler work we want to say that all invalid test cases have zero fitness.

Generalizing the Search Framework

The third element of the Delta debugging algorithms from ZH02 that can be usefully generalized is the search algorithm itself. Backtracking can be used to avoid getting stuck too early, as can other techniques such as simulated annealing or genetic algorithms. Basically, test case reduction can be based on any search algorithm, greedy or not.

A Verdict on Delta

Now I’d like to ask a couple of specific questions about the ZH02 paper. First, why weren’t the algorithms presented as “plugins” to the greedy search framework? This would have improved the presentation by making it clear which elements of the algorithms are important vs. boilerplate. Also, it would have made it easier for readers to understand how subsequent improvements to delta should be structured.

Second, given that delta debugging is a fairly straightforward application of greedy search, how significant is its research contribution? The answer that I lean towards is that the main contribution of delta is giving a nice name to an excellent idea that was, in 2002, somewhat obvious.

Since it is dangerous to criticize an idea by saying, in retrospect, that it was obvious, I’ll provide a bit of justification. First, two of the previously published papers cited by Zeller and Hildebrand (references 8 and 9 in the paper) apply ideas that are basically the same as Delta debugging, but without calling it that. Additionally, a paper they missed — Differential Testing for Software, published in 1998 — described a search-based, automated test case reducer. So it’s clear that by 2002 the ideas forming the core of Delta debugging were floating around.

My opinion is that the two Delta algorithms from the ZH02 paper have little enduring value because they simply do not work very well without modification. At least, we couldn’t make them work without significant generalization. The enduring value of the paper is to popularize and assign a name to the idea of using a search algorithm to improve the quality of failure-inducing test cases.

Moving Forward

As the complexity of computer systems continues to increase, the advantages derived from deterministic execution and automated test case reduction will also continue to increase. Delta debugging provides a conceptual framework for doing so. Unfortunately, it seems that few useful papers on reducing the size of failure-inducing programs that have appeared since the original Delta work. A notable exception is the hierarchical delta debugging work I mentioned earlier. Iterative Delta Debugging is interesting but solves a slightly different problem. Deriving Input Syntactic Structure is a technique that can make hierarchical delta easier to implement. If anyone knows of more work along these lines, I’d like to hear about it.

An Executable Semantics For C Is Useful

The goal of a C/C++ compiler is to turn every sequence of ASCII characters into executable instructions. OK, not really — though it does seem that way sometimes. The real goal of a C/C++ compiler is to map every conforming input into executable instructions that correspond to a legal interpretation of that input. The qualifiers “conforming” and “legal interpretation” are very important. First, the compiler has extremely weak requirements about what it should do with non-conforming inputs, such as programs that contain undefined behaviors (array bounds violations, etc.). Second, all realistic C/C++ programs have a large number of possible interpretations, for example corresponding to different integer sizes, different orders of evaluation for function arguments, etc. The compiler chooses a convenient or efficient one, and the remaining interpretations are latent. They may emerge later on if the compiler options are changed, if the compiler is upgraded, or if a different compiler is used. The point is that the compiler has no obligation to tell us whether the input is conforming or not, nor how many possible interpretations it has.

Thus, while C/C++ compilers are very good at turning conforming programs into efficient executables, they are just about useless for other answering other kinds of questions:

  • Does the program ever execute undefined behaviors, causing it (in principle) to have no meaning and (in practice) to execute attack code or crash?
  • Does the program rely on unspecified behaviors, making it non-portable across compilers, compiler versions, and changes in compiler options?
  • Does the program rely on implementation-defined behaviors, affecting its portability to other compilers and platforms?
  • Why does the program behave in a certain way? In other words, what part of the standard forced that interpretation?

To answer these questions, a wide variety of static analyzers, model checkers, runtime verifiers, and related tools have been developed. These tools are great. However, even taken all together, they are incomplete: there exist plenty of bad (or interesting) program behaviors that few or none of them can find. For example:

  • Very few tools exist that can reliably detect uses of uninitialized storage.
  • Few, if any, tools can correctly diagnose problems resulting from C/C++’s unspecified order of evaluation of function arguments.
  • An lvalue must not be modified multiple times, or be both read and written, in between sequence points. I’m not aware of many tools that can correctly detect that evaluating this function results in undefined behavior when p1 and p2 are aliases:
int foo (int *p1, int *p2) {
  return (*p1)++ % (*p2)++;

The Missing Tool

The missing tool (or one of them, at any rate) is an executable semantics for C. An executable semantics is an extremely careful kind of interpreter where every action it takes directly corresponds to some part of the language standard. Moreover, an executable semantics can be designed to tell us whether the standard assigns any meaning at all to the program being interpreted. In other words, it can explicitly check for all (or at least most) undefined, unspecified, and implementation-defined behaviors. For example, when an executable semantics evaluates (*p1)++ % (*p2)++, it won’t assign a meaning to the expression until it has checked that:

  • both pointers are valid
  • neither addition overflows (if the promoted types are signed)
  • p1 and p2 are not aliases
  • *p2 is not 0
  • either *p1 is not INT_MIN or *p2 is not -1

Moreover, the tool should make explicit all of the implicit casts that are part of the “usual arithmetic conversions.” And it needs to do about 500 other things that we usually don’t think about when dealing with C code.

Who Needs an Executable Semantics?

Regular programmers won’t need it very often, but they will occasionally find it useful for settling the kinds of annoying arguments that happen when people don’t know how to read the all-too-ambiguous English text in the standard. Of course, the executable semantics can only settle arguments if we agree that it has captured the sense of the standard. Better yet, we would treat the executable semantics as definitive and the document as a derivative work.

Compiler developers need an executable semantics. It would provide a simple, automated filter to apply to programs that purportedly trigger compiler bugs. A web page at Keil states that “Fewer than 1% of the bug reports we receive are actually bugs.” An executable semantics would rapidly find code fragments that contain undefined or unspecified behaviors — these are a common source of bogus bug reports. Currently, compiler developers do this checking by hand. The GCC bug database contains 4966 bug reports that have been marked as INVALID. Not all of these could be automatically detected, but some of them certainly could be.

People developing safety-critical software may get some benefit from an executable semantics. Consider CompCert, a verified compiler that provably preserves the meaning of C code when translating it into assembly. CompCert’s guarantee, however, is conditional on the C code containing no undefined behaviors. How are we supposed to verify the absence of undefined behaviors when existing tools don’t reliably check for initialization and multiple updates to lvalues? Moreover, CompCert is free to choose any legitimate interpretation of a C program that relies on unspecified behaviors, and it does not need to tell us which one it has chosen. We need to verify up-front that (under some set of implementation-defined behaviors) our safety-critical C program has a single interpretation.

My students and I need an executable semantics, because we are constantly trying to figure out whether random C functions are well-defined or not. This is surprisingly hard. We also need a reliable, automated way to detect undefined behavior because this enables automated test case reduction.

An Executable Semantics for C Exists

I spent a few years lamenting the non-existence of an executable C semantics, but no longer: as of recently, the tool exists. It was created by Chucky Ellison, a PhD student at the University of Illinois working under the supervision of Grigore Rosu. They have written a TR about it and also the tool can be downloaded. Hopefully Chucky does not mind if I provide this link — the tool is very much a research prototype (mainly, it is not very fast). But it works:

regehr@home:~/svn/code$ cat lval.c
int foo (int *p1, int *p2) {
  return (*p1)++ % (*p2)++;

int main (void) {
  int a = 1;
  return foo (&a, &a);
regehr@home:~/svn/code$ kcc lval.c
regehr@home:~/svn/code$ ./a.out
ERROR! KCC encountered an error while executing this program.
Error: 00003
Description: Unsequenced side effect on scalar object with value computation of same object.
File: /mnt/z/svn/code/lval.c
Function: foo
Line: 2

As I mentioned earlier, very few tools for analyzing C code find this error. Chucky’s tool can also perform a state space exploration to find order of evaluation problems and problems in concurrent C codes. Finally, it can run in profile mode. Unlike a regular profiler, this one profiles the rules from the C semantics that fire when the program is interpreted. This is really cool and we plan to use it to figure out what parts of the C standard are not exercised by Csmith.

Chucky’s tool is already an integral part of one of our test case reducers. This reducer takes as input a huge, ugly, bug-triggering C program generated by Csmith. It then uses Delta debugging to output a much smaller bug-triggering program that (ideally) can be included in a compiler bug report without further manual simplification. Before Chucky’s tool arrived, we had spent several years trying to deal with the fact that Delta always introduces undefined behavior. We now seem to have a bulletproof solution to that problem.

The benefits of executable semantics have long been recognized in the PL community. The new thing here is a tool that actually works, for the C language. Hopefully, as Chucky’s tool matures people will find more uses for it, and perhaps it can even evolve into a sort of de facto litmus test for ascertaining the meaning — or lack thereof — of difficult C programs.

How Can Computer Science Help Us Get To Mars?

SpaceX thinks it can get a person to Mars within 20 years. This seems optimistic, given that SpaceX does not enjoy the significant chunk of the USA’s federal budget that permitted NASA to get to the Moon on a relatively short time scale. Nevertheless, it’s a good goal, and presumably 50 years of improvements in space systems engineering will make up for some of the shortfall.

Since I strongly believe that getting humans off the rock needs to be a priority, I want to help. What are the CS problems that (1) can be addressed in a University and (2) will directly support getting off the rock? Since much of my research is aimed towards increasing embedded software reliability, I suspect that I’m already in the right ballpark, but once we get into the specifics there are a lot of ways to go astray. Perhaps I should see if I can do my next sabbatical at SpaceX. I recently read a book on how space systems can go wrong and (no surprise) there are very many ways, some of them software-related.