Externally Relevant Open Problems in Computer Science


Most academic fields have some externally relevant problems: problems whose solutions are interesting or useful to people who are totally ignorant of, and uninterested in, the field itself. For example, even if I don’t want to know anything about virology, I would still find a cure for the common cold to be an excellent thing. Even if I had failed calculus, I would still find it interesting when physicists invent a room-temperature superconductor or figure out why the universe exists.

This piece is about computer science’s externally relevant open problems. I have several reasons for exploring this topic. First, it’s part of my ongoing effort to figure out which problems matter most, so I can work on them. Second, I review a lot of proposals and papers and often have a strong gut reaction that a particular piece of work is either useful or useless. An idea I wanted to explore is that a piece of research is useless precisely when it has no transitive bearing on any externally relevant open problem.

A piece of work has “transitive bearing” on a problem if it may plausibly make the problem easier to solve. Thus, an esoteric bit of theoretical physics may be totally irrelevant or it may be the key to room temperature superconductivity. Of course, nobody thinks they’re doing irrelevant work. Nevertheless, it’s hard to escape the conclusion that a lot of CS (and other) research is in fact irrelevant. My third motivation for writing this piece is that I think everyone should do this sort of analysis on their own, rather than just believing the accepted wisdom about which research topics are worth working on. The analysis is important because the accepted wisdom is — at best — a lagging indicator.

Below is my list. I’ll be interested to see what people think I’ve missed (but note that I’m deliberately leaving out problems like P vs. NP which I don’t think are externally relevant).

Giving More People Meaningful Control Over Computation

There are plenty of people — scientists, analysts, managers, etc. — who are not programmers but who would benefit from gaining better control over computers. I’m using the phrase “meaningful control over computation” as a bit of a hedge because it’s clear that most of these people don’t have 2-5 years to spare in which to become competent programmers. The goal is to give people the power that they need to solve their problems without miring them in the Turing tarpit. A good example is the class of spreadsheet programming languages which expose a useful kind of computation without most of the problems traditionally associated with programming. Overall, this problem is maybe 1/3 programming language design, 1/3 user interface design, and 1/3 domain specific.
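
To make the spreadsheet example concrete, here is a minimal sketch in Python (the cell names and formulas are invented): the user writes formulas over named cells, and the system decides when and in what order to evaluate them, so there is no control flow to get wrong.

    # Each cell is a formula over other cells; there are no statements,
    # loops, or mutation for the user to manage.
    cells = {
        "price":    lambda get: 12.50,
        "quantity": lambda get: 4,
        "subtotal": lambda get: get("price") * get("quantity"),
        "total":    lambda get: get("subtotal") + 5.00,   # flat shipping fee, made up
    }

    def get(name):
        # The system, not the user, determines evaluation order by
        # chasing cell dependencies on demand.
        return cells[name](get)

    print(get("total"))   # 55.0

Real spreadsheets add caching and change propagation on top of this, but the user-facing model is the same: state what each cell means and let the machine do the rest.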

Trustworthy Automation

Can we create computer systems that do what we want them to do? This problem encompasses both security and reliability. It’s a good problem to work on because solutions not only have short-term economic benefit but, in the long run, directly support getting humans off the rock, which as I’ve said before is something we need to be working very hard on.

The whole problem of specifying what we want systems to do is enormously hard, and in fact we generally have no precise ideas about this. Even highly mathematical objects like compilers are most commonly specified in human language, if at all. Moreover, the programming languages that we use to specify computations contain elements such as floating point computations, fixed-width integers, and excessive use of side effects — all of which seem designed to impede reasoning.
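
Two small Python examples of the kind of obstacle I mean (the particular values are arbitrary):

    # Floating point addition is not associative, so algebra that is
    # valid for real numbers is not valid for floats:
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))   # False

    # Fixed-width integers wrap around. Python's own ints are arbitrary
    # precision, so we simulate 32-bit unsigned arithmetic with a mask:
    MASK32 = 0xFFFFFFFF
    x = 4_000_000_000
    print((x + x) & MASK32)             # 3705032704, not 8000000000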

Intelligent Systems

Can computers be created that interface gracefully with humans? How can augmented cognition be used to sidestep limitations of humans and computers? Which sub-problems of AI are achievable and useful? Here I mean “intelligent” literally, not in the sense it is usually used, where it means “not as obviously bad as the previous thing.” Of course, the goal of AI may turn out to conflict with “trustworthy automation” but we’ll have to cross that bridge when we come to it.

Observing, Simulating, Modeling, and Predicting Everything

The universe produces a gigantic amount of data at scales from atoms to galaxies. Luckily, the theoretical bounds on the amount of computation that can be done using the available energy are very generous. The ugly little space heaters with which we currently compute have not even started to scratch the surface of what’s possible.
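
To get a feel for how generous, consider the Landauer limit, the minimum energy needed to erase one bit of information; a quick calculation (a sketch, assuming room temperature) shows how far today's hardware is from it:

    import math

    k_B = 1.380649e-23                # Boltzmann constant, J/K
    T = 300.0                         # room temperature, an assumption

    per_bit = k_B * T * math.log(2)   # Landauer limit per bit erased
    print(per_bit)                    # ~2.9e-21 joules
    print(1.0 / per_bit)              # ~3.5e20 bit erasures per joule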

This one is pretty vague, but I’m not sure right now how to improve it.

Further Reading

Vernor Vinge and Hans Moravec have created some of the best unconstrained thinking about computers. Asimov was pretty good in his day, but Sladek nailed it. The transhumanist material on the web seems pretty bad, as is Kurzweil’s stuff. Dertouzos’s Unfinished Revolution was not inspiring. There must be more (good and bad) writing on this subject, please make suggestions.

Update from evening of 5/10: This list of external problems is of course subjective. I’d be very interested to see other people’s blog posts describing what they think CS has to offer the non-CS world.


15 responses to “Externally Relevant Open Problems in Computer Science”

  1. “I might not care about P vs NP, but I definitely care about secure crypto for my transactions.”

    sounds pretty externally relevant to me.

  2. The tricky bit, I suppose, is that usually you can construct an arbitrarily long transitivity chain that gets you to an externally visible result. Assessing this seems difficult.

  3. Excellent goal. As a corollary, if a research program has highly unlikely assumptions at its foundation, then the odds of it ever being transitively useful are quite low.

    I would add to your list improvements in programmer productivity. Much as improvements in materials quality benefit civil construction, improvements in programmer productivity mean that professional programmers can do more for less.

  4. Good point Edward. I guess each link in the chain needs to be assessed for plausibility. Once the product of probabilities reaches some cutoff value, it’s OK to stop following the chain. Realistically this is based on a huge amount of guesswork…
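
    A back-of-the-envelope version in Python (the link plausibilities and the cutoff are invented):

        CUTOFF = 0.05
        links = [0.9, 0.5, 0.3, 0.5]   # guessed plausibility of each link

        p = 1.0
        for q in links:
            p *= q                     # plausibility of the whole chain

        print(p < CUTOFF)              # False: ~0.0675 is still above the cutoff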

  5. There are an awful lot of problems that are in NP. If we could solve them in polynomial time (and a polynomial of low degree, what’s more), that’d be pretty awesome theoretically, and would also have big implications in the wider world.

    I guess the problem is that all the smart money is on P not being equal to NP, so maybe all the P=NP research really is externally irrelevant.

  6. Notwithstanding, I think Michael’s point is still valid. You can’t touch any field without bumping up against intractable optimization problems, and any resolution of P vs NP will have impact, if not a direct “here’s an algorithm for your hard problem” kind of impact.

  7. Yes, Suresh, P =? NP is a relevant research problem. But it is not — using the definition from today’s post — an externally relevant problem.

  8. Just to be clear, I’m not trying to pick on P vs. NP. Nothing I work on is externally relevant either. It is sufficient to work on problems that are transitively externally relevant, as long as the chain is relatively direct.

    Also, of course this post is predicated on a purely applied model of CS that not everyone will agree with. But I think it works OK for systemsy areas like the ones I work in.

  9. Once I collected some answers from (more or less) famous computer scientists to the question: What is the most important problem in CS?

    http://focs.wordpress.com/category/replies/

    Not exactly the same question as yours, but the answers overlap. Among them, for example: “storing and finding things”. This includes reliable, easy backups; solving the e-mail/information overload problem; and personalization.

  10. If you are familiar with safety under DO-178B, you know that the problem is not to find most of the bugs but to demonstrate that there are no bugs.

    One research direction should be: what is needed for a computer language where a tool could say “this program has no bugs”? I know that the general case is impossible (cf. the halting problem and NP), but nobody cares about the general problem.

    Memory management has been a problem: make it automatic or static. For loops introduce “out of bounds” errors: use only map and fold (see the sketch below). I think that many common problems could be resolved that way, by “design”.

    Then you have to find a way to describe what the coder really wants. Type systems are used to ask the coder twice what he really wants. It’s the same for assertions (assertions that could be tested with a combination of an input fuzzer, generated from the source code, and a coverage tool).

    But you will always have the same problem: if the specification is hand-written, it’s not precise enough. If the specification is written in a formal language, why not compile it directly?
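
    For example, a minimal sketch in Python of the map/fold point (the list is arbitrary): an index-based loop makes the bounds the programmer's problem, while map and fold never mention an index, so out-of-bounds access is impossible by construction.

        from functools import reduce

        xs = [3, 1, 4, 1, 5]

        # Index-based loop: mistype the bound as range(len(xs) + 1) and
        # you get an IndexError at runtime.
        total = 0
        for i in range(len(xs)):
            total += xs[i]

        # map/fold style: no index appears anywhere, so there is nothing
        # to go out of bounds.
        doubled_sum = reduce(lambda acc, x: acc + x, map(lambda x: 2 * x, xs), 0)

        print(total, doubled_sum)      # 14 28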

  11. > Giving More People Meaningful Control Over Computation

    This is a good expression of this whole issue. I’ve often been asked whether biologists should “learn to program” and fumbled with trying to explain that while it would be useful, it’s just a means to an end. Even the simplest non-trivial programming languages (I’d nominate Ruby & Python) require a great investment of time and mastery of a many small but important details: what’s a pure text file, different line-ending conventions, the current working directory, the scope of variables, interpretation and compilation …