Future Directions for Optimizing Compilers

I wanted to write a manifesto-ish sort of piece about what compilers are supposed to look like in the future. Nuno was the obvious coauthor since I’ve talked to him about this topic so much that I’m overall not really sure which parts started out as his ideas and which were mine. The article didn’t end up feeling like a good fit for a series of blog posts, so we’ve placed it on arXiv: Future Directions for Optimizing Compilers. The first paragraph of the paper gives the main idea:

As software becomes larger, programming languages become higher-level, and processors continue to fail to be clocked faster, we’ll increasingly require compilers to reduce code bloat, eliminate abstraction penalties, and exploit interesting instruction sets. At the same time, compiler execution time must not increase too much and also compilers should never produce the wrong output. This paper examines the problem of making optimizing compilers faster, less buggy, and more capable of generating high-quality output.

We plan to keep revising the paper and would be happy to receive feedback on it.

Mapping Mountains on Exoplanets

Snow in the mountains evolves in predictable ways that are strongly influenced by the terrain. For example, warmish snowstorms leave a characteristic snow line that looks very much like a contour on a topographic map:



On the other hand, when snow melts the effect is very different, and is often dominated by aspect. There pictures taken near my house a few days apart show more melting on south-facing slopes that receive more solar energy:

Effects like these are taken into account by a snowpack model: a computer simulation designed to predict snowpack evolution in order to, for example, make predictions about avalanches or about spring runoff. Runoff is important not only because it can cause flooding but also because areas near the mountains depend on the runoff for drinking and agricultural water. A snowpack model would typically take topographic information and snow depth information as inputs, but people have also done research into driving snowpack models using weather information; this is useful for areas where there are not a lot of snow depth measurement stations, such as the Snotel network in the western USA.

This post asks the question: could we use images of snowpacks from above to create topographic maps of exoplanets? Obviously this requires telescopes with vastly improved angular resolution, these will likely be optical interferometers in space, or perhaps on the moon. Resolution can be increased by making the telescope’s baseline longer and light-gathering can be improved using larger mirrors or by integrating over longer time periods (planets don’t evolve rapidly, I don’t see why early exoplanet images wouldn’t be aggregated from weeks or months of observations). Space telescopes involving multiple satellites are going to be expensive and there have already been some canceled missions of this type. I haven’t checked their math but this article describes the collecting area and baseline required to get images of different resolutions; unfortunately it doesn’t mention what assumptions were made about exposure times to get the images.

Assuming we can image a snowy exoplanet in sufficient detail, how can we determine its topography? We do it by making a snowpack model for the planet, probably involving a lot of educated guesses, and then invert it: instead of parameterizing the model with the topography, we instead search for the topography that is most likely to explain the observations. We could easily prototype this mapping software right now; it can be validated in two ways. First, we can take archived satellite photos of Earth, degrade them using reasonable assumptions about telescope performance and exoplanet distance, and then use them to infer the topography. Of course we have good ground truth for Earth so validation is nearly trivial. The second way to validate this mapping method is using simulated mountainous, snowy exoplanets. This would be useful to check the robustness of the methods for less-Earthlike bodies. A prototype of these ideas seems very much in reach for an ambitious PhD student.

I tried to find previous work on this topic and didn’t turn up much, I would appreciate links either in a comment here or in private email if you know of something. Let’s take a quick look at why most of the ways that we get topographic data won’t work for exoplanets:

  • The original way to make maps, getting out there and surveying the land, obviously doesn’t apply.
  • Photogrammetry requires parallax.
  • Active sensing using radar or a laser is infeasible.

On the other hand, inferring mountains’ shape from changing light and shadows over the course of a day and from day to day (if the planet’s axis is tilted) will work, and no doubt this is how we’ll first know that an exoplanet is mountainous. Snow-based techniques could be used to refine these results.

Planning for Disaster

Alan Perlis once said:

I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are.

This is a nice sentiment, perhaps even a defensible one if we interpret it as narrowly talking about academic computer science. On the other hand, probably within 20 years, there’s going to be a major disaster accompanied by the loss of many lives that is caused more or less directly by software defects. I don’t know what exactly will happen, but something related to transportation or SCADA seems likely. At that point we can expect things to hit the fan. I’m not optimistic that, as a group, computer scientists and computing professionals can prevent this disaster from happening: the economic forces driving automation and system integration are too strong. But of course we should try. We also need to think about what we’re going to do if, all of a sudden, a lot of people suddenly expect us to start producing computer systems that actually work, and perhaps hold us accountable when we fail to do so.

Obviously I don’t have the answers but here are a few thoughts.

  • We know that it is possible to create safety-critical software that (largely) works. Generally this happens when the organizations creating the software are motivated not only by market forces, but also by significant regulatory pressure. Markets (and the humans that make them up) are not very good at analyzing low-probability risks. A huge amount of critical software is created by organizations that are subject to very little regulatory pressure.
  • It is difficult to tell people that something is a bad idea when they very much want it to be a good idea. We should get used to doing this, following Parnas’s courageous example.
  • It is difficult to tell people that something is going to be slow and expensive to create, when they very much want it to be quick and cheap. We need to get used to saying that as well.
  • We can and should take responsibility for our work. I was encouraged by the field’s generally very positive response to The Moral Character of Cryptographic Work. Computer scientists and computing professionals almost always think that their particular technology makes the world better or is at worst neutral — but that is clearly not always the case. Some of this could be taught.
  • We need to be educating CS students in methods for creating software that works: testing, specification, code review, debugging, and formal methods. You’d think this is obvious but students routinely get a CS degree without having done any sort of serious software testing.

Finally, let’s keep in mind that causes are tricky. A major human disaster usually has a complex network of causes, perhaps simply because any major disaster with a single cause would indicate that the system had been designed very poorly. Atomic Accidents makes it clear that most nuclear accidents have been the result of an anti-serendipitous collection of poor system design, unhappy evolution and maintenance, and failure-enhancing responses by humans.

Acknowledgments: Pascal Cuoq and Kelly Heller gave some great feedback on drafts of this piece.

Is the Browser the New OS?

Yes, this is an old question. I still think it’s interesting. Disclaimer: I haven’t tried out a Chromebook yet.

First, let’s look at the situation as of late 2012. The applications I use generally fall into three categories:

  1. Web-based.
  2. Native, but easily available on Windows, Mac, and Linux. These include a file browser, a shell, Emacs, LaTeX.
  3. Native and less portable: Photoshop, development tools such as compilers and debuggers, high-performance games.

A quick look at the Chrome store indicates that most or all of the applications in category #2 are already available in browser versions (though I currently use the native versions out of inertia). Category #3 is trickier; it’s not clear that it really makes sense to port, for example, Photoshop into the browser. On the other hand, category #3 is basically power tools and many people (including me, much of the time) can get by using weaker web-based versions. Even embedded software development tools, which might at first seem to be the antithesis of web applications, have web-based versions. In summary, to a first approximation, “the browser is the new OS” has mostly happened already, though we do see an interesting reverse trend in the mobile space (though quite a few of those native apps are thin wrappers around a browser).

Second, how did we get to this point? The first application I ever saw that used a browser as its primary interface was SATAN in 1995. I remember thinking this was an elegant and striking design decision; it permitted the implementors to focus on their application logic instead of wasting time on what would likely have ended up being the kind of crappy X windows GUI that was standard at the time. Not long after 1995 we saw the rise of Java-enabled browsers making everyone (for a short time, at least) get interested in the potential for platform-independent applications delivered over the web. But this potential remained unrealized. Around 2008 it became clear that the following three efforts, taken together, constituted a serious attempt to create a new platform. First, Google Chrome, with a fast JavaScript engine and significantly increased robustness (and perhaps also security) due to the process-per-tab model. Second, Google Gears, which facilitated disconnected operation for web applications. And third, Google Docs, which implements at least 80% of the useful stuff in MS Office. Of course Gears is gone but similar functionality is available elsewhere. More recently, Chrome OS and the Chromebook make it clear what Google wants to see. I’m painting a sort of Google-centric picture here and in fact I use their stuff a lot. However, most of the time I could get by using Firefox, Bing, and other alternative technologies.

Finally, what is the prognosis for browser-is-the-OS, looking forward? What will it take for people to really not care about which OS they’re running? First, we want to minimize the number of applications in category #3, but realistically I think most casual users don’t have that many apps in this category and us power users are willing to compromise. For example, the special-purpose tools I use for research are probably always going to run outside the browser and that is fine—they’ll run on workstation-class machines at the office and in the cloud. Photoshop and high-quality video processing is not likely to be moving into the cloud real soon, but on the other hand again these are special-purpose, workstation-grade applications. Most people already do their photo editing online.

The second thing that I require is that any web application that looks remotely like an editor (whether for programs, documents, presentations, web pages, or whatever) has to support rock-solid disconnected operation and have a sensible conflict resolution mechanism. Google Docs’ disconnected operation seems pretty good since last summer, but I worry that lots of applications are going to need special-purpose logic to handle this nicely, and it’s going to be a long slog since many developers won’t consider disconnected to be a priority.

Third, we need near-native performance in the browser. Plenty of JavaScript work has been done and it’s getting pretty fast. For more compute-intensive workloads Google Native Client seems like a good option.

In summary, we seem to be headed towards a segregated world where most people don’t need a PC, but rather an appliance that runs a browser. On the other hand, people who use computers more seriously will keep using workstation-style machines that run native apps when considerations such as performance, dealing with large amounts of data, security, reliability, and local control are paramount. I’ll go ahead and predict that in 2022, the number of PCs sold will be 25% of what they were in 2012. By PC I mean something like “non-Android, non-iOS machine primarily intended to be used by a single person to run native applications”—a category that includes all laptops except Chromebooks.

Core Question

[This post is about machines used by people. I realize things are different in the server room.]

We had one core per socket for a long time. When multi-cores came along, dual core seemed pretty awkward: real concurrency was possible, but with speedup bounded above by two, there wasn’t much point doing anything trickier than “make -j2”. Except in low-end machines two cores seems to have been a passing phase. Now, several years later, it is possible to buy desktop processors with six or eight cores, but they do not seem to be very common or popular. However, I will definitely spend some time working for a 4x speedup, so stalling there may not be such a shame. Even some inexpensive tablets are quad core now. But are we just pausing at four cores for another year or two, or is this going to be a stable sweet spot? If we are stuck at four, there should be a reason. A few random guesses:

  • Desktop workloads seldom benefit much from more than four cores.
  • Going past four cores puts too much of a squeeze on the number of transistors available for cache memory.
  • Above four cores, DRAM becomes a significant bottleneck.
  • Above four cores, operating systems run into scalability problems.

None of these limitations is fundamental, so perhaps in a few years four cores will be low-end and most workstations will be 16 or 32?

Cyber War

I recently read Richard Clarke’s Cyber War. Although I didn’t learn anything new on the technical side, that isn’t the focus of the book. Clarke’s main agenda is to build awareness of the uniquely vulnerable position that the United States finds itself in as well as proposing national policies that might lead to a more favorable position for the USA as well as a more stable situation for everyone. Although I know next to nothing about Clarke, over the course of the book I learned to admire his blunt opinions and the broad perspective he has developed as a long-time Washington insider. This book is a couple of years old, and therefore misses out on recent developments such as Stuxnet. Even so, I’m not aware of a better high-level introduction to the policy issues. It’s worth reading as a way to understand some of the broader implications of computer (in)security.

Discovering New Instructions

Sometimes I wonder what instruction sets are supposed to look like. That is, what instructions would there be if computers were redesigned by smart people who understood our fabrication capabilities and who knew what we wanted to accomplish using computers, but who didn’t care about backwards compatibility and who haven’t seen our architectures? We can get little glimpses into that world by looking at network processors, DSPs, GPUs, ISA extensions like SSE4 and NEON, extensible processors like Tensilica’s, and others. But still, these are too rooted in the past.

Although the machine code emitted by our compilers is inextricably tied to our processors, perhaps this code can still be leveraged to discover new instructions. As a thought experiment, let’s start with a collection of executables whose performance we care about. Preferably, some of these will have been compiled from programs in Haskell, OCaml, and other languages that are not well-represented in today’s benchmark suites. We’ll run these programs in a heavily instrumented execution environment that creates a dynamic dataflow graph for the computation; the excellent Redux paper shows how this can be done. Next, we’ll need to clean up the dataflow graphs. First, we rewrite processor-specific operations (condition code dependencies, CISC instructions, etc.) into a simpler, portable form. Next, we optimize away as much dead, redundant, and vacuous code as possible; including, hopefully, all irrelevancies such as stack frame manipulation, dynamic linking, and garbage collection. The result — perhaps — will be something beautiful: the essence of the original computation, stripped of all sequential constraints, processor-specific operations, and bookkeeping. Of course this dataflow graph has some limitations. First, it only encodes the meaning of a single execution of the program. Second, it encodes a lot of incidental facts about the computation such as the bitwidths of all integer operations, the specific hashing methods used, etc. We’ll just have to live with these problems. The Redux paper contains a great example where factorial codes written in C, in Haskell, and in a stack machine are shown to all produce basically the same dataflow graph.

So now we have a collection of giant dataflow graphs: one for each execution of each program that we’re interested in. Our goal is to design an instruction set that can compute these dataflow graphs. Trivially, this can be done by partitioning the graphs into very small units of computation corresponding to a RISC instruction set. But that ISA is boring and won’t show any performance wins. To do better we’ll use a search-based strategy to find subgraphs that:

  • occur a large number of times across the collection of dataflow graphs — these are operations that are executed frequently by our workloads
  • contain a lot of parallelism — making them good candidates for hardware acceleration
  • contain a lot of internal symmetry — supporting SIMD-style execution
  • have a small number of dynamic inputs and outputs
  • rely on a small number of constants
  • do not contain dependency chains that are too long — we don’t want to create instructions that are too slow

I think this can be done; none of these properties seems particularly difficult to test for. The major problem necessitating cleverness will be the huge size of the dataflow graphs. We’ll end up with a list of candidate instructions ranked by some fitness function, such as performance or code density. We can build as many of these into our new ISA as we have hardware budget for.

Would this method discover saturating arithmetic instructions when applied to signal processing codes? Would it find clever ways to optimize bounds checks and exception handling in high-level programming programming languages? It’s possible (though I’d be super disappointed) that the new machines are just incidental variations on existing RISC and CISC designs. If this happened, I would suspect that we had failed to abstract away a sufficient number of processor artifacts. Or, perhaps it was a mistake to compile our computations to an existing machine architecture before building the dataflow graphs. Rather, perhaps we should start with a source-level program and its operational semantics, unrolling it into a dataflow graph without any compilation to machine code. This avoids ties to our existing processors, but also risks coming up with graphs that are very hard to map back onto actual hardware. Of course, many languages don’t even have a real semantics, but researchers are diligently working on that sort of thing. An even more difficult option would build up the functional representation of a source program (or executable) without running it, but this has the disadvantage of losing the “profile data” that is built into a dynamic dataflow graph — we’d need to add that in separately.

An aspect of this exercise that I find interesting is that gives insight into what our processors really do. Many years ago (I don’t have a reference handy, unfortunately) a study showed that computers spend most of their time in sorting algorithms. That cannot be true anymore — but what does the average mobile phone do? What about the average supercomputer chip? The average machine in Google or Amazon’s cloud? Of course we know the answers at a high level, but are there surprises lurking inside the computations? I would expect so — it’s tough to take a close look at a complicated system and not learn something new. Are there new instructions out there, waiting to be discovered, that can help these workloads execute more efficiently? I have to think so, at least for for workloads that are not well-represented in Intel’s, AMD’s, and ARM’s benchmark suites.

Can Simplicity Scale?

Software has gotten really big, with many systems — even, apparently, cars — running into the hundreds of millions of lines of code. The drawbacks of code bases this large are numerous: they are hard to understand, hard to modify, hard to test, and virtually guaranteed to contain huge numbers of bugs. My understanding is that up to this point, we have survived principally by burying mountains of code under abstraction layers, hiding much of the complexity from the next layer in the software stack. This is fine but it only gets us so far: large codes tend to export quirky, unclean abstractions and of course bugs cause even more abstraction leaks. We should be exploring alternative approaches that might be able to radically reduce the size of our code bases.

This annual report describes an ongoing effort to implement Frank, a full software stack for an interactive desktop machine, using less than 20,000 lines of code. This is a great project and I love the fact that they’ve made an annual report — typically an internal-only document required by a funding agency — publicly available. Another nice aspect of this project is that creating an innovative GUI is explicitly not a goal; they simply want to mimic existing systems using 1000 times fewer lines of code.

The technical approach, in a nutshell, is to design a number of domain-specific programming languages, each suited to concisely expressing some part of the system. For example, consider this text from the report which describes the core of Frank’s windowing system, which is written in the Nile language:

The several dozen standard compositing rules, shading, stroking, gradients, sampling, shadowing, etc.—457 lines in total—were written in Nile and debugged to make a working graphics system, strong enough to be used for Frank, and to do all the graphics required for personal computing (and hence this report).

The Nile code can be found here. What is really being talked about — thought the report doesn’t use the term — is executable specifications. Specifications describe how a system should operate; they are written in English, mathematics, pseudocode, etc. and in general they can’t be executed efficiently (or at all). On the other hand, if the domain and the specification language are constrained, it should be possible to create executable specifications. Even in cases where the simple version of the specification cannot be executed efficiently, this approach supports the separation of optimizations from the core specification, making it easier to reason about their correctness.

Although I love this research project, at some point we have to ask ourselves if the basic goal — a 1000x reduction in lines of code, or a full system in 20 KLOC — will survive contact with the enemy. To play devil’s advocate for a moment, I suspect that a minimal OS + GUI + applications could be written in 20,000 lines of code even in C++ if the design was careful, the feature set limited, and the implementers talented. To continue with that line of thought: How much of Frank’s elegance will survive performance tuning, porting to many platforms, feature creep, internationalization and related hacks, mediocre developers, security hardening, and all of the other things that happen to software as it is pushed into production and then used by real people to accomplish complex tasks?

My guess is that a significant fraction of Frank’s gains go away in the face of real-world engineering concerns. However, even if we are left with a 10x or 100x reduction in code size, instead of the original 1000x, the exercise is worthwhile. The thing that worries me most about pushing a system like Frank into the real world is that the hidden costs are larger than we might hope. Let’s make an analogy with microkernels. Speaking abstractly, there is much to love about, for example, a 10 KLOC kernel that has been proved correct. However, people whose judgement I trust have said that (1) debugging code running on a microkernel can be extremely hard and (2) creating high performance code requires very talented developers. The first problem stems mainly from concurrency, the second from the fact that in a microkernel-based system, an arbitrary piece of data is no longer a single pointer indirection away. It seems that a Frank-like system is likely to have similar problems. Creating a clean, coherent, and efficient system using a big pile of little languages, some of them highly mathematical, probably requires serious talent. In contrast, if I’m a medium-grade developer and you give me a big, crappy mountain of C++, I can just start hacking and probably I’ll eventually get the desired effect — even if it is fragile and impossible to maintain. The is a classic worse-is-better situation — which has nothing to do with “worse” or “better”, but rather is about the conflict between two value systems: one which prizes cleanliness and clarity, the other which values pragmatic code that works efficiently in practice and can be easily evolved to solve new problems.

In summary, I desperately hope that we won’t be seeing, in a few years, 50 billion lines of code on the desktop and in the automobile. This future seems all too plausible and we should be actively working to avoid it. Approaches such as Frank are promising. The Racket project has a significant amount of overlap with Frank (though its top-level goals are different, or at least differently stated); see for example the recent CACM article by my colleague Matthew Flatt.

Towards Tinkers

The heroes of Vernor Vinge’s The Peace War are members of a scattered society of tinkers who — without any real industrial base — manage to develop and produce very high-tech devices including fast, small computers. I’m trying to figure out how realistic this is.

The software side seems entirely feasible. Today’s open source community has shown that small groups of talented, motivated people can create enormously sophisticated artifacts.

On the other hand, hardware is a problem. Vinge’s tinkers cannot exist in today’s world where fast computers are made only in billion-dollar fabs. Perhaps it will become possible to grow sophisticated chips using hacked bacteria, or maybe new fabrication technologies such as 3D printing will evolve to the point where they can produce realistic large-scale integrated circuits.

An end point for 3D printing technology is the Diamond Age‘s matter compiler, which can produce almost anything based on a plan and a “feed” of energy and atoms. But of course in that book, the feed is a centrally controlled resource. The “seed” technology from the end of The Diamond Age is tinker-friendly, but very far-fetched for now.

The recent near-doubling of hard disk prices due to flooding in Thailand shows how fragile our supply chains are. It’s nice to think that alternate versions of some high-tech products could be produced by small, autonomous groups of people.

Online University

Yesterday someone in my department’s main office got a request from a student to receive credit for taking the now-infamous free online AI course from Stanford. It is routine for a university to award transfer credit for a course taken at a different school, but this case is trickier since a student taking the AI course isn’t enrolled at Stanford and doesn’t get credit there. This post — which will be disorganized because my thinking on this subject is not yet organized — looks at what the Stanford course, the Khan academy, MIT’s Open Courseware initiative, and related efforts might mean for the future of college education.

Will there be a single, best course filling every niche, and everyone just takes that course? The analogy I’d like to make is between online courses and textbooks. Most subject areas are not dominated by a single textbook, but there is usually a small collection of textbooks that, together, are used as a basis for probably 80% of the courses. Personally I’d much rather learn from a textbook than from an online course — listening to someone talk is exceptionally inefficient. Why didn’t mass-market textbooks wipe out universities sometime during the 20th century? Because, of course, taking a class adds value beyond what can be found in the book. This value takes many forms:

  • A course comes as part of a broader “college experience” that many people want.
  • A course is part of an eventual degree that serves as a kind of certification.
  • Instructors are available to provide additional help.
  • Putting classmates in close proximity creates a sense of community and at least in some cases promotes learning.
  • A course is often part of a curriculum that has been designed in an integrated way.
  • A course serves as a forcing function, making it more difficult to put off learning the material.

I think we have to accept the conclusion that universities as we understand them today will be destroyed more or less to the extent that these sources of value can be provided by online education.

Let’s look at a couple of extremes. First, a course like Calculus I — a big lecture course at most universities. The experience of trying to learn integration while sitting in a big, crowded lecture is so bad that watching the lecture online almost seems attractive. It’s not hard to imagine these courses going away over the next 20 years. There seem to be various possibilities for how this will play out. First, a big university could offer an online version of Calc I, but this is very inefficient because only a few hundred or thousand people take it each year. Rather, courses like this will be handled by a few large organizations (companies or forward-thinking universities) and most institutions will simply contract out Calc I by giving some fraction of the tuition to the course provider. Course providers will make money by scaling — first through outsourcing and increasingly through AI-based techniques for assisting and assessing students. My fear is that these online courses will suck very badly, in the same way that so many web applications suck today. However, realistically, the not-online Calc I course I took 20 years ago sucked too: lectures were boring and recitation was at 7:30am with a TA I truly could not understand.

At the other extreme from big service courses, we have things like the “Senior Software Project” course offered by many departments or the Android Projects class that I’m teaching now. These have a large amount of instructor involvement, a large amount of in-person interaction between students, grading cannot easily be automated, etc. I don’t want to say that online versions of these classes are impossible, but certainly they would have a very different character than the current versions. These courses represent the part of a college education that directly descends from the old apprenticeship system and it would — in the long run — be a big problem if this part of the college experience went away. Of course the most serious students would still apprentice themselves through hackerspaces, internships, and such — but people in for example the 50th through 80th percentiles would likely be left poorly prepared for their future careers by an online-only education.

The picture I am painting is that at least in the near term, universities and traditional college degrees survive, but some of their bread and butter gets eaten by highly scalable providers of low-level courses. There will be fierce competition among providers — similar to the current competition between textbook providers, but the stakes will be higher. As we move forward, some fraction of students — in particular, non-traditional students and those who otherwise don’t want the traditional college experience — will move towards online-only degree programs. At first these will provide an inferior education and therefore they will be sought out by students who just cannot make regular classes work, or who are primarily interested in a degree for its own sake. Perhaps, as time passes, telepresence and related technologies will manage to become solid enough that a real education can be attained online.