Can Simplicity Scale?

Software has gotten really big, with many systems — even, apparently, cars — running into the hundreds of millions of lines of code. The drawbacks of code bases this large are numerous: they are hard to understand, hard to modify, hard to test, and virtually guaranteed to contain huge numbers of bugs. My understanding is that up to this point, we have survived principally by burying mountains of code under abstraction layers, hiding much of the complexity from the next layer in the software stack. This is fine but it only gets us so far: large codes tend to export quirky, unclean abstractions and of course bugs cause even more abstraction leaks. We should be exploring alternative approaches that might be able to radically reduce the size of our code bases.

This annual report describes an ongoing effort to implement Frank, a full software stack for an interactive desktop machine, using less than 20,000 lines of code. This is a great project and I love the fact that they’ve made an annual report — typically an internal-only document required by a funding agency — publicly available. Another nice aspect of this project is that creating an innovative GUI is explicitly not a goal; they simply want to mimic existing systems using 1000 times fewer lines of code.

The technical approach, in a nutshell, is to design a number of domain-specific programming languages, each suited to concisely expressing some part of the system. For example, consider this text from the report which describes the core of Frank’s windowing system, which is written in the Nile language:

The several dozen standard compositing rules, shading, stroking, gradients, sampling, shadowing, etc.–457 lines in total–were written in Nile and debugged to make a working graphics system, strong enough to be used for Frank, and to do all the graphics required for personal computing (and hence this report).

The Nile code can be found here. What is really being talked about — thought the report doesn’t use the term — is executable specifications. Specifications describe how a system should operate; they are written in English, mathematics, pseudocode, etc. and in general they can’t be executed efficiently (or at all). On the other hand, if the domain and the specification language are constrained, it should be possible to create executable specifications. Even in cases where the simple version of the specification cannot be executed efficiently, this approach supports the separation of optimizations from the core specification, making it easier to reason about their correctness.

Although I love this research project, at some point we have to ask ourselves if the basic goal — a 1000x reduction in lines of code, or a full system in 20 KLOC — will survive contact with the enemy. To play devil’s advocate for a moment, I suspect that a minimal OS + GUI + applications could be written in 20,000 lines of code even in C++ if the design was careful, the feature set limited, and the implementers talented. To continue with that line of thought: How much of Frank’s elegance will survive performance tuning, porting to many platforms, feature creep, internationalization and related hacks, mediocre developers, security hardening, and all of the other things that happen to software as it is pushed into production and then used by real people to accomplish complex tasks?

My guess is that a significant fraction of Frank’s gains go away in the face of real-world engineering concerns. However, even if we are left with a 10x or 100x reduction in code size, instead of the original 1000x, the exercise is worthwhile. The thing that worries me most about pushing a system like Frank into the real world is that the hidden costs are larger than we might hope. Let’s make an analogy with microkernels. Speaking abstractly, there is much to love about, for example, a 10 KLOC kernel that has been proved correct. However, people whose judgement I trust have said that (1) debugging code running on a microkernel can be extremely hard and (2) creating high performance code requires very talented developers. The first problem stems mainly from concurrency, the second from the fact that in a microkernel-based system, an arbitrary piece of data is no longer a single pointer indirection away. It seems that a Frank-like system is likely to have similar problems. Creating a clean, coherent, and efficient system using a big pile of little languages, some of them highly mathematical, probably requires serious talent. In contrast, if I’m a medium-grade developer and you give me a big, crappy mountain of C++, I can just start hacking and probably I’ll eventually get the desired effect — even if it is fragile and impossible to maintain. The is a classic worse-is-better situation — which has nothing to do with “worse” or “better”, but rather is about the conflict between two value systems: one which prizes cleanliness and clarity, the other which values pragmatic code that works efficiently in practice and can be easily evolved to solve new problems.

In summary, I desperately hope that we won’t be seeing, in a few years, 50 billion lines of code on the desktop and in the automobile. This future seems all too plausible and we should be actively working to avoid it. Approaches such as Frank are promising. The Racket project has a significant amount of overlap with Frank (though its top-level goals are different, or at least differently stated); see for example the recent CACM article by my colleague Matthew Flatt.

28 responses to “Can Simplicity Scale?”

  1. Does the 20kLOC include the implementation of the languages and compilers? It sounds to me like they are just building yet another system with yet another layer of abstraction in it. And this time, they are adding a language jump in the process.

  2. The 20,000 lines is supposed to contain everything including compilers, though I think they still rely on some external components.

  3. I might be speaking from a position of privilege, but of the reasons you listed for lack of elegance (performance tuning, porting to many platforms, …), I don’t accept “mediocre developers” as a valid excuse.

    Disciplines such as engineering and medicine had phases of their history characterized by shoddy practitioners, but professionalism helped weed out that problem. Software development should strive for a similar outcome.

    Yes, writing robust software is hard and mediocre developers are plentiful, but those facts shouldn’t burden the world with crappy software forever. As more people become code literate, the clients of software can afford to choose the best. Mediocre developers will find avenues where their true talents are useful, rather than drafted into currently-lucrative careers developing shoddy software. We should accept that programming software systems is a task beyond the skills of mediocre developers, similar to our current acceptance that surgery is a task beyond mediocre medical practitioners.

  4. I would point to the MiniSat project as a piece of software that “actually won” as an example of succinct software design.

    I would suspect that a large amount of software bloat (from personal experience) is due to this common scenario (note that points one and two are ultimately “probably false,” but let’s assume it happens that way):
    1) A long time ago, somebody had a small idea, they talked to a programmer
    2) The programmer implemented this small idea correctly, with a clean bit of code to implement what was necessary.
    3) This programmer moves away or moves up in the company hierarchy
    4) The program runs for a while, and changes are necessary.
    5) A new programmer is assigned to the job.
    6) Rather than reading, understanding the code, and properly adding to it, the young programmer spends a few hours reading the old code and says “this is horrible and can’t do what we want, let’s do something different.” Except they don’t start from complete scratch, they leave around some remnant of the original codebase, though probably using it incorrectly…
    7) Now the software has duplicated functionality, interfaces, etc… making the benefit / hours to understand ratio even less, giving new programmers — when this process is repeated — less incentive to understand what’s going on rather than discard the code and try to do it “their own way.”
    8) Repeat.

    The first time I ever worked on a large-ish software project I inhereted a bunch of code, threw most of it out, reinvented it, and then thought to myself ‘hey, this looks kind of like what the last guy did…’ and realized I had made a big mistake.

    (I have heard of professors teaching their students around this problem by forcing them to use previous parts of their project in subsequent assignments without changing interfaces or implementation, but I’m not convinced that doesn’t force an undue amount of hackery around bad design in previous parts later on, which is probably not a good lesson to learn.)

  5. @Sharkey: the world right now has an insatiable appetite for software. Until that changes, there will be more jobs than there are “above mediocre developers” (I don’t think the skills distribution of the whole population is going to swing much). As long as that is true, there will be people willing to pay for the best developer they can get, even if that is a mediocre developer.

  6. “The specification -is- the language” sounds like something the 4GL shysters would have come up with in the ’90s.

  7. I’m by no means an expert on this, but what do you think about Xmonad? Written in around 1k lines of Haskell, the developers put a lot of emphasis on making sure the code is ‘correct’ using static checking and property based testing.

  8. @Magnus: for a more positive twist, try seeing it in the light of the Curry–Howard correspondence (proofs as programs) which is deep and beautiful and not at all risible (not that I’m trying to imply this is the way to program, mind you).

    The phrase “the conflict between two value systems: one which prizes cleanliness and clarity, the other which values pragmatic code that works efficiently in practice and can be easily evolved to solve new problems” really irks me, because it implies these systems are incompatible or that you have to “pick a side” somehow. What we should go for is to make cleanliness and clarity pragmatic — which probably takes a combination of weeding out the programming chaff and making the means for achieving cleanliness and clarity more pragmatic themselves. That we’re not seeing it in practice yet doesn’t demonstrate that it’s fundamentally impossible, just hard.

    Of course this is all armchair fantasizing and I’m not exactly hard at work myself to make this a reality (I’m your average corporate programmer “embedded in pragmatica”), just wanted to vent.

  9. It doesn’t seem that there is any difference, aside from syntactic sugar, between building a windowing system using a specially-designed graphics compositing language (which is what Nile appears to be), as opposed to building this system in C++ using a graphics API. The latter approach, of course, is the boring, standard.

  10. I think microkernels are not the right analogy. Microkernels aren’t trying to replicate the whole kernel, instead they’re explicitly trying to avoid writing the whole kernel. That means that something like Frank, which is trying to replicate the stack as it is, will have very different properties.

    I also think that your contrast between “creating a clean, coherent, and efficient system” using little languages and “a big, crappy mountain of C++, I can just start hacking” is not comparing apples to apples at all. Creating “clean, coherent, and efficient systems” with C++ (or anything else) is incredibly hard work, which requires a lot of talent.

  11. Hi Sam, the microkernel analogy isn’t about microkernels, it’s about microkernel-based operating systems that provide roughly the same functionality as a monolithic OS but are decomposed quite differently. You can’t just avoid writing all the drivers, filesystems, network stacks, etc — but you can, if you want, run them in separate threads and address spaces. But it’s hard to debug this and make it efficient. I still like the analogy, YMMV.

  12. > Software has gotten really big

    And it’s just the start:

    You should really better qualify your claims “(1) debugging code running on a microkernel can be extremely hard and (2) creating high performance code requires very talented developers” I think that those part are only valid for OS code, not for userspace code (except that system calls are probably a bit slower).

    As for the STEPS project, I’m quite skeptic: I agree that it is probably to have much better tools for programming than that we have now which would give much better result, but 20,000 lines seems to me as a pipedream..
    I’d love to be wrong of course!

  13. Reno, I can’t quantify the claims; these are things I’ve learned from people I believe. But yes, it only applies to kernel mode code.

    Lalith I had not seen Xmonad. I love to see people tackling realistic systems-ish tasks in languages like Haskell. Next maybe they can replace the X server.

  14. Kris, what you say rings true. Once code reaches a certain size and level of entanglement, I think these problems are very hard to escape without simply throwing it all away.

  15. This might be just a silly polemic, but consider the hypothetical scenario where static verification becomes practical for any code bases.

    Currently, there’s two major reasons to write fewer lines of code rather than more lines of code: 1) we can run fancier static verification which does not scale for larger projects 2) we can wrap our heads more easily around fewer lines of code. If we were to assume that 1) is no longer a problem (that is, if humanity “solved the verification problem”. Now, I’m sure there’s some full-employment theorem for verifiers like there is for compilers, but bear with me), would you be OK with the resulting larger code bases (since there’d be less pressure to write small code)?

    Does this argument anything about where software engineering tool support should go, besides verification?

    (Maybe stuff like this even already exists: for example, should we be looking at software that suggests LOC-reducing refactorings?)

  16. I have found that very large systems (more than 200,000 lines) are almost never the result of a single person with a single design vision. A system that big is beyond the capabilities of most people to keep in mind at one time. Inevitably, dark corners appear, and in these dark corners small germs of accidental complexity take rise, and they soon replicate themselves, leading to accidental complexity all over the place. Soon the whole system is suffering from cancerous lesions, each one is difficult to remove on its own.

    The only effective ways I have seen to avoid this cancer are:

    1. Think hard about design first.
    2. Eradicate small tumors as soon as they start.
    3. Eliminate all but the most important dimensions of extensibility.
    4. After building a complete system that fills the requirements and works, throw it out and start over, this time designing for simplicity.

    Sometimes strategy 4 needs to be done repeatedly.

    I think today’s developers are often victims of a cargo-cult culture of design pattern and extensibility overload, which is where I have found that #3 is particularly important to learn, usually the hard way.

  17. Carlos, you anticipated a post I’m still figuring out how to write.

    Ben, I hear you. Unfortunately I think this also applies to interfaces, not just the code bases themselves.

  18. I wouldn’t go so far as to say programming language cannot matter. But languages like C or C++ should for the most part be capable of expressing, in a generic form, all the relationships between the graphics objects that Nile is capable of expressing more succintly.

    Doing this in C, via function calls and API’s, is cumbersome and loquacious compared to Nile. However, it would be helpful if they could explain in what way, specifically, Nile offers more expressiveness.

  19. Thinking about this more, here is an example in which one could argue that C++ does not have the required expressiveness. Suppose you are doing an application involving a lot of multiprecision arithmetic. If you design a language with a first-class multiprecise type, then the compiler may be able to perform optimizations on the code. This may allow you to express your program in an essentially more succint way than in a general high-level language.

    A language like C++ will never understand the semantics of the multiprecise type, and will basically treat them as black boxes.

  20. A major problem for simplicity is external standards. I notice that in Frank they’ve implemented an ‘alternative’ web, presumably to avoid this problem. Though they say they can import and export ODF – I’ll be impressed if that’s included in the 2KLoC budget.
    Standards have got bigger over the years. You can compare the complexity of, say, MPEG1 video to HPEG2, H264 and the upcoming H265 (which are in the same line of evolution).

    On the other hand, some things have got better. I remember when my mother had a job as a ‘maintenance programmer’, the managers – or maybe the customer – thought the best way to keep the system reliable was to restrict the textual size of the change for any given fix. I remember her bringing home a printout of some loop that had obviously had a few such fixes applied to it and was now incomprehensible, and trying to figure out how to apply *her * fix. At least that idea seems to have died out.

  21. I think that the key problem lies in “How much of Frank’s elegance will survive performance tuning, porting to many platforms, feature creep, internationalization and related hacks, mediocre developers, security hardening, and all of the other things that happen to software as it is pushed into production and then used by real people to accomplish complex tasks?”

    If you take a problem, write a specification for it, write some code that meets that specification, *then* add performance tweaks, porting, feature creep, security hardening, i18n, and so on, you’ll get a tangled mess.

    Code doesn’t evolve well. IMHO, when a software module isn’t fit for purpose, it should probably be rewritten rather than hacked with – unless it’s a very simple module with a very simple change, so the entire change can be understood as a whole.

    How can we make this more possible? By splitting software into smaller independent modules, with better-defined interfaces between them so they can change more independently. How to make that happen is an open research area…

    Also, I don’t think we should complain about mediocre developers; we should build software tools so that us awesome developers can still use them well on a bad day, the the “mediocre” developers will do fine 😀

  22. “I love to see people tackling realistic systems-ish tasks in languages like Haskell. Next maybe they can replace the X server.”

    Some people are already trying to replace the X server, with Wayland. It looks a lot smaller and simpler than X so it’s probably a lot easier to implement in Haskell if you want, but it’ll be a huge win for simplicity even in C.

    One good strategy, which Wayland is taking advantage of, is to simply take out the trash. Nobody needs or wants 1980’s monochrome stippling patterns or font descriptions in a display server, but they’re part of X so anybody writing an X server needs to implement them. Wayland’s philosophy is that you already have graphics drivers in your kernel, and rendering pipelines in your graphics library, and so on, so all you really need in the middle is a compositor.