Booster Test

Ever since learning that the space shuttle booster motors were manufactured and tested at ATK in Promontory Utah — not too far from where I live — I wanted to see one of the tests. I didn’t manage to do that before the shuttle program was shut down, but today I got to see something better: a test of an SLS booster, which is about 25% more powerful than an STS booster and more than twice as powerful as one of the big F-1 engines from the Saturn V.

Here’s a close-up video, and here’s another showing what the test was like from the viewing area, in particular the 8 seconds it took the noise to reach us. The sound was very impressive, with enough low-frequency power to make my clothing vibrate noticeably, but it was not anywhere close to painfully loud. The flame was, however, painfully bright to look at. The nozzle was being vectored around during the test (I hadn’t realized that the solid rockets participate in guidance) but that wasn’t easy to see from a distance.

NASA socials give some inside access to people like me (and you, if you live in the USA and want to sign up next time) who have no official connection to the space program. Yesterday we got to tour the plant where the boosters are made. It was great to learn about techniques for mixing, casting, and curing huge amounts of propellant without getting air bubbles or other imperfections into the mix and without endangering workers. The buildings in this part of ATK have escape slides from all levels and are surrounded by big earthworks to deflect potential explosions upwards. It was also really cool to see the hardware for hooking boosters to the main rocket, for vectoring nozzles, and things like that. Alas, we weren’t allowed to take pictures on the tour.

ATK’s rocket garden at sunrise:

And the main event:

Abandoned Mineshaft

Due to my 8-year-old’s obsession with Minecraft, the abandoned mineshafts found in the game are an everyday topic of discussion around the house. He is saving up to buy a pickaxe — no joke. Since we needed a day trip for the long weekend, I thought we’d visit some actual mines in the Silver Island Mountains near the Utah-Nevada border. We followed an old road most of the way up a minor mountain and found the remains of either a processing facility or living quarters, and several mine shafts higher up. One of these apparently goes all the way through the mountain but we didn’t do much more than poke our heads in, since it was difficult to gauge how dangerous it was. This area was mined from the 1870s through the 1930s. Since the pathway connecting the mine shafts with the highest vehicle access was too narrow and steep for vehicles, ore was presumably transported downhill using some combination of human and animal power — the level of physical effort implied here seemed to be lost on the 8-year-old, who wants a pickaxe more than ever. We saw a pair of pronghorn antelope — but failed to get decent pictures.

Lamus Peak:

Built-up trail used to move ore downhill through rugged country:

Mine shaft:

Another one, with some nice copper ore right at the entrance:

And a lot of nothing in the distance:

Instruction Synthesis is Fun and Weird

Synthesis is sort of a hot topic in PL research lately. It basically means “implement a specification automatically.” Of course, at some level this isn’t very different from what compilers have been doing for ages, if we consider the source language program to be the specification, but when we’re doing synthesis the connotation is that the specification isn’t directly implementable using standard compiler techniques.

We recently added an instruction synthesizer to Souper, our LLVM superoptimizer. The basic method is from this excellent PLDI paper. In brief, you give the synthesis algorithm a specification and a library of components and it either gives you a way to wire together the components into a circuit that implements the specification or else it fails. Some other time I might explain the algorithm in detail, but here I wanted to give an example where we’re trying to figure out if a 64-bit word contains 16 repetitions of the same nibble value (this computation figured into many nibble sort submissions).

The specification in C is:

int nibs_same(unsigned long word) {
  unsigned long n00 = (word >> 60) & 0xf;
  unsigned long n01 = (word >> 56) & 0xf;
  unsigned long n02 = (word >> 52) & 0xf;
  unsigned long n03 = (word >> 48) & 0xf;
  unsigned long n04 = (word >> 44) & 0xf;
  unsigned long n05 = (word >> 40) & 0xf;
  unsigned long n06 = (word >> 36) & 0xf;
  unsigned long n07 = (word >> 32) & 0xf;
  unsigned long n08 = (word >> 28) & 0xf;
  unsigned long n09 = (word >> 24) & 0xf;
  unsigned long n10 = (word >> 20) & 0xf;
  unsigned long n11 = (word >> 16) & 0xf;
  unsigned long n12 = (word >> 12) & 0xf;
  unsigned long n13 = (word >>  8) & 0xf;
  unsigned long n14 = (word >>  4) & 0xf;
  unsigned long n15 = (word >>  0) & 0xf;
  return
    (n00 == n01) & (n00 == n02) & (n00 == n03) & (n00 == n04) &
    (n00 == n05) & (n00 == n06) & (n00 == n07) & (n00 == n08) &
    (n00 == n09) & (n00 == n10) & (n00 == n11) & (n00 == n12) &
    (n00 == n13) & (n00 == n14) & (n00 == n15);
}

I won’t bother to give the LLVM version of this but it’s not too pretty, and obviously there’s not much that standard compiler optimization techniques can do to make it prettier. On the other hand, there’s a much better way to implement this specification:

int nibs_same(uint64_t word) {
  /* rotate_right is not standard C; for a 4-bit rotate of a 64-bit
     word it is equivalent to (word >> 4) | (word << 60) */
  return word == rotate_right(word, 4);
}

LLVM doesn’t have a rotate instruction so the best we’ll do there is something like this:

define i32 @nibs_same(i64 %word) #0 {
  %1 = shl i64 %word, 60
  %2 = lshr i64 %word, 4
  %3 = or i64 %1, %2
  %4 = icmp eq i64 %3, %word
  %5 = zext i1 %4 to i32
  ret i32 %5
}

The question is: Will Souper generate this code from scratch? Turns out it does not, but on the other hand it generates this:

define i32 @nibs_same(i64 %word) #0 {
  %1 = lshr i64 %word, 28
  %2 = shl i64 %1, 32
  %3 = or i64 %1, %2
  %4 = icmp eq i64 %word, %3
  %5 = zext i1 %4 to i32
  ret i32 %5
}

Or in C:

(((x >> 28) << 32) | (x >> 28)) == x

If you take a minute to look at this code, you’ll see that it is not implementing a rotate, but rather something odd and not obviously correct. It took me a while to convince myself that this is correct, but I believe that it is. The correctness argument uses these observations: (1) every nibble is compared to some other nibble and (2) it is irrelevant that the bitwise-or is presented with a pair of overlapping nibbles. Here’s the picture I drew to help:

0000000000111111111122222222223333333333444444444455555555556666
0123456789012345678901234567890123456789012345678901234567890123
                            >> 28
                            000000000011111111112222222222333333
                            012345678901234567890123456789012345
                            << 32
00000011111111112222222222333333
45678901234567890123456789012345
                              |
                            000000000011111111112222222222333333
                            012345678901234567890123456789012345
                             ==?
0000000000111111111122222222223333333333444444444455555555556666
0123456789012345678901234567890123456789012345678901234567890123
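If a picture doesn’t convince you, a quick C check can help; here the synthesized expression is compared against a straightforward loop version of the specification (both functions are my own transliterations, not Souper output):

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward specification: return 1 iff all 16 nibbles match. */
int nibs_same_spec(uint64_t w) {
  for (int i = 1; i < 16; ++i)
    if (((w >> (4 * i)) & 0xf) != (w & 0xf))
      return 0;
  return 1;
}

/* Souper's synthesized expression, transliterated from the IR above. */
int nibs_same_souper(uint64_t x) {
  return (((x >> 28) << 32) | (x >> 28)) == x;
}
```

Exhausting all 2^64 inputs is out of the question, but checking the sixteen all-same-nibble words plus a pile of random values makes a decent smoke test.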

Alive can also help:

  %1x = shl i64 %word, 60
  %2x = lshr i64 %word, 4
  %3x = or i64 %1x, %2x
  %4x = icmp eq i64 %3x, %word
  %5 = zext i1 %4x to i32
=>
  %1 = lshr i64 %word, 28
  %2 = shl i64 %1, 32
  %3 = or i64 %1, %2
  %4 = icmp eq i64 %word, %3
  %5 = zext %4 to i32

Done: 1
Optimization is correct!

The drawback of Souper's code is that since it doesn't include a rotate operation that can be recognized by a backend, the resulting assembly won't be as good as it could have been. Of course if LLVM had a rotate instruction in the first place, Souper would have used it. Additionally, if we posit a superoptimizer-based backend for LLVM, it would be able to find the rotate operation that is well-hidden in Souper's synthesized code.

I find it kind of magical and wonderful that a SAT solver can solve synthesis problems like this. Souper's synthesizer is very alpha and I don't recommend using it yet, but the code is here if you want to take a look. Raimondas, my heroic postdoc, did the implementation work.

I'll finish up by noting that my first blog entry was posted five years ago today. Happy birthday, blog.

Update from Feb 10: A reader sent this solution which I hadn't seen before:

(x & 0x0FFFFFFFFFFFFFFF) == (x >> 4)

I think Souper missed it since we didn't include an "and" in its bag of components for this particular run.
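This one also checks out: the mask keeps the low 60 bits and the shift produces the high 60 bits, so equality forces each nibble to match its higher neighbor. A quick sketch comparing it against the rotate formulation (function names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate formulation from earlier in the post. */
int nibs_same_rot(uint64_t w) {
  return w == ((w >> 4) | (w << 60));
}

/* The reader's mask-and-shift formulation. */
int nibs_same_mask(uint64_t w) {
  return (w & 0x0FFFFFFFFFFFFFFFULL) == (w >> 4);
}
```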

Static Analysis Benchmarks

Many programmers would agree that static analysis is pretty awesome: it can find code defects that are very hard to find using testing and walkthroughs. On the other hand, some scientific validation of the effectiveness of static analysis would be useful. For example, this nice 2004 paper found that when five analyzers were turned loose against some C programs containing buffer overflows, only one of them was able to beat random guessing in a statistically significant fashion.

More recently, in 2014, some researchers at the Toyota InfoTechnology Center published a paper evaluating six static analyzers using a synthetic benchmark suite containing more than 400 pairs of C functions, where each pair contains one function exemplifying a defect that should be found using static analysis and the other function, while looking fairly similar, does not contain the defect. The idea is that a perfect tool will always find the bug and not find the not-bug, if you see what I mean. The results show that different tools have different strengths; for example, CodeSonar dominates in detection of concurrency bugs but PolySpace and QA-C do very well in finding numerical bugs. With a few exceptions, the tools do a good job in avoiding false positives. At this point I feel like I have to refer to Coverity’s classic writeup about this kind of thing just because it’s awesome.

Shin’ichi Shiraishi, the lead author of the Toyota paper, contacted me asking if I’d help them release their benchmarks as open source, and I have done this: the repository is here, and here is a brief document describing the benchmarks. (I just want to make it clear that while I own the Github repo, the work here was entirely done by Shin’ichi and his colleagues — I added the simple autotooling, fixed a few portability problems in the benchmarks such as using intptr_t for integers that will be cast to and from pointer values, and did some other lightweight cleanup.) Anyway, I am very happy that Shin’ichi and his organization were able to release these benchmarks and we hope that they will be useful to the community.

Update from 2/25/15: Whoops — it looks like Coverity/Synopsys has a DeWitt clause in their EULA and based on this, they sent me a cease and desist notice. In accordance with their request, I have removed Shin’ichi’s group’s paper from the Github repo and also removed the link to that paper from this piece.

Nibble Sort Programming Contest

The problem is to sort the 4-bit pieces of a 64-bit word with (unsigned) smaller values towards the small end of the word. The nibble sort of 0xbadbeef is 0xfeedbba000000000. The function you implement will perform this sorting operation on a buffer of 1024 64-bit integers:

void nibble_sort(unsigned long *buf); 

I’ll give a small prize to the submitter of the entry that performs this operation the fastest on my Core i7-4770. If the winner makes non-trivial use of SIMD, I’ll give a prize to that entry and also to the fastest non-SIMD entry. I’ll use GCC 4.9.2 to compile your C99 code, which may use SIMD intrinsics and also inline assembly if you choose. There’s no requirement for portability. The machine runs Ubuntu 14.04 in 64-bit mode. If you require any command line options other than “gcc -std=gnu99 -O3” you need to tell me that. You may assume that the buffer starts at the start of a cache line. The buffer will be loaded with uniformly chosen random values. Submit code by emailing me. Submit entries before the end of January 2015.

Here’s a slow (but hopefully correct) reference implementation that takes about 380 microseconds to do the job:

int read_nibble(unsigned long w, int i) {
  assert(i >= 0 && i < 16);
  unsigned long res = w >> (i * 4);
  return res & 0xf;
}

void write_nibble(unsigned long *w, int i, int v) {
  assert(i >= 0 && i < 16);
  unsigned long mask = 0xf;
  mask <<= (i * 4);
  *w &= ~mask;
  unsigned long prom = v;
  prom <<= (i * 4);
  *w |= prom;
}

unsigned long nibble_sort_word(unsigned long arg) {
  for (int i = 0; i < 16; ++i) {
    int min = i;
    for (int j = i+1; j < 16; ++j) {
      if (read_nibble(arg, j) < read_nibble(arg, min))
        min = j;
    }
    if (min != i) {
      int tmp = read_nibble(arg, i);
      write_nibble(&arg, i, read_nibble(arg, min));
      write_nibble(&arg, min, tmp);
    }
  }
  return arg;
}

void nibble_sort(unsigned long *buf) {
  for (int i=0; i<1024; i++)
    buf[i] = nibble_sort_word(buf[i]);
}
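When testing entries, an independent implementation is handy for cross-checking; here is a counting-sort sketch (my code, not a contest entry) that should agree with the reference on every input:

```c
#include <assert.h>
#include <stdint.h>

/* Independent nibble sort: count how many times each nibble value
   occurs, then emit values from largest to smallest so that bigger
   nibbles end up at the big (high-order) end of the word. */
uint64_t nibble_sort_word_count(uint64_t w) {
  int count[16] = {0};
  for (int i = 0; i < 16; ++i)
    count[(w >> (4 * i)) & 0xf]++;
  uint64_t r = 0;
  for (int v = 15; v >= 0; --v)
    for (int c = 0; c < count[v]; ++c)
      r = (r << 4) | (uint64_t)v;
  return r;
}
```

For example, it maps 0xbadbeef to 0xfeedbba000000000, matching the example in the problem statement.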

Update from Feb 1: Ok, contest is over. Thanks for the very impressive submissions, folks! Provisional results with and w/o SIMD are below. These are with turbo boost turned off. I’m now working on the blog post explaining the results, which is going to take some time because there are a lot of solutions here and some of them are very clever. I’m also still working to secure permission to use codes so I can put as many as possible in Github. There are fewer entries in the no-SIMD category because many codes had explicit use of vectors. Please let me know if you think I’ve made any mistakes.

Update from Feb 2: There were a few mistakes and omissions. Results that I hope are final are below.

regehr@regehr-M51AC:nibble_sort$ ./go.sh 
              ns     ns / 8 bytes       entry name           errors
            1115             1.09       alexander4                0
            1238             1.21          arseny2                0
            1596             1.56          arseny1                0
            1755             1.71              uwe                0
            2630             2.57        pdewacht2                0
            4205             4.11         beekman2                0
            4466             4.36         beekman1                0
            4867             4.75       alexander3                0
            6084             5.94       alexander1                0
            6364             6.21         koorogi2                0
            8472             8.27             falk                0
            8520             8.32           jerome                0
           10331            10.09          vetter4                0
           10374            10.13           jepler                0
           10950            10.69       alexander2                0
           12096            11.81          vetter3               64
           12714            12.42         koorogi1                0
           14185            13.85              tom                0
           15532            15.17             rjk9                0
           16829            16.43          parnell                0
           16890            16.49        pdewacht1                0
           18804            18.36           chucky                0
           20041            19.57          burton2                0
           20649            20.17            nadav                0
           24908            24.32         bosmans1               60
           25104            24.52         bosmans2                0
           25486            24.89          vetter2                0
           28957            28.28             hans                0
           29928            29.23             anon                0
           30228            29.52            jrose                0
           31007            30.28           carlos                0
           32952            32.18            joris                0
           34562            33.75          vetter1                0
           45440            44.38             mats                0
           47511            46.40            frank                0
           50162            48.99            robin                0
           72762            71.06          grayson                0
           74465            72.72            rosen                0
           79752            77.88            payer                0
           92970            90.79          burton1                0
           94343            92.13           mentre                0
           95522            93.28             vlad                0
           97877            95.58           rogers                0
           99077            96.75             mike                0
          101913            99.52            bloom                0
          103945           101.51           jarkko                0
          109191           106.63            karim                0
          120928           118.09           justin                0
          160599           156.83           mikael                0
          200943           196.23           scotty                0
          416207           406.45              ref                0
regehr@regehr-M51AC:nibble_sort$ ./go.sh NOSIMD
              ns     ns / 8 bytes       entry name           errors
            8593             8.39           jerome                0
           10321            10.08           jepler                0
           10346            10.10          vetter4                0
           10959            10.70       alexander2                0
           12105            11.82          vetter3               64
           15537            15.17             rjk9                0
           16842            16.45          parnell                0
           18847            18.41           chucky                0
           19995            19.53        pdewacht1                0
           21631            21.12            nadav                0
           25516            24.92          vetter2                0
           29418            28.73             hans                0
           29928            29.23             anon                0
           30529            29.81            jrose                0
           32956            32.18            joris                0
           34286            33.48             falk                0
           41028            40.07          burton2                0
           42308            41.32          vetter1                0
           44415            43.37         bosmans2                0
           45480            44.41             mats                0
           50179            49.00            robin                0
           53189            51.94            frank                0
           71285            69.61          grayson                0
           73883            72.15            rosen                0
           79744            77.88            payer                0
           94780            92.56          burton1                0
           94970            92.74           mentre                0
           95509            93.27             vlad                0
           98658            96.35           rogers                0
           98922            96.60             mike                0
          101722            99.34            bloom                0
          104880           102.42           jarkko                0
          113147           110.50            karim                0
          119168           116.38           justin                0
          415529           405.79              ref                0

Update from Feb 3: Here’s the Github repo containing all of the entries — let me know how it looks.

Buying Into Open Source Security

If you were given the opportunity to spend USD 100 million over five years to maximally improve the security of open source software, what would you do? Let’s just assume that the money comes with adequate administrative staff to manage awards and contracts so you can focus on technical issues. A few ideas:

  • Bug bounties, skewed towards remotely exploitable bugs and towards ubiquitous infrastructure such as OpenSSL and the Linux kernel. To get rid of USD 100 M in five years, we’ll probably need to make the bounties very large by current standards, or else give out a lot of them.
  • Contracts for compatible rewrites of crufty-but-important software in safe languages.
  • Contracts for aggressive cleanup and refactoring of things like OpenSSL.
  • Contracts for convincing demonstrations of the security of existing codes, in cases where it seems clear that rewrites are undesirable or impractical. These demonstrations might include formal verification, high-coverage test suites, and thorough code inspections.
  • Research grants and seed funding towards technologies such as unikernels, static analyzers, fuzzers, homomorphic encryption, Qubes/Bromium-kinda things, etc. (just listing some of my favorites here).
  • Contracts and grants for high-performance, open-source hardware platforms.

This post is motivated by the fact that there seems to be some under-investment in security-critical open source components like Bash and OpenSSL. I’ll admit that it is possible (and worrying) that USD 100 M isn’t enough to make much of a dent in our current and upcoming problems.

Testing with Pictures

Testing code is fun and hard and looking at the problem in different ways is always good. Here’s a picture representing the behavior of a saturating subtraction operation, where the horizontal axes represent the inputs and the output is vertical:

And here are some of the functions handed in by my students in the fall:

The last one represents one of the most common failure modes: failing to account for the asymmetry where INT_MIN has no additive inverse. The thing that I like about these images is how glaringly obvious the bugs are.
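For reference, a correct saturating subtraction can be written by doing the arithmetic in a wider type; this sketch assumes 32-bit operands (the exact assignment the students got may have differed):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Saturating 32-bit subtraction: clamp to INT32_MIN/INT32_MAX instead
   of wrapping. Doing the subtraction in 64 bits avoids signed overflow
   (undefined behavior) and handles the asymmetry: INT32_MIN has no
   additive inverse, so 0 - INT32_MIN must clamp to INT32_MAX. */
int32_t sat_sub(int32_t a, int32_t b) {
  int64_t r = (int64_t)a - (int64_t)b;
  if (r > INT32_MAX) return INT32_MAX;
  if (r < INT32_MIN) return INT32_MIN;
  return (int32_t)r;
}
```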

Hey, who knew Gnuplot could do animated gifs now? I guess old dogs can learn new tricks.

(Note: I posted some of these a few years ago, but I like the pictures so much I wanted to do it again.)

Inversions in Computing

Some computer things change very slowly; for example, my newish desktop at home has a PS/2 port. Other things change rapidly: my 2010 iPad is kind of a stone-age relic now. This kind of differential progress creates some funny inversions. A couple of historical examples:

  • Apparently at one point in the 80s or 90s (this isn’t a firsthand story; I’d appreciate recollections or citations) the processor available in an Apple printer was so fast that people would offload numerical computations to their printers.
  • I spent the summer of 1997 working for Myricom. Using the then-current Pentium Pro machines, you could move data between two computers faster than you could do a local memcpy(). I’m pretty sure there was something wrong with the chipset for these processors, causing especially poor memcpy() performance, but I’ve lost the details.

What are the modern examples? A few come to mind:

Anyhow, I enjoy computing inversions since they challenge our assumptions.

Souper Results 2

The Souper superoptimizer has made some progress since my last post about it.

We wrote compiler drivers that usually reduce the problem of building a project with Souper to make CC=sclang CXX=sclang++. Souper now uses Redis to cache optimizations so that even if the initial build of a program using Souper is slow, subsequent builds will be pretty fast. We fixed several problems that were preventing Souper from building largish programs like LLVM and GCC. This works now and, as far as we know, Souper can be used to optimize arbitrary LLVM code.

Souper now understands the ctpop, ctlz, cttz, and bswap intrinsics. It no longer generates only i1 values, but rather synthesizes constant values of any width. Constant synthesis is not fast and it requires a solver with good support for quantifiers, currently only Z3 (synthesizing constants without quantifiers isn’t hard, we just haven’t implemented that yet). Here’s a list of constants synthesized while building LLVM with Souper. The left side of each line is the number of times the constant on the right side was synthesized. i1 constants dominate but it’s fun to see, for example, that Souper was able to synthesize the 64-bit value 90112 four times. Where did that come from?

Souper has two main use cases. First, application developers can use Souper directly to optimize code they are compiling. Second, LLVM developers can use Souper to learn about optimizations missed by the existing optimization passes. We’re trying to make it useful to both of these audiences.

To make Souper more useful for compiler developers, we implemented a C-Reduce-like reducer for Souper optimizations. This is necessary because Souper extracts and attempts to optimize pieces of LLVM that are as large as possible, meaning that its optimizations often contain extraneous material. A reduced optimization has the handy invariant that no path condition, UB qualifier (nsw, nuw, exact), or leaf instruction can be removed without breaking the optimization. We did some cross-checking between Souper and Alive, as a sanity check on both tools. Additionally, we convert each Souper optimization back into LLVM and run it through opt -O3 in order to weed out any optimizations that LLVM already knows how to do. For example, Souper loves to prove that icmp eq %0, %0 can be simplified to 1. This is not useful.

While building LLVM, ~16,000 Souper optimizations fire. Some of these optimizations are duplicates (presumably due to inlining and header inclusion); ~7000 of them are distinct. After reduction there are ~4000 distinct optimizations and LLVM does not know how to perform ~1500 of them. Even 1500 optimizations is a lot of work to look through and of course not all of them matter. To help figure out which optimizations matter, we implemented two kinds of optimization profiling. The first is static profiling, which counts the number of times an optimization is applied at compile time. Implementing optimizations with a high static profile count would tend to reduce the size of the compiler’s generated code. Second, we implemented dynamic profiling, which counts the number of times each optimized piece of code is executed. This is accomplished by instrumenting the compiled program so that it dumps dynamic profile information to a Redis server using an atexit() handler. Implementing optimizations with a high dynamic profile count would tend to decrease the runtime of generated code. Of course, all standard caveats about profile-driven optimization apply here. Also keep in mind that Souper is extremely specific while compilers are less so: there is a many-to-one relationship between optimizations discovered by Souper and optimizations you would implement in LLVM. Therefore, it may well be the case that there are collections of low-ranking Souper optimizations that would rank highly if considered as a group, and that could all be implemented by a single LLVM transformation. We’ve experimented a bit with trying to automatically aggregate similar Souper optimizations, but so far I haven’t been too happy with the results.

If we take a Souper-optimized LLVM and use it to build SPEC CPU 2006, this is the optimization with the highest dynamic profile count; it is executed ~286 million times:

%0:i64 = var
%1:i64 = and 15:i64, %0
%2:i1 = eq 0:i64, %1
pc %2 1:i1
%3:i64 = and 7:i64, %0
%4:i1 = eq 0:i64, %3
cand %4 1:i1

The first four lines tell us that the arbitrary 64-bit value %0 is known to have zeros in its four bottom bits. The last three lines tell us that — of course — %0 has zeros in its three bottom bits. LLVM doesn’t understand this yet, leading to a lot of unnecessary conditional jumps.
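In C terms, the missed fact looks something like this hypothetical distillation (my example, not code from LLVM or SPEC):

```c
#include <assert.h>
#include <stdint.h>

/* On the path where (x & 15) == 0 holds, the low four bits of x are
   known to be zero, so (x & 7) == 0 is always true and its conditional
   jump could be folded away. */
int low_bits_check(uint64_t x) {
  if ((x & 15) != 0)
    return -1;       /* path condition fails */
  if ((x & 7) == 0)  /* always true here; the redundant test Souper found */
    return 1;
  return 0;          /* unreachable */
}
```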

Here’s the collection of Souper optimizations that are discovered while building LLVM/Clang/Compiler-RT r222538:

The clang binary from a “Release” build with Souper is about 800 KB smaller than the clang built without Souper. Please let us know about any bugs in the output above, including missed optimizations (but don’t tell us about missing vector, FP, or memory optimizations, we know that those are not supported yet). In the course of this work Raimondas ran across a Z3 bug; luckily he caught it by cross-checking Souper’s results using a different solver, instead of having to debug the resulting miscompilation.

The main thing that Souper does not do, that you would expect a superoptimizer to do, is to synthesize sequences of instructions. Much of our work over the last six months has been building infrastructure to support instruction synthesis, and almost all of that is now in place. Synthesis is our next major piece of work.

In the meantime, Peter has run Souper over libgo. I would like to build something a bit bigger such as Chromium. If you have a recipe for that, please drop me a line. I got as far as noticing that Chromium builds its own LLVM at which point my allergy to build systems kicked in. Integrating Souper into a build of the Rust compiler might also produce interesting results; it should be as easy as starting Redis and making sure our opt pass gets loaded in the right places.

Souper is by Peter Collingbourne at Google, by my postdoc Raimondas Sasnauskas, by Yang Chen at nVidia, by my student Jubi Taneja, by Jeroen Ketema at Imperial College London, and by me.

Partial Evaluation and Immutable Servers

Although I haven’t figured out exactly what immutability means for a server (I’m probably just being picky) the general idea of rebuilding a system from spec rather than evolving it with one-off hacks is very appealing. Lately I’ve been thinking about what could be accomplished if the system compiler were able to take advantage of certain kinds of immutability. One kind of technique that would be enabled is partial evaluation. Let’s look at a simple example starting with an integer power function I found on the web:

long powi(long x, long n) {
  assert(n >= 0);
  long  p = x, r = 1;
  while (n > 0) {
    if (n % 2 == 1)
      r *= p;
    p *= p;
    n /= 2;
  }
  return r;
}

This function compiles to 20+ instructions. On the other hand, the compiler is able to do considerably better for this special case:

long cubei(long x) {
  return powi(x, 3);
}

GCC’s output:

cubei:
   movq   %rdi, %rax
   imulq  %rdi, %rax
   imulq  %rdi, %rax
   ret

Here the C compiler has partially evaluated powi() with respect to the constant second argument. The assert() is gone, the loop is gone, etc. This is a very simple example. At the other extreme, people like to say that if you partially evaluate an interpreter with respect to a particular input, you get a compiler. Think, for a minute, about what kind of partial evaluator we would need to have in order to specialize a C interpreter with respect to the powi() code in such a way that we could honestly say that we’ve compiled it. The tool that would support this job is not so easy to create.
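To see concretely what happened, we can unroll powi() by hand for n == 3: each test of n becomes a compile-time constant and the assert is trivially satisfied (cubei_manual is my name for the result, not from the original):

```c
#include <assert.h>

/* powi() specialized for n == 3: the loop body runs for n = 3 and
   n = 1, both odd, so both conditional multiplies survive. */
long cubei_manual(long x) {
  long p = x, r = 1;
  r *= p;   /* n == 3 is odd: r = x */
  p *= p;   /* p = x^2 */
  r *= p;   /* n == 1 is odd: r = x^3 */
  return r; /* n == 0: loop exits */
}
```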

Ok, back to immutable servers. What we are looking for is programs in our server image that process immutable or constrained inputs. For example we want to try to show that:

  • A daemon, say Redis, is always started using the same configuration file
  • For a pair of programs that communicate through a pipe, only a small subset of the full set of commands is ever sent
  • Only a subset of the OS kernel’s system calls are invoked
  • A program (bash, hopefully) is never invoked at all

Next, we partially evaluate the system with respect to these constant or bounded inputs. If we do this properly, we would expect that quite a bit of the code handling general cases would fall away, leaving only the specific code needed for our server. This is basically just a big global tree-shaking operation.

Why would we do this? There are two reasons to cut away code and data that we don’t need. First, it reduces unnecessary attack surfaces. Second, it makes the resulting images smaller and faster. We can ship them around more easily and they use less RAM while running.

Partial evaluation is a very old idea, and the idea of applying it to systems software is not new either. Here’s a good piece of work, and here’s another one that I haven’t read carefully, but that seems reasonable at first glance. Why have these approaches not taken the world by storm? My guess is that it’s just difficult to get good results. In many cases we’re going to be dealing with strings and pointers, and it is very common to run into insurmountable problems when trying to reason about the behavior of programs in the presence of strings and pointers. Consider, for example, a Python script that makes a string using stuff it found in a file, stuff it got over the network, and a few regular expressions. What does the string do when we exec() it?

On the other hand, in the last decade or so SAT/SMT/string solvers have become very powerful, as have symbolic execution techniques. The cloud has created use cases for partial evaluation that did not exist earlier. Security is a worse problem than ever. Compilers are much better. Perhaps it’s time to try again. It is clear that we can’t just point the partial evaluator at our Docker image and expect great things. We’ll need to help it understand what parts of the system are immutable and we’ll also need to incrementally refactor parts of the system to make them cooperate with the partial evaluator. Anyway, research isn’t supposed to be easy.

I’ll finish up by mentioning that there’s a different way to get the same benefits, which is to assemble a system out of a collection of components in such a way that you don’t need a brilliant compiler to eliminate code that you didn’t mean to include. Rather, you avoid including that code in the first place. This is more or less Mirage’s design point. Both approaches seem worth pursuing.