Modern Dataflow Frameworks Wanted

Dataflow analysis, a classic form of static analysis, is a way to compute properties that hold over all possible executions of a program. For example, if a compiler can conclude that an expression always evaluates to the value 5, this fact can be used to avoid computing the expression at runtime.

The math behind dataflow analysis is probably one of the ten most important results in theoretical computer science. It supports a strong separation of concerns where a generic dataflow framework takes care of the bookkeeping and a collection of transfer functions implements the specific analysis.

Once a framework exists, writing analyses is easy and entertaining. It’s great to form a little hypothesis like “we could get rid of many array bounds checks by keeping track of integer intervals,” write a few transfer functions, and then analyze some large programs in order to validate or disprove the hypothesis. On the other hand, writing a dataflow analysis without a framework is a hellish amount of work.
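To make the "little hypothesis" concrete, here is a minimal sketch of the transfer-function side of such an interval analysis. Everything here (the `interval` type, the function names) is invented for illustration; a real analysis would also have to handle overflow, joins at control-flow merges, and widening:

```c
#include <assert.h>
#include <limits.h>

/* A hypothetical abstract value: the set of integers in [lo, hi]. */
typedef struct { long lo, hi; } interval;

/* Transfer function for "z = x + y" over intervals. A real analysis
   would also need to account for overflow; this sketch assumes none. */
static interval interval_add(interval x, interval y)
{
    interval z = { x.lo + y.lo, x.hi + y.hi };
    return z;
}

/* Client of the analysis: the bounds check "assert(0 <= i && i < n)"
   is provably redundant when the interval for i lies inside [0, n-1]. */
static int check_redundant(interval i, long n)
{
    return i.lo >= 0 && i.hi < n;
}
```

The point of a framework is that functions like these are essentially all the analysis author writes; the framework supplies the CFG traversal, the merging, and the fixpoint iteration.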

I recently read this paper which evaluates a new idea for analyzing integer ranges. The authors implemented their analysis over the LLVM IR, which is nice since LLVM serves as a partial framework, permitting them to analyze some of the SPEC benchmarks. On the other hand, significant limitations of their analysis (intraprocedural, limited analysis of globals and pointed-to values) would seem to indicate that LLVM does not yet serve as a great framework for this sort of research. The result is that even though the authors of this paper have done the right thing, their evaluation remains tantalizing at best. My former student Nathan Cooprider had similar problems implementing dataflow analyses using CIL, which saved us probably 18 months of work and made his project feasible. Even so, CIL had numerous shortcomings as a dataflow framework and a lot of effort was required to get things working.

A good dataflow framework should:

  1. Contain front-ends for one or more real programming languages.
  2. Be open source and well documented.
  3. Expose a clean intermediate representation.
  4. Make easy things easy. A user should have to write only a minimal amount of code to support a simple static analysis such as detecting null pointers or determining signedness of integers.
  5. Make it easy to write at least one kind of static analysis. For example, an integer range analysis is quite different from a live variables analysis, and both are very different from a pointer analysis.
  6. Provide easy access to source-level information such as lines, columns, variable names, etc.
  7. Heavily optimize the program prior to analysis. One of the easiest ways to increase the precision of an analysis is to run it after a comprehensive collection of optimization passes (such as those supported by LLVM) has removed dead code, useless indirection, and other junk that tends to pollute analysis results.
  8. Permit analyses to learn from the results of other analyses.
  9. Provide good support for turning analysis results into assertions; Nathan probably couldn’t have worked the bugs out of his analyzers without doing this.
  10. Provide efficient fixpoint iteration algorithms.
  11. Make it easy to write transformations that consume analysis results.
  12. Provide transparent support for interprocedural and whole program analysis.
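As a sketch of the division of labor items 4 and 10 are asking for, here is a toy fixpoint solver for live-variables analysis over a hard-coded three-block CFG. Everything here is invented for illustration; in a real framework the iteration loop and the CFG would be provided, leaving only the use/def sets (the transfer functions) to the analysis writer:

```c
#include <assert.h>

#define NBLOCKS 3

/* Hard-coded straight-line CFG: block 0 -> 1 -> 2, with use/def sets
   as bit masks over variables a=bit0, b=bit1, c=bit2. For example,
   block 1 is "c = a + b": it uses {a,b} and defines {c}. */
static const unsigned use[NBLOCKS] = { 0x0, 0x3, 0x4 }; /* {}, {a,b}, {c} */
static const unsigned def[NBLOCKS] = { 0x3, 0x4, 0x0 }; /* {a,b}, {c}, {} */
static const int succ[NBLOCKS]     = { 1, 2, -1 };      /* -1 = exit */

unsigned live_in[NBLOCKS], live_out[NBLOCKS];

/* Backward dataflow: iterate until nothing changes (a fixpoint).
   in[b] = use[b] | (out[b] & ~def[b]); out[b] = in[succ[b]]. */
void solve_liveness(void)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = NBLOCKS - 1; b >= 0; b--) {
            unsigned out = (succ[b] >= 0) ? live_in[succ[b]] : 0;
            unsigned in = use[b] | (out & ~def[b]);
            if (in != live_in[b] || out != live_out[b]) {
                live_in[b] = in;
                live_out[b] = out;
                changed = 1;
            }
        }
    }
}
```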

It is possible that Soot and Rose meet all of these requirements. CIL does not. I don’t think LLVM is too far off. It seems likely that good analysis frameworks for functional languages exist, but I’ve paid no attention.

One of the biggest holes that I see right now is a good unified framework for static analysis of languages such as JavaScript, Python, Perl, and PHP. The heavy reliance on strings, hashes, and dynamic types in these languages makes them a lot different (and in some ways a lot more difficult) than traditional compiled languages. I’m not totally sure that the differences between these languages can be abstracted away in a clean fashion, but it seems worth trying.

UPDATE:

Pascal Cuoq points out that Frama-C meets all of the criteria at least to some extent. Ed Schwartz points to CPAChecker, which looks interesting though I know nothing about it.

An important criterion that I forgot to list is the ability of a dataflow framework to provide knobs for tuning precision against resource usage. Examples of this include configurable widening thresholds and tunable degrees of path and context sensitivity.
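A minimal sketch of one such knob, using an invented interval type and a single configurable widening threshold (real implementations typically pick from a set of thresholds harvested from constants appearing in the program):

```c
#include <assert.h>
#include <limits.h>

typedef struct { long lo, hi; } interval;

/* Hypothetical widening operator with a configurable threshold: when a
   bound keeps moving between iterations, jump straight to the threshold
   (or to "infinity", i.e. LONG_MIN/LONG_MAX) instead of crawling through
   every intermediate value. A larger threshold buys precision at the
   cost of more iterations -- exactly the kind of knob described above. */
static interval widen(interval prev, interval next, long threshold)
{
    interval w = prev;
    if (next.lo < prev.lo)
        w.lo = (next.lo >= -threshold) ? -threshold : LONG_MIN;
    if (next.hi > prev.hi)
        w.hi = (next.hi <= threshold) ? threshold : LONG_MAX;
    return w;
}
```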

Capitol Reef Rock Art

Last weekend we spent a day poking around Pleasant Creek Canyon in Capitol Reef National Park, which contains a permanent stream and a lot of rock art. The art was left by the Fremont people, who lived in the area until about 700 years ago; it isn’t clear why they left (or died out) but climate change is suspected. Pictures towards the bottom of the page are from the following day in an unnamed hoodoo field just outside Teasdale, UT.

[nggallery id=60]

Book Review: The Shadow Scholar

Paying a professional to write an essay is probably one of the safest ways for a student to cheat, assuming the paid-for essay is not itself plagiarized. The premise of Shadow Scholar is that plenty of students are willing to pay for this kind of service and that Tomar was happy to provide it—culminating in a great episode where he functions as a surrogate PhD student.

Although Shadow Scholar is very funny in places, Tomar’s tone is dominated by anger, cynicism, and contempt. He heaps scorn on entitled students who have plenty of money, little talent, and no scruples. At the same time, he makes it sound as if only a fool would spend time writing an essay that a professor will never see and a TA will at most skim. The students and faculty are not his only targets: institutionalized education and the declining American economy are also to blame. He adds:

I always had this Holden Caulfield-ish suspicion that everything was bullshit. I basically figured that the world is filled with frauds, and many of them are so worried about being figured out that they’ll never stop to scrutinize you. I realized I could fake my way through anything.

If this is his starting point, it’s not too surprising Tomar ended up spending years of his life writing crappy essays (the book leaves little doubt that most of his work was crap—written on little sleep and even less research, and under the influence of plenty of alcohol and pot).

The most uncomfortable thing about Shadow Scholar is the way Tomar latches onto everything that is wrong with the education system and wallows in it. The cold and uncaring world makes him a victim, providing all of the justification he could want, while also appealing to a strongly developed sense of self-loathing. He uses the same kind of unimpressive rhetoric that drug dealers in movies always spout: “Hey, it’s just supply and demand, baby!”

Tomar knows the world could be a better place. He would be happy for students and teachers and administrators to genuinely care about things rather than being opportunistic and calculating. However, he doesn’t seem to have expected anything better from himself, and it’s sad that a funny and obviously talented young man would spend years in a soul-deadening, low-paying career. It’s like nobody told Tomar, and it never occurred to him, that you have to fight tooth and nail to make a place for yourself in the world where you can be happy.

Being bright does not save Tomar from missing some additional obvious facts. For example, he says “No wonder academia hates Wikipedia so much.” Look, I know a lot of professors and nobody hates Wikipedia. In fact, the only problem we have with it is that so many students fail to use it—and a host of other good sources of information—when we give them problems to solve. In the next sentence Tomar states that “Collective knowledge is a threat to those whose jobs are based on singular knowledge.” This approaches the largest amount of ignorance about academia that could possibly be crammed into a short sentence. We are not attempting to keep a monopoly on facts: that would be completely at odds with the culture of research and publication. It’s hard to imagine such a monopoly having existed at any time since the middle ages. The crucial resource in the ivory tower is not information but high-quality critical thought. Professors often have little time for this due to various demands of the job. Additionally, a great deal of our time and energy is spent trying to teach students how to think critically about various aspects of the world. The lack of time and energy and interest in this kind of thought among students is the most frustrating thing about being a professor. Tomar has, it seems, completely missed the point. This is rather appropriate given the subject of the book.

In the end I’m not sure what we can learn from Shadow Scholar. It spends too much space on repetitive autobiographical material. The existence of paper-writing mills is certainly no surprise to anyone who thought to search the web for them during the last 10 years. The implications of this industry—that is, customers exist and university instructors have no real way to combat this kind of cheating—are similarly obvious. The book is entertaining and clearly writing it was cathartic for Tomar; perhaps that’s enough.

Final exam question: How long did it take Tomar to write this book? I’m guessing a solid three weeks.

Certifying Compilers Using Random Testing

Before a tool such as a compiler is used as a critical component in an important software project, we’d like to know that the tool is suitable for its intended use. This is particularly important for embedded systems where the compiler is unlikely to be as thoroughly tested as a desktop compiler and where the project is very likely to be locked in to a specific compiler (and even a specific version, and a specific set of optimization flags).

Turns out, it’s hard to certify a compiler in any meaningful way. In practice, the most popular way to do it is to purchase a commercial test suite and then to advertise that it was used. For example, the Keil compiler validation and verification page mentions that the Plum Hall C validation suite has been used. This is all good but it’s sort of a minimal kind of certification in the sense that a highly broken compiler can pass this (or any) fixed set of tests. At some level, I think these validation suites are mainly used for two purposes. First, I’m sure they catch a lot of interesting edge-case bugs not caught by vanilla code. Second, using the test suite serves as an advertisement: “We care about details; we have enough money to buy the expensive tests, and we have enough engineers to fix the resulting issues.”

A random test case generator like Csmith can serve as a useful complement to fixed test suites. Whereas Plum Hall’s tests were written by hand and are primarily intended to check standards conformance, a properly designed random tester will deeply exercise internal code generation logic that may not be as strong as it should be.

If Csmith were used as a certification tool, a compiler vendor would get to advertise a fact like “Our compiler successfully translates 1,000,000 tests generated by Csmith 2.1.” This is a high standard; my guess is that no existing C compiler other than CompCert would meet it. Actually, CompCert would fail as well, but in a different sense: it does not handle the entire subset of C that Csmith generates by default.

For several years I’ve gently pushed the idea of Csmith as a compiler certification tool and have gotten some pushback. Random testing makes people uncomfortable just because it is random. For example, it may suddenly find a bug that has been latent for a long time. Traditional test suites, of course, will never do this—they only find latent bugs if you add new test cases (or if you modify the system in a way that makes the bug easier to trigger). People want to avoid unnecessary surprises near the end of a release cycle. This objection, though legitimate, is easy to fix: we can simply specify a starting PRNG seed and then the suite of 1,000,000 random programs suddenly becomes fixed. It retains its stress-testing power but will no longer surprise.
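The freezing trick is easy to see in miniature. Here is a toy linear congruential generator standing in for the PRNG inside a test-case generator (this is not Csmith's actual PRNG; the constants are the well-known Numerical Recipes LCG):

```c
#include <assert.h>
#include <stdint.h>

/* A toy LCG standing in for the PRNG inside a random test-case
   generator. Constants are the Numerical Recipes LCG parameters. */
static uint32_t lcg_state;

static void lcg_seed(uint32_t seed) { lcg_state = seed; }

static uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* "Test case N of the frozen suite" is just the Nth draw after
   seeding: fixing the starting seed fixes the entire suite, so the
   million random programs become as reproducible as any hand-written
   test collection. */
static uint32_t nth_test_seed(uint32_t start_seed, int n)
{
    lcg_seed(start_seed);
    uint32_t v = 0;
    for (int i = 0; i <= n; i++)
        v = lcg_next();
    return v;
}
```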

I’ve also received some non-technical pushback about Csmith that I don’t understand as well since people don’t tend to want to articulate it. My inference is that Csmith is a bit threatening since it represents an open-ended time sink for engineers. It’s hard to know ahead of time how deep the rabbit hole of compiler bugs goes. People would rather use their valuable engineers to make customers happy in a more direct fashion, for example supporting new targets or implementing new optimizations.

My hope has been that one high-profile compiler company will take the time to make (for example) 1,000,000 test cases work, and then advertise this fact. At that point, other companies would sense a certification gap and would then be motivated to also use Csmith in a high-profile way. So far it hasn’t happened.

The way that Csmith has found its way into compiler companies is not by top-down fiat (“Use this tool!”) but rather from the bottom up. Compiler engineers—probably the people most concerned with compiler quality—have run it probably mainly out of curiosity at first, have found compiler bugs, and then have kept using it. Unfortunately, due to the nature of this bottom-up process I generally get only indirect, informal indications that it is happening at all.

What’s Operating Systems Research About?

The other day at lunch I tried to explain to Suresh what operating systems research is all about, which got me thinking about this subject. As a quick glance at the OSDI 2012 program will confirm, the obvious answer “it’s about building operating systems” no longer applies, if it ever did. In fact, the trend away from working mainly on ring 0 code was noted more than 12 years ago in Rob Pike’s entertaining screed (which I regrettably missed—he visited Utah just a few months before I got here). Pike said “the situation is genuinely bad and requires action” but it’s clear that most of his observations are simply symptoms of a maturing field. For example, iOS and Android are systems that I would consider innovative, but they are respectively based on BSD and Linux kernels that Pike would consider completely boring. The fact is: these kernels work. Significant kernel innovation was not required to make modern tablets and smart phones viable. If we take a strict definition of operating systems research, a lot of the interesting work since 2000 has been in virtualization. In fact, it is curious that Pike’s slide deck does not mention virtualization since the modern wave of hypervisors (which originated in academia) was well underway by the time he gave his talk.

So there exists this moderately large community (the last SOSP I attended, in 2009, had 565 attendees) of bright people who understand systems, who aren’t afraid to get their hands dirty, and who want results instead of theorems. But also, the bar for creating a new OS has gotten higher and higher for reasons that Pike describes (there’s a lot of hardware to support and a lot of standards to implement) and also for an important reason that he fails to mention (operating systems already work pretty well). Does this community disband? No, they stick together but the kinds of problems being addressed become more diverse, as the OSDI program illustrates nicely.

It would be sad if a community existed only due to inertia, but that is not the case here. I would claim that the thing holding the OS community (as it exists today) together is a common approach to doing research. I’ll try to characterize it:

  1. The best argument is a working system. The more code, and the more results, the better. Something that is clearly a toy isn’t convincing. It is not necessary to build an abstract model, conduct a user study, prove soundness, prove correctness, or show any kind of asymptotic bound. In fact, if you want to do these things it may be better to do them elsewhere.
  2. The style of exposition is simple and direct; this follows from the previous point. I have seen cases where a paper from the OS community and a paper from the programming languages community are describing almost exactly the same thing (probably a static analyzer) but the former paper is super clear whereas the latter is incredibly difficult to figure out. To some extent I’m just revealing my own biases, but I also believe the direct approach to exposition is objectively better; I’ll try to return to this subject in a later post.
  3. The key to a strong research result is finding the right abstraction. A good abstraction is beautiful; it imposes little performance penalty; it leads to reliable systems; it leaks the right information and blocks things you didn’t want to know. It just feels right. The abstraction is probably for something low-level, but this doesn’t need to be the case. Finding good abstractions may sound easy but it’s super hard, often requiring lots of code to be thrown away multiple times.

And that, friends, is what OS research is about.

UPDATE from 9/15:

In a comment, Suresh says:

It seems to me that you need to be able to list core problems that you want to solve, or things you want to understand. OS as “the study of interfaces” seems overly broad, and characterizes really any system building effort, even if it’s in databases or in a public-key infrastructure.

At the level of an entire subfield I’m not sure you can construct a satisfying list of core problems to be solved. What would this be for software engineering? It would be something extremely vague like “enable predictable, low-cost creation of acceptable software.” Yuck. How about for programming languages? For scientific computing? For theoretical computer science? Obviously we can come up with something, but I think that at this level the approach matters more than the specific problems. The problems tend to come and go over a period of a few years. Some of them (e.g. efficient virtual memory, efficient hypervisors) get solved while others (concurrent programming, secure operating systems) end up being harder than we thought and slowly morph into more tractable versions.

Anyway, I’m bummed that you find this unsatisfying but it’s the best I can do right now. Maybe someone else can do better.

Bhaskar states that “theorems are a kind of result” and of course I agree. However, they are not a kind of result often seen in OS research, which is the only thing I was trying to talk about. He also says:

You first note that the simple and direct style of exposition for systems papers follows from the absence of a need to prove anything rigorous, and then indicate that this style is objectively better. This sounds contradictory to me. The more descriptive style preferred by systems researchers indeed follows from the fact that their core “rigorous/formal” component is code, which is not part of the paper itself; for communities where that component is, say, a proof, it must be presented within the paper itself. The paper consequently needs to be written with more precision, and may consequently be harder to read, particularly to those outside the field. As with any form of literature, the rhetorical style must match the purpose.

First, I like the bit about “their core rigorous/formal component is code” — that’s a great way of putting it.

Second, I shouldn’t have said that the writing style follows from the lack of proofs. Of course there exists wonderfully clear mathematical writing. However, the “OS writing style” does benefit from a relatively baggage-free research framework in which the world is its own best model.

Finally we come to the fun part: “as with any form of literature, the rhetorical style must match the purpose.” Of course this is true, but here we have been given this great gift where through some process of convergent evolution, researchers from different communities have ended up not only attacking the same problem, but also coming up with very similar solutions. If we look at some specific kinds of static analysis and bug finding, we can find papers from software engineering, from formal methods, from programming languages, and from operating systems that are doing essentially the same thing. Thus the purposes are the same. Even so, the rhetorical styles are very different. So we have form not following function, but rather following tradition. I’ve seen this happen a number of times. Reading these papers back to back is kind of like watching Rashomon.

A Utah Salmon Run

Although it is apparently well-known, I only recently learned that several streams leading to reservoirs in Utah have salmon runs. I hadn’t even realized that salmon could spend their entire lives in fresh water. The kokanee salmon were introduced early in the 20th century and have thrived.

Last Saturday, Sarah was out of town and the boys and I drove up to Causey Reservoir to the east of Ogden UT. The terrain is very rugged and a trail wanders high above cliffs for a few miles over to an inlet stream where the salmon are. We spent a long time hanging around watching the salmon struggle upstream; some were swimming very strongly, others were half dead already. These were big fish, at least 18″. Ben was there the same day and his fish pics came out better than mine.

[nggallery id=59]

ARM Math Quirks on Raspberry Pi

Embedded processors can be relied upon to be a little quirky. Lately I’ve been playing around with the Raspberry Pi’s BCM2835 processor, which is based on the ARM1176JZF-S core. The “J” stands for Jazelle, a module that permits this processor to execute Java bytecodes directly. As far as I know there’s no open source support for using this, and my guess is that a JIT compiler will perform much better anyway, except perhaps in very RAM-limited systems. The “Z” indicates that this core supports TrustZone, which provides a secure mode switch so that secrets can be kept even from privileged operating system code. The “F” means the core has a floating point unit. Finally, “S” indicates a synthesizable core: one for which ARM licenses the Verilog or VHDL.

One thing I always like to know is how an embedded processor and its compiler cooperate to get things done; looking at the compiler’s output for some simple math functions is a good way to get started. I have a file of operations of the form:

long x00 (long x) { return x* 0; }
long x01 (long x) { return x* 1; }
long x02 (long x) { return x* 2; }
long x03 (long x) { return x* 3; }
...

Of course, multiplying by zero, one, and two are boring, but there’s a cute way to multiply by three:

x03:
    add r0, r0, r0, asl #1
    bx lr 

The computation here is r0 + (r0<<1), but instead of separate shift and add instructions, both operations can be done in a single instruction: ARM supports pre-shifting one operand “for free” as part of ALU operations. I had always considered this to be sort of a bizarre fallout from ARM’s decision to use fixed-size instruction words (gotta use all 32 bits somehow…) so it’s pretty cool to see the compiler putting the shifter to good use. Presumably GCC has a peephole pass that runs over the RTL looking for pairs of operations that can be coalesced.

Multiplying by four is just (r0<<2) and multiplying by five is r0 + (r0<<2). Multiply-by-six is the first operation that requires more than one ALU instruction and multiply-by-11 is the first one that requires an additional register. GCC (version 4.6.3 at -Ofast) doesn’t emit an actual multiply instruction until multiply-by-22, which is also the first multiplication that would require three ALU instructions. Perhaps unsurprisingly, people have put some thought into efficiently implementing multiply-by-constant using shifts and adds; code at this page will solve instances of a pretty general version of this problem. Optimizing multiply seems to be one of those apparently simple problems that turns out to be deep when you try to do a really good job at it.
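For reference, here are C renderings of the shift-add sequences above. C has no fused shift-plus-add operation, so each ARM instruction becomes a shift and an add, but the identities are easy to check:

```c
#include <assert.h>

/* C versions of the compiler's shift-add decompositions. ARM folds
   each shift into the add for free; here they are separate operations. */
long x03(long x) { return x + (x << 1); }   /* 3x = x + 2x */
long x05(long x) { return x + (x << 2); }   /* 5x = x + 4x */

long x22(long x)
{
    long t = x + (x << 2);  /* t = 5x   (add r3, r0, r0, asl #2) */
    long u = x + (t << 1);  /* u = 11x  (add r0, r0, r3, asl #1) */
    return u << 1;          /* 22x      (mov r0, r0, asl #1)     */
}
```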

Turning multiply-by-constant into a sequence of simpler operations is satisfying, but is it productive? I did a bit of benchmarking and found that on the BCM2835, multiply-by-22 has almost exactly the same performance whether implemented using two instructions:

x22:
    mov r3, #22 
    mul r0, r3, r0
    bx lr

or using three instructions:

x22:
    add r3, r0, r0, asl #2 
    add r0, r0, r3, asl #1 
    mov r0, r0, asl #1 
    bx lr

When a mul instruction can be replaced by fewer than three add/subtract/shift instructions, the substitution is profitable. In other words, GCC is emitting exactly the right code for this platform. This is perhaps slightly surprising since benchmarking on any specific chip often reveals that the compiler is somewhat suboptimal since it was tuned for some older chip or else because it makes compromises due to targeting some wide variety of chips.

Next I looked at integer division. Implementing divide-by-constant using multiplication is a pretty well-known trick—see Chapter 10 of Hacker’s Delight—so I won’t go into it here. On the other hand, division by non-constant held a bit of a surprise for me: it turns out the ARM1176JZF-S has no integer divide instruction. I asked a friend who works at ARM about this and his guess (off-the-cuff and not speaking for the company, obviously) is that when the ARM11 core was finalized more than 10 years ago, chip real estate was significantly more expensive than it is now, making a divider prohibitive. Also, division probably isn’t a big bottleneck for most codes.
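As a quick illustration of the divide-by-constant trick, here is unsigned divide-by-3 in the style of Hacker's Delight. The magic constant is ceil(2^33 / 3), and the identity holds for every 32-bit unsigned value:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned divide-by-3 via multiplication: 0xAAAAAAAB = ceil(2^33 / 3),
   so (n * 0xAAAAAAAB) >> 33 equals n / 3 for all 32-bit n. On a machine
   with a fast multiply-high this beats a bit-at-a-time divide loop. */
static uint32_t div3(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}
```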

If we take source code like this, where ccnt_read() is from my previous post:

volatile int x, y, z;

unsigned time_div (void)
{
  unsigned start = ccnt_read ();
  z = x / y;
  unsigned stop = ccnt_read ();
  return stop - start;
}

The compiler produces:

time_div: 
    stmfd sp!, {r4, lr} 
    mrc p15, 0, r4, c15, c12, 1 
    ldr r3, .L2 
    ldr r0, [r3, #0] 
    ldr r3, .L2+4 
    ldr r1, [r3, #0] 
    bl __aeabi_idiv 
    ldr r3, .L2+8 
    str r0, [r3, #0] 
    mrc p15, 0, r0, c15, c12, 1 
    rsb r0, r4, r0 
    ldmfd sp!, {r4, pc}

On average, for random arguments, the cost of the __aeabi_idiv call is 26 cycles, but it can be as few as 15 and as many as 109 (these results aren’t the raw timer differences; they have been adjusted to account for overhead). An example of arguments that make it execute for 109 cycles is -190405196 / -7. The C library code for integer divide is similar to the code described here. Its slow path requires three cycles per bit for shift-and-subtract, and this is only possible due to ARM’s integrated shifter. Additionally, there’s some overhead for checking special cases. In contrast, a hardware divide unit should be able to process one or two bits per cycle.
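For comparison, here is a C sketch of the one-bit-per-iteration shift-and-subtract loop at the heart of such a slow path. The real __aeabi_idiv also handles signs and the special cases mentioned above, which this sketch omits:

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division, one quotient bit per iteration: the C analogue
   of the shift-and-subtract slow path. Each iteration costs the ARM
   library routine about three instructions, thanks to the integrated
   shifter; a hardware divider retires one or two bits per cycle. */
static uint32_t udiv_slow(uint32_t n, uint32_t d)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* bring down the next bit */
        if (r >= d) {                   /* subtract when it fits   */
            r -= d;
            q |= 1u << i;
        }
    }
    return q;  /* assumes d != 0 */
}
```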