Android Projects Retrospective

The Android Projects class I ran this semester has finished up, with students demoing their projects last week. It was a lot of fun because:

  • I find mobile stuff to be interesting
  • the students were excited about mobile
  • the students were good (and taught me a ton of stuff I didn’t know)
  • the course had 13 people enrolled (I seldom get to teach a class that small)

The structure of the class was simple: I handed each student a Samsung Galaxy Tab 10.1 and told them that the goal was for each of them to do something cool by the end of the semester. I didn’t do any real lectures, but rather we used the class periods for discussions, code reviews, and a short whole-class software project (a sudoku app). Over the last several years I’ve become convinced that large lecture classes are a poor way for students to learn, and teaching a not-large, not-lecture class only reinforced that impression. If we believe all the stuff about online education, it could easily be the case that big lecture courses are going to die soon — and that wouldn’t be a bad thing (though I hope I get to keep my job during the transition period).

The course projects were:

  • A sheet music reader, using OpenCV — for the demo it played Yankee Doodle and Jingle Bells for us
  • A tool for creating animated GIFs or sprites
  • A “where’s my car” app (for the mall or airport) using Google Maps
  • Finishing and polishing the sudoku app that the whole class did (we had to drop it before completion since I wanted to get going with individual projects)
  • An automated dice roller for Risk battles
  • A generic board game engine
  • A time card app
  • A change counting app, using OpenCV
  • A bluetooth-based joystick; the demo was using it to control Quake 2
  • A cross-platform infrastructure (across Android and iOS) for games written in C++

The total number of projects is smaller than the number of students because a few students worked in pairs. I can’t easily count lines of code since there was a lot of reuse, but from browsing our SVN repo it’s clear that, on average, the students wrote a ton of code.

Overall, this course reinforced my impression that desktops and laptops are dying. Not only can tablets do most of what most people want from a computer, but the synergy between APIs like Google Maps and the GPS, camera, and other sensors is obvious and opens up a lot of possibilities. Given sufficient compute power, our tabs and phones will end up looking at and listening to everything, all the time. OpenCV’s Android port is pretty rough and needs a lot of optimization before it will work well in real time on tabs and phones, but the potential is big.

Android has a lot of neat aspects, but it’s an awful lot of work to create a decent app and it’s depressingly hard to make an app that works across very many different devices. This is expected, I guess — portable code is never easy. The most unexpected result of the class for me was to discover how much I dislike Java — the boilerplate to functional code ratio is huge. This hadn’t seemed like that much of a problem when I used to write standalone Java, and I had a great time hacking up extensions for the Avrora simulator, but somehow interacting with the Android framework (which seems pretty well done) invariably resulted in a lot of painful glue code. The students didn’t seem to mind it that much. It’s possible that I’m exaggerating the problem but I prefer to believe they’re suffering from Stockholm syndrome.

Pluggable Domain-Specific Testcase Reducers

An important part of effective random testing is providing developers with actionable testcases, and a big part of creating actionable testcases is testcase reduction. Delta debugging, as exemplified by the delta tool, is a good generic reduction technique. However, in many cases, as I discussed the other day here, a domain-specific reducer can do a much better job.

Today’s question is: How to build a domain-specific testcase reduction tool? Of course, one option is to build this kind of tool from scratch, but that’s not very fun. After building a few testcase reducers from scratch, and after watching my students build a few more, I discovered a nice API for isolating the transformation part of a reducer — the main part that is actually domain specific — from the rest of the tool. But first, let’s look at what has already been accomplished in terms of factoring a testcase reducer into reusable parts.

The original delta debugging algorithm, ddmin, mixes all of its concerns together. To reuse ddmin, you have to reimplement it from scratch. The delta tool improves the situation by factoring the “interestingness test” out into an external script. This test decides whether a freshly produced delta variant should be accepted or discarded. This is really useful and I’ve written probably well over a dozen different interestingness tests. However, the transformation part of ddmin remains hard-coded in the delta tool. The only thing it knows how to do is remove one or more contiguous lines from a text file. This is nice but not nearly good enough for demanding reduction tasks.

This post is about factoring out the reduction transformations. This is necessary because you need one set of transformations to reduce C++ code (template expansion, namespace flattening, etc.) and different ones to reduce XML or Java or whatever. Furthermore, even if we consider testcase reduction for a single language, some transformations can be expressed as simple string manipulations — there’s no real need to factor these out of a reduction tool — whereas other transformations (inlining a C function or performing copy propagation) are far more involved. We want to maintain some separation between these compiler-like tools and the program reducer.

The transformation tool API is very simple; it provides a single point of entry:

reduce (filename, command, number)

It works like this:

  • filename refers to the current test case, which the transformation tool modifies in place
  • command specifies which transformation to perform; it doesn’t matter for tools that only perform one kind of transformation
  • number identifies which instance of the transformation to perform
  • the return value of a reduce call is either success, indicating that the transformation succeeded, or failure, indicating that number did not correspond to any transformation

The only trick here is properly enumerating transformation opportunities. For any given test case, it is required that there exists an n such that number in 0 .. n-1 will result in successful transformations and number ≥ n will fail. In my experience, assigning numbers to transformation opportunities is straightforward. For example, consider a transformation that tries to replace a constant in a test case with the value “1”. We could number constants by their order of appearance in the token stream, we could number them by their order of occurrence in a preorder traversal of the AST, or whatever — it doesn’t matter as long as the above property holds.
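To make the interface concrete, here is a minimal sketch of a replace-with-one transformer that fits it, written as a standalone C program that rewrites the file in place. It works on raw text rather than on tokens or an AST, and everything about it is my own illustration rather than part of any existing tool: the program name, the exit-code convention (0 for success, nonzero when number names no opportunity), and the decision to number only constants that are not already 1 (which is what guarantees that repeated application eventually runs out of opportunities).

/* replace_with_one.c -- a sketch of a transformer fitting the
   reduce (filename, command, number) interface described above.
   Opportunities are numbered by order of appearance in the raw text;
   a real tool would work on tokens or an AST. */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

enum reduce_result { REDUCE_SUCCESS, REDUCE_FAILURE };

/* Replace the number-th decimal literal that is not already "1" with "1",
   rewriting the file in place. */
static enum reduce_result replace_with_one(const char *filename, long number)
{
  FILE *in = fopen(filename, "rb");
  if (!in)
    return REDUCE_FAILURE;
  fseek(in, 0, SEEK_END);
  long len = ftell(in);
  fseek(in, 0, SEEK_SET);
  char *buf = malloc((size_t)len + 1);
  if (!buf) {
    fclose(in);
    return REDUCE_FAILURE;
  }
  size_t got = fread(buf, 1, (size_t)len, in);
  fclose(in);

  long count = 0;
  for (size_t i = 0; i < got;) {
    if (!isdigit((unsigned char)buf[i])) {
      i++;
      continue;
    }
    size_t start = i;
    while (i < got && isdigit((unsigned char)buf[i]))
      i++;
    if (i - start == 1 && buf[start] == '1')
      continue; /* skip constants that are already 1 */
    if (count == number) {
      FILE *out = fopen(filename, "wb");
      if (!out) {
        free(buf);
        return REDUCE_FAILURE;
      }
      fwrite(buf, 1, start, out);       /* text before the constant */
      fputc('1', out);                  /* the replacement */
      fwrite(buf + i, 1, got - i, out); /* text after the constant */
      fclose(out);
      free(buf);
      return REDUCE_SUCCESS;
    }
    count++;
  }
  free(buf);
  return REDUCE_FAILURE; /* number did not name any opportunity */
}

int main(int argc, char **argv)
{
  if (argc != 3) {
    fprintf(stderr, "usage: %s <testcase> <number>\n", argv[0]);
    return 2;
  }
  return replace_with_one(argv[1], strtol(argv[2], NULL, 10)) == REDUCE_SUCCESS ? 0 : 1;
}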

It is not required that the transformation produces a valid testcase, nor is it required that the transformation always makes the test case smaller. There are not, strictly speaking, any requirements at all on the transformation; it could randomly change the test case or even generate a whole new test case from scratch. On the other hand, in practice, it’s nice if the transformation:

  • is deterministic,
  • for any given input, eventually runs out of transformation opportunities when repeatedly applied, and
  • produces a valid, improved test case reasonably often.

The first property makes testcase reduction repeatable. The second property means that the reducer will eventually terminate when it reaches a fixpoint, as opposed to requiring a timeout or other termination criterion. To return to the example of a transformation that replaces constants with one, we would need to put a special case in the transformer code that prevents it from replacing one with one in order to satisfy this property. In practice it is often the case that a successful transformation decreases the number of possible transformations by one — but it’s easy to find examples where this does not hold (and it doesn’t need to). The third property means that the reducer is reasonably effective and fast.

Other parts of the testcase reducer — its search strategy, interestingness test, and fitness function — do not matter here. Let’s look at an example of plugging the “replace constant with one” transformer into a generic reducer. Let’s pretend we have a compiler that crashes with a math exception when it tries to constant-fold the division by zero. We’ll begin with this unreduced test case:

int foo (void) {
  int x = 33;
  int y = x / 0;
  return y + 66;
}

We invoke the reducer and it first issues the call “reduce (foo.c, REPLACE_WITH_ONE, 0)”. The call succeeds and gives:

int foo (void) {
  int x = 1;
  int y = x / 0;
  return y + 66;
}

This test case is still interesting (it still triggers the compiler crash) and it is smaller than the previous one, so the reducer accepts it as the current test case. The reducer heuristically assumes that the successful transformation eliminated an opportunity; therefore it again issues “reduce (foo.c, REPLACE_WITH_ONE, 0)”. Again the call succeeds and gives:

int foo (void) {
  int x = 1;
  int y = x / 1;
  return y + 66;
}

This test case is not interesting — it will not trigger a divide-by-zero bug in the compiler. This variant is discarded and the reducer now issues “reduce (foo.c, REPLACE_WITH_ONE, 1)”. Recall that constants with value 1 are skipped over when the transformer enumerates its possibilities. The transformation succeeds and the result is:

int foo (void) {
  int x = 1;
  int y = x / 0;
  return y + 1;
}

This is again interesting. However, the reducer’s next call “reduce (foo.c, REPLACE_WITH_ONE, 1)” fails because transformation opportunity 1 no longer exists. To be sure that it has reached a fixpoint, the reducer needs to test “reduce (foo.c, REPLACE_WITH_ONE, 0)” once more; it again succeeds but leaves an uninteresting test case. At this point the REPLACE_WITH_ONE transformation cannot make progress. If the reducer has no other transformations to try, it terminates.
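The search that this example walks through can be written as a short driver loop. The sketch below shells out to the transformer sketched earlier and to an interestingness test; the names ./replace_with_one and ./interesting.sh are placeholders, not real tools. A full pass over the transformation indices that accepts no variant means a fixpoint has been reached:

/* reducer_driver.c -- a sketch of the search loop from the example above.
   It assumes a transformer that exits 0 on success and an interestingness
   test that exits 0 when the test case is still interesting; both names
   are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  const char *tc = "foo.c";
  char cmd[512];
  int progress = 1;

  while (progress) {  /* repeat full passes until a fixpoint is reached */
    progress = 0;
    for (long number = 0;;) {
      /* keep a backup so an uninteresting variant can be discarded */
      snprintf(cmd, sizeof cmd, "cp %s %s.bak", tc, tc);
      system(cmd);

      /* reduce (foo.c, REPLACE_WITH_ONE, number) */
      snprintf(cmd, sizeof cmd, "./replace_with_one %s %ld", tc, number);
      if (system(cmd) != 0)
        break; /* no opportunity with this index: this pass is done */

      snprintf(cmd, sizeof cmd, "./interesting.sh %s", tc);
      if (system(cmd) == 0) {
        progress = 1; /* accept the variant and retry the same index */
      } else {
        /* reject the variant: restore the backup, move to the next index */
        snprintf(cmd, sizeof cmd, "cp %s.bak %s", tc, tc);
        system(cmd);
        number++;
      }
    }
  }
  return 0;
}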

It would be fairly straightforward to reimplement the delta tool in this framework; I haven’t bothered yet because delta works perfectly well. Something interesting about the delta tool is that it doesn’t really reach a fixpoint: if you run it twice using the same parameters, the second run sometimes improves the testcase. Basically the authors decided to go for quicker termination at the expense of testcase quality. Another quirk is that it iterates backwards through its list of transformations whereas in the example above, we iterated forwards. Presumably the delta authors observed that backwards iteration is faster; I haven’t done the experiments to confirm this. To support backwards iteration, we’d have to add another call to the API:

get_count (filename, command)

This call returns the total number of instances of a given transformation in a test case. To go backwards, we’d start by issuing a reduce call at one less than this number and work down to zero.
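A sketch of that backwards pass, under the (purely hypothetical) assumption that the transformer from earlier also accepts a --count mode that prints the number of opportunities, i.e., an implementation of get_count:

/* backwards_pass.c -- sketch of one backwards pass; "--count" is a
   hypothetical flag standing in for the get_count call. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  const char *tc = "foo.c";
  char cmd[512];
  long n = 0;

  /* get_count (foo.c, REPLACE_WITH_ONE) */
  snprintf(cmd, sizeof cmd, "./replace_with_one --count %s", tc);
  FILE *p = popen(cmd, "r");
  if (p) {
    if (fscanf(p, "%ld", &n) != 1)
      n = 0;
    pclose(p);
  }

  for (long number = n - 1; number >= 0; number--) {
    /* reduce (foo.c, REPLACE_WITH_ONE, number) */
    snprintf(cmd, sizeof cmd, "./replace_with_one %s %ld", tc, number);
    if (system(cmd) != 0)
      continue;
    /* ...run the interestingness test and accept or restore, as before... */
  }
  return 0;
}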

A minor variant of the reduce interface that is used by my c_delta tool replaces the number argument with a byte count. The tool then doesn’t even bother searching for transformation opportunities in bytes preceding the specified starting point. Revising c_delta’s regex-based transformations to use numbers instead of byte counts would not be hard but I don’t have plans to do this soon.

The idea of decomposing testcase reduction into a collection of modular parts is one that I’ve explored before. This post takes a detailed look at one aspect of the overall design of testcase reducers: the reduction transformation. The API is simple and (I think) generically useful. I’m embarrassed to say that it took me a few years of dinking around with various testcase reducers before I discovered it. My guess is that it’s not sufficiently complicated or researchy to be publishable, so the writeup here will have to suffice.

Something that I hope comes out of this line of work is that after factoring reducers into reusable parts, we’ll be able to do a better job improving those parts. For example, comments on my previous post about reduction for C code indicated some transformations that I hadn’t thought of. Once we implement these, for example as Clang or Rose modules, then they can be trivially plugged into any testcase reducer supporting this API. Similarly, now that the overall reducer doesn’t need to understand interestingness tests or transformations, we’re free to start evaluating better search strategies, figuring out how to parallelize the reducer, etc.

Help Wanted: If you have a transformation tool for C or C++ that can be modified to fit the interface described in this post, and if it implements some transformation that is useful (or if you are willing to implement one), please leave a comment or drop me a line. I’m currently working on a “best of breed” reduction tool for C/C++ programs and am willing to try out any plausible transformation. I report a lot of compiler bugs and expect this tool to be super useful in practice.

How Integers Should Work (In Systems Programming Languages)

My last post outlined some of the possibilities for integer semantics in programming languages, and asked which option was correct. This post contains my answers. Just to be clear: I want practical solutions, but I’m not very interested in historical issues or backwards compatibility with any existing language, and particularly not with C and C++.

We’ll start with:

Premise 1: Operations on default integer types return the mathematically correct result or else trap.

This is the same premise that the as-if infinitely ranged (AIR) integer model begins with, but we’ll perhaps end up with some different consequences. The things I like about this premise are:

  • It appears to be consistent with the principle of least astonishment.
  • Since it is somewhat harsh — ruling out many semantics such as two’s complement wraparound, undefined overflow, saturation, etc. — it gives a manageably constrained design space.
  • I just don’t like two’s complement wraparound. It’s never what I want.

Of course there are specialized domains like DSP that want saturation and crypto codes that want wraparound; there’s nothing wrong with having special datatypes to support those semantics. However, we want the type system to make it difficult for those specialized integers to flow into allocation functions, array indices, etc. That sort of thing is the root of a lot of serious bugs in C/C++ applications.

Despite the constraints of premise 1, we have considerable design freedom in:

  • When should an operation trap, and when should it give a correct result?
  • Do traps occur early (as soon as an overflow happens) or late (when an overflowed value is used inappropriately)?

For high-level languages, and scripting languages in particular, traps should only occur due to resource exhaustion or when an operation has no sensible result, such as divide by zero. In other words, these languages should provide arbitrary precision integers by default.

The tricky case, then, is systems languages — those used for implementing operating systems, virtual machines, embedded systems, etc. It is in these systems that overflows are potentially the most harmful, and it is also here that they cannot be finessed by punting to a bignum library. This leads to:

Premise 2: The default integer types in systems languages are fixed size.

In other words, overflow traps will occur. These will raise exceptions that terminate the program if uncaught. The rationale for fixed-size integers is that allocating more storage on demand has unacceptable consequences for time and memory predictability. Additionally, the possibility that allocation may happen at nearly arbitrary program points is extraordinarily inconvenient when implementing the lowest levels of a system — interrupt handlers, locks, schedulers, and similar.

Side note: The Go language provides arbitrary precision math at compile time. This is a good thing.

Overflows in intermediate results are not fun to deal with. For example, check out the first few comments on a PHP bug report I filed a while ago. The code under discussion is:

result = result * 10 + cursor - '0';

It occurs in an atoi() kind of function. This function — like many atoi functions (if you’ve written one in C/C++, please go run it under IOC) — executes an integer overflow while parsing 2147483647. The overflow is a bit subtle; the programmer seems to have wanted to write this code:

result * 10 + (cursor - '0')

However, the code actually evaluates as:

(result * 10 + cursor) - '0'

and this overflows.
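To make the subtlety concrete, here is a small standalone C program (my own reconstruction, not code from PHP) that replays the parse of 2147483647 using a 64-bit accumulator so that the transient value is visible. Assuming a 32-bit int, the grouping the code actually uses produces an intermediate above INT_MAX at the last digit, even though both groupings agree on the final value:

/* transient_overflow.c -- demonstrates the intermediate overflow discussed
   above; a 64-bit type is used so that the transient value can be printed
   instead of being undefined behavior. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
  const char *s = "2147483647";
  long long result = 0;

  for (const char *cursor = s; *cursor; cursor++) {
    long long wanted = result * 10 + (*cursor - '0'); /* intended grouping */
    long long actual = (result * 10 + *cursor) - '0'; /* actual grouping */
    long long transient = result * 10 + *cursor;      /* the intermediate */
    if (transient > INT_MAX)
      printf("transient %lld exceeds INT_MAX at digit '%c'; "
             "both groupings still give %lld == %lld\n",
             transient, *cursor, wanted, actual);
    result = actual;
  }
  printf("parsed %lld\n", result);
  return 0;
}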

How can we design a language where programmers don’t need to worry about transient overflows? It’s easy:

Premise 3: Expression evaluation always returns the mathematically correct result. Overflow can only occur when the result of an expression is stored into a fixed-size integer type.

This is more or less what Mattias Engdegård said in a comment to the previous post. Clearly, eliminating transient overflows is desirable, but how costly is it? I believe that in most cases, mathematical correctness for intermediate results is free. In a minority of cases math will need to be performed at a wider bitwidth, for example promoting 32-bit operands to 64 bits before performing operations. Are there pathological cases that require arbitrary precision? This depends on what operators are available inside the expression evaluation context. If the language supports a builtin factorial operator and we evaluate y = x!; there’s no problem — once the factorial result is larger than can fit into y, we can stop computing and trap. On the other hand, evaluating b = !!(x! % 1000000); is a problem — there’s no obvious shortcut in the general case (unless the optimizer is especially smart). We can sidestep this kind of problem by making factorial into a library function that returns a fixed-width integer.
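Here is a sketch of what Premise 3 could compile down to for 32-bit operands: the expression is evaluated at 64 bits, where these particular intermediates cannot overflow, and the trap check happens only at the store. The trap() helper and the explicit store_i32() call are my placeholders for what a compiler would generate, not an existing API:

/* premise3_sketch.c -- evaluate wide, trap only on the narrowing store. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void trap(const char *msg)
{
  fprintf(stderr, "trap: %s\n", msg);
  abort();
}

/* The only place where overflow can be observed under Premise 3. */
static int32_t store_i32(int64_t v)
{
  if (v < INT32_MIN || v > INT32_MAX)
    trap("integer overflow on store");
  return (int32_t)v;
}

int main(void)
{
  int32_t result = 214748364, cursor = '7';
  /* the atoi expression from above, evaluated at 64 bits */
  int32_t r = store_i32((int64_t)result * 10 + cursor - '0');
  printf("%" PRId32 "\n", r); /* 2147483647, no trap */
  return 0;
}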

A consequence of premise 3 that I like is that it reduces the number of program points where math exceptions may be thrown. This makes it easier to debug complex expressions and also gives the optimizer more freedom to rearrange code.

The next question is, should overflows trap early or late? Early trapping — where an exception is thrown as soon as an overflow occurs — is simple and predictable; it may be the right answer for a new language. But let’s briefly explore the design of a late-trapping overflow system for integer math. Late trapping makes integers somewhat analogous to pointers in C/C++/Java where there’s an explicit null value.

We require a new signed integer representation that has a NaN (not-a-number) value. NaN is contagious in the sense that any math operation which inputs a NaN produces NaN as its result. Integer NaNs can originate explicitly (e.g., x = iNaN;) and also implicitly due to uninitialized data or when an operation overflows or otherwise incurs a value loss (e.g., a narrowing type cast).

Integer NaN values propagate silently unless they reach certain program operations (array indexing and pointer arithmetic, mainly) or library functions (especially resource allocation). At that point an exception is thrown.

The new integer representation is identical to two’s complement except that the bit pattern formerly used to represent INT_MIN now represents NaN. This choice has the side effect of eliminating the inconvenient asymmetry where the two’s complement INT_MIN is not equal to -INT_MAX. Processor architectures will have to add a family of integer math instructions that properly propagate NaN values and also a collection of addressing modes that trap when an address register contains a NaN. These should be simple ALU hacks.
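Lacking such hardware, the scheme can at least be emulated in software to make the rules precise. The sketch below is my own illustration: the INT32_MIN bit pattern is reserved as iNaN, addition propagates it, and overflow produces it; note that the valid range is now the symmetric [-INT32_MAX, INT32_MAX]:

/* inan_sketch.c -- software emulation of NaN-ful 32-bit integer addition. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define iNaN INT32_MIN /* the reserved bit pattern */

static int32_t nan_add(int32_t a, int32_t b)
{
  if (a == iNaN || b == iNaN)
    return iNaN;                        /* NaN is contagious */
  int64_t r = (int64_t)a + b;
  if (r > INT32_MAX || r < -INT32_MAX)  /* note -INT32_MAX, not INT32_MIN */
    return iNaN;                        /* overflow produces NaN */
  return (int32_t)r;
}

int main(void)
{
  printf("%" PRId32 "\n", nan_add(3, 4));                        /* 7 */
  printf("%s\n", nan_add(INT32_MAX, 1) == iNaN ? "iNaN" : "ok"); /* iNaN */
  printf("%s\n", nan_add(iNaN, 5) == iNaN ? "iNaN" : "ok");      /* iNaN */
  return 0;
}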

A few examples:

int x; // x gets iNaN 
char c = 1000; // c gets iNaN 
cout << iNaN; // prints "iNaN" 
p = &x + iNaN; // throws exception 
x = a[iNaN]; // throws exception

Good aspects of this design are that an explicit “uninitialized” value for integers would be useful (see here for example), execution would be efficient given hardware support, programs would tolerate certain kinds of benign overflows, and the stupid -INT_MAX thing has always bothered me. It’s also possible that a more relaxed overflow system leads to some nice programming idioms. For example, iterating over the full range is awkward in C. But now that we have a sentinel value we can write:

for (x = INT_MIN; x != iNaN; x++) { use (x); }

The main disadvantage of late trapping is that it would sometimes mask errors and make debugging more difficult in the same way that null pointers do. Type system support for “valid integers”, analogous to non-null pointers, would mitigate these problems. On the other hand, perhaps we’d have been better off with early trapping.

Finally, we could design an integer system using the iNaN integer representation and also early trapping. In this system, iNaN would exclusively be used to represent uninitialized or garbage values. Any operation on iNaN other than simple copying would result in a trap.

How Should Integers Work?

Integer computer math is kind of a bummer because it’s a lot more complicated than it seems like it should be. I’m specifically talking about the way that fixed-width integers wreak havoc on program logic when they overflow. There are other issues, but this seems like the big one. So: How should integers work?

C/C++’s undefined overflow semantics for signed integers is a real pain and I don’t think anyone would say this is how integers should work.

Lots of languages, like Java, specify two’s complement semantics for integer overflow. This is reasonably efficient (though it kills optimization opportunities in loop code) but it has the problem of not matching the behavior of mathematical integers. Which integer does Java give us when we increment 2,147,483,647? It gives the integer that is — out of all possible results — the farthest from the mathematical result. Just the other day I heard a story from the high-performance computing people in my department about a parallel code that worked correctly except at the largest scale, where an integer would overflow. I don’t know how they debugged this but it can’t have been fun.

Saturating integer semantics, where overflows stop at the maximum or minimum representable value, are useful in signal processing, but they are not used as the default integer semantics in any language that I know of. Unfortunately, few platforms offer native support for saturating arithmetic, and without hardware support it carries a significant performance penalty. Also, a design choice needs to be made about whether saturation is sticky. In other words, after a value saturates to INT_MAX, does decrementing it give INT_MAX or INT_MAX-1?
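For concreteness, here is a minimal sketch (mine, not from any particular DSP toolchain) of non-sticky saturating 32-bit addition; a sticky variant would additionally record that saturation happened, for instance in a flag, so that later operations keep returning the rail value:

/* sat_add_sketch.c -- non-sticky saturating addition for 32-bit integers. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static int32_t sat_add(int32_t a, int32_t b)
{
  int64_t r = (int64_t)a + b; /* widen so the true sum is exact */
  if (r > INT32_MAX)
    return INT32_MAX;         /* clamp at the positive rail */
  if (r < INT32_MIN)
    return INT32_MIN;         /* clamp at the negative rail */
  return (int32_t)r;
}

int main(void)
{
  int32_t x = sat_add(INT32_MAX, 1);       /* saturates to INT32_MAX */
  printf("%" PRId32 "\n", sat_add(x, -1)); /* non-sticky: INT32_MAX - 1 */
  return 0;
}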

IEEE floating point supports sticky NaN values to represent nonsensical results. Adding a NaN integer value to represent overflows would seem like a reasonable compromise between efficiency and correctness. Unfortunately, the two’s complement representation has no spare bits to represent NaN, nor of course do hardware platforms support transparent propagation of integer NaNs.

Integer types can be infinitely ranged. Bignums have a reputation for poor performance, but they can be pretty fast if a smart compiler and runtime optimistically assume that operations don’t overflow, and silently trap into the bignum library only when necessary. This is clearly the right choice for high-level languages, and clearly the wrong choice for systems languages like C and C++. The problem in C/C++ is that they are used to implement OS kernels, hypervisors, and bare-metal embedded systems that would become impossible to code if memory allocation could be triggered at nearly arbitrary program points. Small embedded systems don’t have a dynamic allocator at all, and OS kernels have environments like interrupt handlers where allocation is dicey at best. To summarize, the most critical software systems are also the ones where we cannot afford friendly integer semantics.

Another option is explicitly ranged integers backed by formal methods tools. Here, the programmer basically has to prove that integers cannot overflow. This is unsuitable for many programming tasks because it is too difficult, but it’s probably the right answer for safety critical systems.

Finally, we can have hybrid integer semantics that combine various options. This adds flexibility but makes the programming model more complex, particularly when different kinds of integers interact.

So: How should integers work?

This post shows some amusing bugs in simple integer codes.