Skip to content

That Forbes Article

That stupid Forbes article is making us look bad. By the way here’s what I did yesterday after teaching.

Nobody Expects the Spanish Inquisition, or INT_MIN to be Divided by -1

INT_MIN % -1 and INT_MIN / -1 in C/C++ are little gifts that keep on giving. Recently, Xi Wang has been using this construct to knock over languages implemented in C/C++. Then today Tavis Ormandy posted an excellent local DOS for a Windows 8 machine.

But the fun doesn’t stop there. For one thing, as I wrote a while ago, it’s not even clear that INT_MIN % -1 is undefined behavior in C99, C++98, or any earlier version. It is explicitly undefined in C11 and C++11.

Since undefined behavior is undefined, we get fun results from programs like this:

#include <limits.h>

int foo (int a, int b) {
  return a / b;
}

int main (void) {
  return foo (INT_MIN, -1);
}

Now we’ll compile at a few optimization levels using gcc 4.7.2 on either x86 or x86-64 and see what happens:

$ gcc -O0 intmin.c -o intmin
$ ./intmin 
Floating point exception (core dumped)
$ gcc -O1 intmin.c -o intmin
$ ./intmin 
Floating point exception (core dumped)
$ gcc -O2 intmin.c -o intmin
$ ./intmin 
$

Why does it crash at low optimization levels and not at high ones? It happens that -O2 is the first optimization level where the call to foo() is inlined, exposing its guts to constant propagation and folding, with the result that no idiv needs to be executed, and thus no exception gets thrown. A recent build of Clang has exactly the same behavior. If we replace / with %, the behavior is unchanged (under both compilers).

What should we take away from this? A few things:

  • If you can’t prove that division and modulus operations in your C/C++ code aren’t going to do these operations, explicit checks need to be added.
  • If you are testing a C/C++ program, try to get it to perform these operations, but keep in mind that all such testing can be unreliable due to the effect shown above.
  • Use Clang’s new -fsanitize=integer mode. It’s not in 3.2 but it’s in trunk and should be in 3.3 when it comes out in a few months. The integer sanitizer is based on the excellent work that Will Dietz has done in getting IOC into LLVM’s trunk.

Here’s what happens:

$ clang -fsanitize=integer intmin.c -o intmin
$ ./intmin 
intmin.c:5:12: runtime error: division of -2147483648 by -1 cannot be represented in type 'int'
Floating point exception (core dumped)

How is this different from the errors above? First, we get the nice diagnostic. Second, we’re guaranteed to get the error regardless of optimization level.

UPDATE: A comment on Xi’s post shows how to crash a Bash shell. For example, on my Ubuntu 12.10 machine:

$ ($((-2**63/-1)))
Floating point exception (core dumped)
$ bash -version
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ 

The behavior on Ubuntu 12.04 is the same. I would love to see more examples like this. If you know of one, please leave a comment.

C and C++ Aren’t Future Proof

A C or C++ program is expected to follow a collection of rules such as “don’t access out-of-bounds array elements.” There are a lot of these rules and they are listed (or are implicit) in the various language standards. A program that plays by all of these rules—called a conforming program—is one where we might, potentially, be able to make a guarantee about the program’s behavior, such as “This program doesn’t crash, regardless of input.” In contrast, when a C or C++ program breaks a rule, the standard doesn’t permit us to say anything about its behavior.

So where’s the problem? It comes in three parts:

  1. Some of the rules imposed by the C and C++ standards are routinely violated. For example, it is not uncommon to see creation (but not dereference) of invalid pointers, signed integer overflow, and violations of strict aliasing rules.
  2. Programmers expect a C/C++ implementation to behave in a certain way even when some of the rules are violated. For example, many people expect that creating an invalid pointer (but, again, not dereferencing it) is harmless. Program analyzers that warn about these problems are likely to lose users.
  3. C/C++ compilers have a standard-given right to exploit undefined behaviors in order to generate better code. They keep getting better and better at this. Thus, every year, some programs that used to work correctly become broken when compiled with the latest version of GCC or Clang or whatever.

This propensity for today’s working programs to be broken tomorrow is what I mean when I say these languages are not future proof. In principle a big C/C++ program that has been extensively tested would be future-proof if we never upgraded the compiler, but this is often not a viable option.

There is a long, sad history of programmers becoming seriously annoyed at the GCC developers over the last 10 years due to GCC’s increasingly sophisticated code generation exploiting the undefinedness of signed integer overflows. Similarly, any time a compiler starts to do a better job at interprocedural optimization (this has recently been happening with LLVM, I believe) a rash of programs that does stupid stuff like not returning values from non-void functions breaks horribly. Programmers used to think it was OK to read uninitialized storage and then compilers began destroying code that did this.

Let’s look at a specific example. In a recent series of posts (1, 2, 3), Pascal Cuoq has been using a formal verification tool called Frama-C to verify zlib. Why zlib? First, it’s not that big. Second, it’s ubiquitous. Third, it is believed to be high quality—if we ignore a build problem on Solaris, the last security issues were fixed in 2005. I would guess that it would be difficult to find a widely-used library that is clearly more solid than zlib.

So what kinds of problems has Pascal found in this solid piece of C code? Well, so far nothing absolutely awful, but it does appear to create invalid pointers and to compare these against valid pointers. Is this bad? That depends on your point of view. It is possible (and indeed likely) that no current compiler exploits this undefined behavior. On the other hand, it is not straightforward to perform formal verification of zlib unless we treat it as being written in a language that is quite similar to C, but that assigns a semantics to invalid pointers. Furthermore, a new compiler could show up at any time which does something horrible (like opening an exploitable vulnerability) any time zlib computes an invalid pointer.

Of course zlib isn’t the real problem; it’s small, and probably pretty close to being correct. The real problem is that there are billions of lines of C and C++ are out there. For every thousand lines of existing code there are probably a handful of undefined behavior time bombs waiting to go off. As we move forward, one of these things has to happen:

  1. We ditch the C and C++, and port our systems code to Objective Ruby or Haskell++ or whatever.
  2. Developers take undefined behavior more seriously, and proactively eliminate these bugs not only in new code, but in all of the legacy code.
  3. The C/C++ standards bodies and/or the compiler writers decide that correctness is more important than performance and start assigning semantics to certain classes of undefined operations.
  4. The undefined behavior time bombs keep going off, causing minor to medium-grade pain for decades to come.

I’m pretty sure I know which one of these will happen.

UPDATE from 1/24/2013: A comment on Hacker News pointed me to this excellent example of C not being future-proof. This is commonplace, folks. This particular undefined behavior, signed integer overflow, can be caught by our IOC tool which is now integrated into Clang as -fsanitize=integer, and will be in the 3.3 release.

The Space Child’s Mother Goose

I just noticed that this old favorite has been reprinted. A quick excerpt:

This is the Theory Jack built.

This is the Flaw
That lay in the Theory Jack built.

This is the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

flaw_jack_builtThis is Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Erudite Verbal Haze
Cloaking Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Turn of a Plausible Phrase
That thickened the Erudite Verbal Haze
Cloaking Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is Chaotic Confusion and Bluff
That hung on the Turn of a Plausible Phrase
That thickened the Erudite Verbal Haze
Cloaking Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Cybernetics and Stuff
That covered Chaotic Confusion and Bluff
That hung on the Turn of a Plausible Phrase
Amd thickened the Erudite Verbal Haze
Cloaking Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Button to Start the Machine
To make with the Cybernetics and Stuff
To cover Chaotic Confusion and Bluff
That hung on the Turn of a Plausible Phrase
And thickened the Erudite Verbal Haze
Cloaking Constant K
That saved the Summary
Based on the Mummery
Hiding the Flaw
That lay in the Theory Jack built.

This is the Space Child with Brow Serene
Who pushed the Button to Start the Machine
That made with the Cybernetics and Stuff
Without Confusion, exposing the Bluff
That hung on the Turn of a Plausible Phrase
And, shredding the Erudite Verbal Haze
Cloaking Constant K
Wrecked the Summary
Based on the Mummery
Hiding the Flaw
And Demolished the Theory Jack built.

Hiding Bugs from Branch Coverage

It’s hard to know when a piece of software has been sufficiently tested. Code coverage metrics are a partial answer. Basically, a coverage metric is a function that maps each execution to a set of coverage targets. For example, if we are performing function coverage, then the universe of targets is comprised of functions from the source code (or object code) and an execution covers every function from which it causes at least one statement or instruction to execute.

Can a bug hide from function coverage? This isn’t hard:

void foo (void) {
  if (!x) bug();
}

As long as foo() is called with x having a non-zero value, the function will be covered but the bug will not fire. We can fix this particular problem by tightening up the coverage metric, for example by insisting on statement coverage, where each statement in the program is a separate coverage target.

Let’s talk now about branch coverage, which insists that every conditional branch executes both ways. This is very similar to statement coverage except that it fixes a major hole in statement coverage where you can get 100% coverage without executing the false condition of any conditional that lacks an else branch.

Branch coverage is interesting because it’s about the strongest coverage metric that makes sense in practice. We can easily define metrics that have more coverage targets, such a path coverage, but achieving full coverage is extremely difficult and also it forces us to confront hard questions about the feasibility of paths. Here’s a page that provides a quick overview of the common and some no-so-common coverage metrics.

The rest of this piece is a partial list of ways that bugs can hide from branch coverage, and also some discussion of how we might go about finding these bugs.

Bug-Hiding Strategy #1: Concurrency

Hiding a bug using multithreading is really easy. For example:

void called_by_thread_1 () {
  p = NULL;
  ... do something quick ...
  p = q;
  ... do something slow ...
}

int called_by_thread_2 () {
  return *p;
}

Assume that p is an atomic variable and that q is never NULL. We can easily get full branch coverage without testing the case where called_by_thread_2() dereferences a null pointer, and in fact that odds are this is what will happen. Concurrent programs want specialized coverage metrics; here’s a paper that I like.

Bug-Hiding Strategy #2: Math Overflow

This code looks innocent:

int average (int x, int y)
{
  return (x+y)/2;
}

But unfortunately, on a platform with 32-bit ints, average (1000000000, 2000000000) returns -647483648, despite the fact that the actual result is perfectly representable in 32-bit two’s complement. The problem is that the intermediate value x+y overflows. This is undefined behavior in C/C++. In practice we expect wrapping behavior from this kind of overflow, but even wrapping produces the wrong answer here.

Common test coverage metrics, including very difficult ones like path coverage, do not help us find math overflow bugs. One possibility would be to add overflow checks to the source code and then aim for branch coverage, but this is horrible. One could also add overflow checks using the compiler and then aim for branch coverage of the object code; this probably only makes sense in conjunction with test-case generation tools such as Klee.

The real solution to math overflow bugs is systematic documentation and enforcement of constraints on integer values, supported by a tool such as Frama-C. For example, the average() function would be preceded by this annotation:

/*@ requires x + y <= INT_MAX; */

Adding x and y inside the annotation does not cause overflow problems because integer operations in the annotation language are the mathematical ones, not the C ones. The problem with writing these annotations is that it is very hard work.

Bug-Hiding Strategy #3: Loop Error

There is much potential for hiding bugs inside loops:

int factorial (int n)
{
  assert (n < 13);
  int f = 1;
  do {
    f *= n;
    n--;
  } while (n>0);
  return f;
}

This code looks pretty good and, in fact, it returns the correct result for 0<n<13. The assertion protects against integer overflow on 32-bit machines. The remaining interesting case is n=0, for which this function is wrong: it returns 0 instead of 1. Branch coverage doesn’t help find this kind of bug.

Test coverage for loops is tricky; some sources recommend forcing each loop to execute 0 times, 1 time, and more than once, when possible.

Bug-Hiding Strategy #4: Lookup Table

Certain kinds of codes get a good speedup by precomputing some sub-results and storing these in a lookup table. This is commonly seen for checksums, math library functions, and crypto codes. For example, this AES implementation has several lookup tables. They cannot be verified by casual inspection. If any entry in such a table is incorrectly computed or gets corrupted somehow, the resulting code will be wrong and branch coverage again does not help find the problem. The famous Pentium fdiv bug was precisely this kind of error.

We might insist that every entry in each lookup table is accessed during testing, or we might simply double check that each table contains that proper values.

Bug-Hiding Strategy #5: Array Index

Code like this is pretty easy to write:

int update (int index, value_t value)
{
  if (index > incorrect_table_size) return ERROR;
  table[index] = value;
  return OK;
}

Here we’ll need to test the OK and ERROR conditions, but if we’re comparing against the wrong table size, RAM corruption can result from code that received 100% branch coverage. The table size is unlikely to be wrong for a statically sized table, but I’ve seen real programs where a piece of code failed to properly notice when a dynamically sized table changed size.

Bug-Hiding Strategy #6: Path-Sensitive Conditionals

This code is highly contrived, but it illustrates the problem:

int silly (int a) {
  int *p = &x;
  if (a > 10) p = NULL;
  if (a < 20) return *p;
  return 0;
}

Assume x is a global int. If we test this code with parameter values of 5 and 25, branch coverage is satisfied, and yet we will not trigger the undefined behavior that happens when a==15. Besides going for path coverage, I don’t know of a good way to find these bugs.

Bug-Hiding Strategy #7: Heap Shape

Tons of problems can hide in heap data structures. For example, assume that we have a red-black tree implementation where the delete() operation leaves the tree in a broken state where subsequent insert() calls will fail. We could easily end up with 100% branch coverage of insert(), delete(), and the other operations without even actually testing the case where a node is deleted and then inserted.

Conclusions

100% branch coverage is often attainable at the unit level, but almost never is at the system level. Regardless, it is insufficient for finding any number of kinds of bugs (the list here is just off the top of my head; if you have any favorites that I missed, please let me know).

There’s no general-purpose way to find these bugs, but rather a variety of partial solutions. First, we can pay close attention to idioms such as lookup tables that can subvert branch coverage. Second, we can add extra paths to the code in an attempt to make implicit predicates explicit, such as integer overflow checks. Third, we can play with the coverage metric, for example by augmenting branch coverage with additional loop coverage targets. Finally, we can add annotations to our code to express necessary preconditions, postconditions, and loop invariants, and then use a tool to check these.

Life After Tenure

I wanted to dequeue one more post before the semester gets going for real. This is a collection of random observations about the contrast between being an untenured and tenured professor.

Tenure is an up-or-out process: either you get it or you lose your job. This is stressful for everyone. Getting tenure is excellent in that this source of stress is removed, but it is also a little bit confusing since we are suddenly forced to ask what exactly it is that we are doing with our lives now that—at long last—we aren’t living up to some random collection of external expectations. You don’t have to look too hard to find examples of people who have voluntarily left academia after getting tenure; my guess is that this effect is sometimes a factor.

Basically everyone who gets tenure, including me, finds him/herself continuing to get busier every year. Not only do the demands on our time increase, but we can no longer fall back on the convenient selfish excuse “sorry—can’t possibly do that time consuming task while I’m on the tenure track.” The constantly overloaded TODO list gets old, and it also makes it hard to get actual research done. It’s no coincidence that a lot of tenured professors rely on students to do all of the technical work associated with their research programs. Untenured professors not only have more time to do research but also may be reluctant to risk their careers by putting students on too many critical paths.

I used to knock off work between 11 and midnight, whereas I now generally quit between 10 and 11. Additionally, during many of my pre-tenure years I got very little exercise, which resulted in high blood pressure and various other problems; now I spend some time staying in shape. On the other hand, I don’t take as many days off now, probably due to more stuff going on in general and the kids being on a school schedule. Even so, I probably work fewer hours per day now than I used to.

I’m trying to write fewer papers now, and to make each paper a really solid one. This blog is part of that effort: it siphons off a good deal of writing energy. At some point leading up to tenure I realized that writing a mediocre paper makes no sense; it is almost as much work as writing a good one and it dilutes whatever reputation for myself that I may be managing to build up. A few of my more recent papers, most notably the Csmith one, are the result of a large amount of effort that would simply not have made sense to allocate to a single paper before tenure. I could have turned Csmith into a series of papers, but why? I can’t see that as being useful to anyone. There’s an avalanche of crappy papers and I would love to not be part of it. It is possible that I’m doing my PhD students a disservice by publishing less; I try to deal with this by being up-front with them about the academic track: if they want to get on it, they need to learn to write fairly rapidly and extremely well. I’m happy to work with them on that project, but I’m not going to push. The impetus needs to come from them.

When I got tenure, one of my plans was to avoid letting my teaching be compromised by research pressures. I’m afraid that I have not accomplished this goal. I do a pretty good job teaching, but no better than I used to. I do, however, feel more free to take risks in the classroom by trying teaching techniques that aren’t totally tried and true and by throwing out old assignments more often. This causes me to take a hit on teaching evaluations sometimes, which is OK.

Finally, I’d like to say that I now take more time to think and read, but I’m not sure that this is the case. Probably the main thing that I do differently along these lines is write blog posts, which is certainly useful as a way to motivate thinking. I read relatively few books and papers related to my research, except on occasional bursts where I need to learn some new research thread or subfield.

When Software Ages Badly

In some respects, software ages gracefully: it generally starts out working poorly but gets better over time as bugs are fixed. Unlike hardware, there’s no physical wearing out of parts. This post is about a few ways in which software doesn’t get better with age.

In The Mythical Man Month Brooks observes that any sufficiently complex software system will tend to have an “irreducible number of errors”—a reference to the fact that in a crufty old software base, any bug fix is likely to break something else. It is possible to fight this phenomenon through aggressive modularization and refactoring but these are expensive.

The second way for old software to become less reliable is when the meaning of its code changes. Obviously this can happen when the software is ported to a new architecture where, for example, pointers or integers have a different size. Also, a library or operating system update—even an obvious fix for a simple bug—can break an application that relied on the previous semantics. A compiler upgrade can break working code in any number of insidious ways. For example, an improved optimizer can make code run faster, exposing previously latent race conditions. A C/C++ compiler’s exploitation of undefined behavior can be ratcheted up a notch, breaking code that previously worked. This has been continuously happening for at least a decade and it’s not going to stop in the foreseeable future.

The inputs that are provided to a software system can change in harmful ways. The best example of this that I can think of is when a new class of attack becomes well understood, such as stack buffer overflows, heap overflows, and integer overflows. When a nasty new fuzzing tool appears, a previously solid-seeming piece of code can be shown to be not correct at all. The input domain can also change shape when a piece of software is reused in a new environment; Ariane 5 flight 501 is an excellent example.

Finally, in the long run, good old software rot can happen. Programming idioms, programming languages, software architectures, and other aspects of code are always evolving and eventually progress gets the better of almost any piece of software.

We’ve been writing software for hardly more than half a century, and I suspect that the vast majority of lines of code in existence were written since 2000. What will the world look like when we’ve had software for as long as we’ve had the printing press? What about when we’ve had computer languages for as long as we’ve had spoken languages? How are we going to perform refactoring on a code base of a hundred billion lines? Vernor Vinge envisions professional programmer-archaeologists whose job it is to deal with these systems; I’m not sure we’re that far off from needing these people now.

Peeking into the Trash

Just to see what was there, I rendered a thumbnail for every URL in my email trash folder. Below are the first 250 (full set is here).

Operant Conditioning by Software Bugs

conditioning Have you ever used a new program or system and found it to be obnoxiously buggy, but then after a while you didn’t notice the bugs anymore? If so, then congratulations: you have been trained by the computer to avoid some of its problems. For example, I used to have a laptop that would lock up to the point where the battery needed to be removed when I scrolled down a web page for too long (I’m guessing the video driver’s logic for handling a full command queue was defective). Messing with the driver version did not solve the problem and I soon learned to take little breaks when scrolling down a long web page. To this day I occasionally feel a twinge of guilt or fear when rapidly scrolling a web page.

The extent to which we technical people have become conditioned by computers became apparent to me when one of my kids, probably three years old at the time, sat down at a Windows machine and within minutes rendered the GUI unresponsive. Even after watching which keys he pressed, I was unable to reproduce this behavior, at least partially because decades of training in how to use a computer have made it very hard for me to use one in such an inappropriate fashion. By now, this child (at 8 years old) has been brought into the fold: like millions of other people he can use a Windows machine for hours at a time without killing it.

Operant conditioning describes the way that humans (and of course other organisms) adapt their behavior in response to the consequences resulting from that behavior. I drink beer at least partially because this has made me feel good, and I avoiding drinking 16 beers at least partially because that has made me feel bad. Conditioning is a powerful guiding force on our actions and it can happen without our being aware of it. One of my favorite stories is where a psychology class trained the professor to lecture from the corner by paying attention when he stood in one part of the room and looking elsewhere when he did not. (This may or may not be only an urban legend, but seems plausible even so.) Surely our lives are filled with little examples of this kind of unconscious guidance from consequences.

How have software bugs trained us? The core lesson that most of us have learned is to stay in the well-tested regime and stay out of corner cases. Specifically, we will:

  • periodically restart operating systems and applications to avoid software aging effects,
  • avoid interrupting the computer when it is working (especially when it is installing or updating programs) since early-exit code is pretty much always wrong,
  • do things more slowly when the computer appears overloaded—in contrast, computer novices often make overload worse by clicking on things more and more times,
  • avoid too much multitasking,
  • avoid esoteric configuration options,
  • avoid relying on implicit operations, such as the fact that MS Word is supposed to ask us if we want to save a document on quit if unsaved changes exist.

I have a hunch that one of the reasons people mistrust Windows is that these tactics are more necessary there. For example, I never let my wife’s Windows 7 machine go for more than about two weeks without restarting it, whereas I reboot my Linux box only every few months. One time I had a job doing software development on Windows 3.1 and my computer generally had to be rebooted at least twice a day if it was to continue working. Of the half-dozen Windows laptops that I’ve owned, none of them could reliably suspend/resume ten times without being rebooted. I didn’t start this post intending to pick on Microsoft, but their systems have been involved with all of my most brutal conditioning sessions.

Boris Beizer, in his entertaining but long-forgotten book The Frozen Keyboard, tells this story:

My wife’s had no computer training. She had a big writing chore to do and a word-processor was the tool of choice. The package was good, but like most, it had bugs. We used the same hardware and software (she for her notes and I for my books) over a period of several months. The program would occasionally crash for her, but not for me. I couldn’t understand it. My typing is faster than hers. I’m more abusive of the equipment than she. And I used the equipment for about ten hours for each of hers. By any measure, I should have had the problems far more often. Yet, something she did triggered bugs which I couldn’t trigger by trying. How do we explain this mystery? What do we learn from it?

The answer came only after I spent hours watching her use of the system and comparing it to mine. She didn’t know which operations were difficult for the software and consequently her pattern of usage and keystrokes did not avoid potentially troublesome areas. I did understand and I unconsciously avoided the trouble spots. I wasn’t testing that software, so I had no stake in making it fail—I just wanted to get my work done with the least trouble. Programmers are notoriously poor at finding their own bugs—especially subtle bugs—partially because of this immunity. Finding bugs in your own work is a form of self-immolation. We can extend this concept to explain why it is that some thoroughly tested software gets into the field and only then displays a host of bugs never before seen: the programmers achieve immunity to the bugs by subconsciously avoiding the trouble spots while testing.

Beizer’s observations lead me to the first of three reasons why I wrote this piece, which is that I think it’s useful for people who are interested in software testing to know that you can generate interesting test cases by inverting the actions we have been conditioned to take. For example, we can run the OS or application for a very long time, we can interrupt the computer while it is installing or updating something, and we can attempt to overload the computer when its response time is suffering. It is perhaps instructive that Beizer is an expert on software testing, despite also being a successful anti-tester, as described in his anecdote.

skinner The second reason I wrote this piece is that I think operant conditioning provides a partial explanation for the apparent paradox where many people believe that most software works pretty well most of the time, while others believe that software is basically crap. People in the latter camp, I believe, are somehow able to resist or discard their conditioning in order to use software in a more unbiased way. Or maybe they’re just slow learners. Either way, those people would make amazing members of a software testing team.

Finally, I think that operant conditioning by software bugs is perhaps worthy of some actual research, as opposed to my idle observations here. An HCI researcher could examine these effects by seeding a program with bugs and observing the resulting usage patterns. Another nice experiment would be to provide random negative reinforcement by injecting failures at different rates for different users and observing the resulting behaviors. Anyone who has been in a tech support role has seen the bizarre cargo cult rituals that result from unpredictable failures.

In summary, computers are Skinner Boxes and we’re the lab rats—sometimes we get a little food, other times we get a shock.

Acknowledgment: This piece benefited from discussions with my mother, who previously worked as a software tester.

Is the Browser the New OS?

Yes, this is an old question. I still think it’s interesting. Disclaimer: I haven’t tried out a Chromebook yet.

First, let’s look at the situation as of late 2012. The applications I use generally fall into three categories:

  1. Web-based.
  2. Native, but easily available on Windows, Mac, and Linux. These include a file browser, a shell, Emacs, LaTeX.
  3. Native and less portable: Photoshop, development tools such as compilers and debuggers, high-performance games.

A quick look at the Chrome store indicates that most or all of the applications in category #2 are already available in browser versions (though I currently use the native versions out of inertia). Category #3 is trickier; it’s not clear that it really makes sense to port, for example, Photoshop into the browser. On the other hand, category #3 is basically power tools and many people (including me, much of the time) can get by using weaker web-based versions. Even embedded software development tools, which might at first seem to be the antithesis of web applications, have web-based versions. In summary, to a first approximation, “the browser is the new OS” has mostly happened already, though we do see an interesting reverse trend in the mobile space (though quite a few of those native apps are thin wrappers around a browser).

Second, how did we get to this point? The first application I ever saw that used a browser as its primary interface was SATAN in 1995. I remember thinking this was an elegant and striking design decision; it permitted the implementors to focus on their application logic instead of wasting time on what would likely have ended up being the kind of crappy X windows GUI that was standard at the time. Not long after 1995 we saw the rise of Java-enabled browsers making everyone (for a short time, at least) get interested in the potential for platform-independent applications delivered over the web. But this potential remained unrealized. Around 2008 it became clear that the following three efforts, taken together, constituted a serious attempt to create a new platform. First, Google Chrome, with a fast JavaScript engine and significantly increased robustness (and perhaps also security) due to the process-per-tab model. Second, Google Gears, which facilitated disconnected operation for web applications. And third, Google Docs, which implements at least 80% of the useful stuff in MS Office. Of course Gears is gone but similar functionality is available elsewhere. More recently, Chrome OS and the Chromebook make it clear what Google wants to see. I’m painting a sort of Google-centric picture here and in fact I use their stuff a lot. However, most of the time I could get by using Firefox, Bing, and other alternative technologies.

Finally, what is the prognosis for browser-is-the-OS, looking forward? What will it take for people to really not care about which OS they’re running? First, we want to minimize the number of applications in category #3, but realistically I think most casual users don’t have that many apps in this category and us power users are willing to compromise. For example, the special-purpose tools I use for research are probably always going to run outside the browser and that is fine—they’ll run on workstation-class machines at the office and in the cloud. Photoshop and high-quality video processing is not likely to be moving into the cloud real soon, but on the other hand again these are special-purpose, workstation-grade applications. Most people already do their photo editing online.

The second thing that I require is that any web application that looks remotely like an editor (whether for programs, documents, presentations, web pages, or whatever) has to support rock-solid disconnected operation and have a sensible conflict resolution mechanism. Google Docs’ disconnected operation seems pretty good since last summer, but I worry that lots of applications are going to need special-purpose logic to handle this nicely, and it’s going to be a long slog since many developers won’t consider disconnected to be a priority.

Third, we need near-native performance in the browser. Plenty of JavaScript work has been done and it’s getting pretty fast. For more compute-intensive workloads Google Native Client seems like a good option.

In summary, we seem to be headed towards a segregated world where most people don’t need a PC, but rather an appliance that runs a browser. On the other hand, people who use computers more seriously will keep using workstation-style machines that run native apps when considerations such as performance, dealing with large amounts of data, security, reliability, and local control are paramount. I’ll go ahead and predict that in 2022, the number of PCs sold will be 25% of what they were in 2012. By PC I mean something like “non-Android, non-iOS machine primarily intended to be used by a single person to run native applications”—a category that includes all laptops except Chromebooks.