Raspberry Rockets

One of the things I most enjoy about teaching embedded systems is that the students show up with a very diverse set of skills. Some are straight-up CS, meaning they can hack but probably are intimidated by a breadboard, logic analyzer, or UART. Others are EE, meaning that they can design a noise-free circuit or lay out a PCB, but probably can’t write an advanced data structure. Even the computer engineering students (at Utah this isn’t a department but rather a degree program that spans departments), who know both hardware and software, have plenty to learn.

The projects in my class are always done on some embedded development board; I’ve used four or five different ones. This semester we used the Raspberry Pi, which has a very different feel than all previous boards we’ve used because you can just shell into the thing and run emacs and gcc.

This fall I did something a bit different from usual, which was to structure all of the programming projects around a single specification taken from Knight and Leveson’s 1986 paper about n-version programming. The specification is for a (fake) launch interceptor system that takes a collection of parameters and a collection of points from a radar, applies a bunch of rules, and outputs a launch / no-launch decision for a missile interceptor. It’s a good fit for class projects because it’s generally pretty straightforward code but has a few wrinkles that most people (including me) aren’t going to get right on the first try. The students divided into groups and implemented my slightly adapted version of the specification, then we did several rounds of testing and fixing. This all ate up a lot of the semester and we never had time to use the nice little fuzzer that I built. For another assignment I asked each group to compute the worst-case stack memory usage of their code.
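To give a flavor of the kind of rule involved, here is a minimal sketch of one condition in the style of the specification: it asks whether any two consecutive radar points are more than a given length apart. The exact wording of the rule, the struct, and the names lic0_holds and length1 are assumptions made for illustration; this is not the code the students wrote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical point type; the real projects may have used something else. */
struct point { double x, y; };

/* True if some pair of consecutive points is more than length1 apart. */
bool lic0_holds(const struct point *pts, size_t n, double length1)
{
    assert(length1 >= 0.0);
    for (size_t i = 0; i + 1 < n; i++) {
        double dx = pts[i + 1].x - pts[i].x;
        double dy = pts[i + 1].y - pts[i].y;
        /* Compare squared distances so no square root is needed. */
        if (dx * dx + dy * dy > length1 * length1)
            return true;
    }
    return false;
}
```

Even a rule this small has a wrinkle or two: the comparison is strict, and fewer than two points can never satisfy it.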

For the final project, with just a few weeks left in the semester, we put the whole class together on a project where multiple boards cooperated to make launch decisions. First, a master node, running Linux, loaded launch interceptor conditions and sent them to a couple of subordinate nodes using SPI. Although good SPI master drivers are available for the RPi, we were unable to use the BCM2835’s SPI device in slave mode since the appropriate pins are not exposed to the external connectors (also the SoC is a BGA so we can’t “cheat” and get access to the pins a different way). Therefore, a group implemented a bit-banged SPI slave for the subordinate nodes. For fun, and to avoid problems with the tight timing requirements for bit-banged SPI, these nodes dropped Linux and ran on bare metal. When a subordinate node received a collection of launch interceptor data it computed the launch or no-launch decision and then provided the answer to the master node when queried. The master, then, would only make a launch decision when both subordinate nodes said that a launch should happen, at which point it sent a signal to a high-power relay that sent a generous amount of current to the rocket igniter.
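The master's rule is simple enough to sketch in a few lines. This is only an illustration of the "launch when every subordinate agrees" logic described above, with hypothetical names (enum vote, master_should_launch); it says nothing about how the class code was actually organized, and it leaves out the SPI transfers entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vote { VOTE_NO_LAUNCH = 0, VOTE_LAUNCH = 1 };

/* Launch only when every subordinate node voted to launch. */
bool master_should_launch(const enum vote *votes, size_t n)
{
    assert(n > 0);  /* no votes means no basis for a launch */
    for (size_t i = 0; i < n; i++) {
        if (votes[i] != VOTE_LAUNCH)
            return false;
    }
    return true;
}
```

Requiring unanimity means a single faulty node can block a launch but cannot cause one on its own.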

Since the students did an awesome job getting all of this working, we spent the last day of classes launching rockets in a field at the edge of campus; pictures follow. The system shown in the pictures actually includes two different implementations of the subordinate node: one on RPi and the other on a Maple board, which uses an ARM Cortex-M3 chip. I like this modification since it’s more in the spirit of the original n-version paper anyway.

How Did Software Get So Reliable Without Proof?

Tony Hoare’s 1996 paper How Did Software Get So Reliable Without Proof? addresses the apparent paradox where software more or less works without having been either proved correct or tested on more than an infinitesimal fraction of its execution paths. Early in the paper we read:

Dire warnings have been issued of the dangers of safety-critical software controlling health equipment, aircraft, weapons systems and industrial processes, including nuclear power stations. The arguments were sufficiently persuasive to trigger a significant research effort devoted to the problem of program correctness. A proportion of this research was based on the ideal of certainty achieved by mathematical proof.

Fortunately, the problem of program correctness has turned out to be far less serious than predicted. A recent analysis by Mackenzie has shown that of several thousand deaths so far reliably attributed to dependence on computers, only ten or so can be explained by errors in the software: most of these were due to a couple of instances of incorrect dosage calculations in the treatment of cancer by radiation. Similarly predictions of collapse of software due to size have been falsified by continuous operation of real-time software systems now measured in tens of millions of lines of code, and subjected to thousands of updates per year.

So what happened? Hoare’s thesis—which is good reading, but largely uncontroversial—is that a combination of factors was responsible. We got better at managing software projects. When testing is done right, it is almost unreasonably effective at uncovering defects that matter. Via debugging, we eliminate the bugs that bite us often, leaving behind an ocean of defects that either don’t manifest often or don’t matter very much when they do. Software is over-engineered, for example by redundant checks for exceptional conditions. Finally, the benefits of type checking and structured programming became widely understood during the decades leading up to 1996.

The rest of this piece looks at what has changed in the 16 years since Hoare’s paper was published. Perhaps most importantly, computers continue to mostly work. I have no idea if software is better or worse now than it was in 1996, but most of us can use a computer all day without major hassles. As far as I know, the number of deaths reliably attributed to software is still not hugely larger than ten.

Software, especially in critical systems, has grown dramatically larger since 1996: by a factor of at least 100 in some cases. Since the factor of growth was probably similar during the 1980-1996 period, it seems a bit odd that Hoare didn’t include much discussion of modularity; this would be at the top of my list of reasons why large software can work. The total size of a well-modularized software system is almost irrelevant to most developers, whereas even a small system can become intractable to develop if it is sufficiently badly modularized. My understanding is that both the Windows NT and Linux kernels experienced serious growing pains at various times when coupling between subsystems made development and debugging more difficult than it needed to be. In both cases, a significant amount of effort had to be explicitly devoted to cleaning up interfaces between subsystems. When I attended the GCC Summit in 2010, similar discussions were underway about how to slowly ratchet up the degree of modularity. I’m guessing that every large project faces the same issues.

A major change since 1996 is that security has emerged as an Achilles’ Heel of software systems. We are impressively poor at creating networked systems (and even non-networked ones) that are even minimally resistant to intrusion. There does not appear to be any evidence that critical software is more secure than regular Internet-facing software, though it does enjoy some security through obscurity. We’re stuck with a deploy-and-patch model, and the market for exploits that goes along with it, and as far as I know, nobody has a better way out than full formal verification of security properties (some of which will be very difficult to formalize and verify).

Finally, formal correctness proofs have not been adopted (at all) as a mainstream way to increase the quality of software. It remains almost unbelievably difficult to produce proofs about real software systems even though methods for doing so have greatly improved since 1996. On the other hand, lightweight formal methods are widely used; static analysis can rapidly detect a wide variety of shallow errors that would otherwise result in difficult debugging sessions.

NOTE: The article is behind a paywall so high that I couldn’t even get to it from on campus, and had to use interlibrary loan instead. I tried to find a way to read the fair use regulations in such a way that I could post a copy but this seems like too much of a stretch, sorry…


I just got back from Tampere, Finland where I was one of the program chairs for EMSOFT, an embedded software conference. If I haven’t blogged about this much, it’s because I’m sort of a reluctant and not especially talented organizer of events. Happily, EMSOFT is just one third of the larger Embedded Systems Week, so logistics were handled at a higher level. Also happily, I had a co-chair Florence Maraninchi who did more than half of the work.

Visiting Finland was a pleasure; people were very friendly and the beer was good. On the other hand, Tampere wasn’t that easy to get to (~24 hours travel time from Salt Lake City) and my pessimistic expectations for October at 61°N were met: every day it was drizzly and not a lot above freezing. I saw the sun about once while in Finland, and by the time I left the days were noticeably shorter than when I arrived. I also hadn’t realized how far east Finland is: it’s only a four-hour drive from Helsinki to St. Petersburg. I found Finnish to be hilariously indecipherable; I hadn’t realized how much it differs from other European languages. I also hadn’t realized that everyone would just start speaking Finnish to me; apparently I fit in appearance-wise. Previously this has only happened to me in Germany and Holland. In general, however, everyone spoke perfectly acceptable English.

The conference was nice. Embedded software is a research area that I like a lot, though it can be a tricky one to appreciate due to a larger diversity of research approaches compared to other areas. For example, my view of OS research is that there’s a shared view for how to approach problems, even if there is a lot of variety in the actual problems being attacked. Embedded software isn’t like that. One of the topics we’ve started to see at embedded software conferences that I really like is looking at the robustness of software from the point of view of ensuring that bounded changes in inputs lead to bounded changes of outputs. This isn’t a notion that makes sense for all kinds of software, but it can be applied for example to feedback controllers and some kinds of signal processing codes. Robust control is an old idea but applying the technique to actual software is new. I’ve long believed that the lack of sensitivity analyses for software is a problem, so it’s great to see this actually being done.

ARM Math Quirks on Raspberry Pi

Embedded processors can be relied upon to be a little quirky. Lately I’ve been playing around with the Raspberry Pi’s BCM2835 processor, which is based on the ARM1176JZF-S core. The “J” stands for Jazelle, a module that permits this processor to execute Java bytecodes directly. As far as I know there’s no open source support for using this, and my guess is that a JIT compiler will perform much better anyway, except perhaps in very RAM-limited systems. The “Z” indicates that this core supports TrustZone, which provides a secure mode switch so that secrets can be kept even from privileged operating system code. The “F” means the core has a floating point unit. Finally, “S” indicates a synthesizable core: one for which ARM licenses the Verilog or VHDL.

One thing I always like to know is how an embedded processor and its compiler cooperate to get things done; looking at the compiler’s output for some simple math functions is a good way to get started. I have a file of operations of the form:

long x00 (long x) { return x* 0; }
long x01 (long x) { return x* 1; }
long x02 (long x) { return x* 2; }
long x03 (long x) { return x* 3; }

Of course, multiplying by zero, one, and two are boring, but there’s a cute way to multiply by three:

    add r0, r0, r0, asl #1
    bx lr 

The computation here is r0 + (r0<<1), but instead of separate shift and add instructions, both operations can be done in a single instruction: ARM supports pre-shifting one operand “for free” as part of ALU operations. I had always considered this to be sort of a bizarre fallout from ARM’s decision to use fixed-size instruction words (gotta use all 32 bits somehow…) so it’s pretty cool to see the compiler putting the shifter to good use. Presumably GCC has a peephole pass that runs over the RTL looking for pairs of operations that can be coalesced.
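Written back in C, the single-instruction trick looks like the sketch below. The function name mul3 is made up, and I use an unsigned type so that overflow wraps the same way the hardware add does instead of being undefined behavior.

```c
#include <stdint.h>

/* x * 3 computed as x + (x << 1), mirroring "add r0, r0, r0, asl #1". */
uint32_t mul3(uint32_t x)
{
    return x + (x << 1);
}
```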

Multiplying by four is just (r0<<2) and multiplying by five is r0 + (r0<<2). Multiply-by-six is the first operation that requires more than one ALU instruction and multiply-by-11 is the first one that requires an additional register. GCC (version 4.6.3 at -Ofast) doesn’t emit an actual multiply instruction until multiply-by-22, which is also the first multiplication that would require three ALU instructions. Perhaps unsurprisingly, people have put some thought into efficiently implementing multiply-by-constant using shifts and adds; code at this page will solve instances of a pretty general version of this problem. Optimizing multiply seems to be one of those apparently simple problems that turns out to be deep when you try to do a really good job at it.

Turning multiply-by-constant into a sequence of simpler operations is satisfying, but is it productive? I did a bit of benchmarking and found that on the BCM2835, multiply-by-22 has almost exactly the same performance whether implemented using two instructions:

    mov r3, #22 
    mul r0, r3, r0
    bx lr

or using three instructions:

    add r3, r0, r0, asl #2 
    add r0, r0, r3, asl #1 
    mov r0, r0, asl #1 
    bx lr

When a mul instruction can be replaced by fewer than three add/subtract/shift instructions, the substitution is profitable. In other words, GCC is emitting exactly the right code for this platform. This is perhaps slightly surprising since benchmarking on any specific chip often reveals that the compiler is somewhat suboptimal since it was tuned for some older chip or else because it makes compromises due to targeting some wide variety of chips.
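For anyone who wants to convince themselves that the three-instruction sequence really computes a multiply by 22, here it is transcribed into C, one statement per instruction. The function name is hypothetical and the type is unsigned so that wraparound is well defined.

```c
#include <stdint.h>

/* Multiply by 22 using only shifts and adds. */
uint32_t mul22_shift_add(uint32_t x)
{
    uint32_t t = x + (x << 2);  /* add r3, r0, r0, asl #2 : t = 5x  */
    uint32_t r = x + (t << 1);  /* add r0, r0, r3, asl #1 : r = 11x */
    return r << 1;              /* mov r0, r0, asl #1     : 22x     */
}
```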

Next I looked at integer division. Implementing divide-by-constant using multiplication is a pretty well-known trick—see Chapter 10 of Hacker’s Delight—so I won’t go into it here. On the other hand, division by non-constant held a bit of a surprise for me: it turns out the ARM1176JZF-S has no integer divide instruction. I asked a friend who works at ARM about this and his guess (off-the-cuff and not speaking for the company, obviously) is that when the ARM11 core was finalized more than 10 years ago, chip real estate was significantly more expensive than it is now, making a divider prohibitive. Also, division probably isn’t a big bottleneck for most codes.
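Just to show the shape of the divide-by-constant trick, here is the simplest case I know of: unsigned division by three. This is a sketch of the general idea only, not what GCC emits for any particular function in my test file; the name udiv3 is made up, and signed division and other divisors need more care.

```c
#include <stdint.h>

/* floor(x / 3) for any 32-bit unsigned x, using one multiply and one shift.
   The constant 0xAAAAAAAB is ceil(2^33 / 3). */
uint32_t udiv3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```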

If we take source code like this, where ccnt_read() is from my previous post:

volatile int x, y, z;

/* returns the raw cycle count for one division, including timer overhead */
unsigned time_div (void)
{
  unsigned start = ccnt_read ();
  z = x / y;
  unsigned stop = ccnt_read ();
  return stop - start;
}

The compiler produces:

    stmfd sp!, {r4, lr} 
    mrc p15, 0, r4, c15, c12, 1 
    ldr r3, .L2 
    ldr r0, [r3, #0] 
    ldr r3, .L2+4 
    ldr r1, [r3, #0] 
    bl __aeabi_idiv 
    ldr r3, .L2+8 
    str r0, [r3, #0] 
    mrc p15, 0, r0, c15, c12, 1 
    rsb r0, r4, r0 
    ldmfd sp!, {r4, pc}

On average, for random arguments, the cost of the __aeabi_idiv call is 26 cycles, but it can be as few as 15 and as many as 109 (these results aren’t the raw timer differences; they have been adjusted to account for overhead). An example of arguments that make it execute for 109 cycles is -190405196 / -7. The C library code for integer divide is similar to the code described here. Its slow path requires three cycles per bit for shift-and-subtract, and this is only possible due to ARM’s integrated shifter. Additionally, there’s some overhead for checking special cases. In contrast, a hardware divide unit should be able to process one or two bits per cycle.
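The basic shift-and-subtract idea can be sketched in portable C as below. This is the textbook one-bit-per-iteration version for unsigned operands, with a made-up function name; it is not the library's code, which also handles signs and special cases and leans on the integrated shifter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned division by shift-and-subtract, one quotient bit per iteration.
   Stores the remainder through rem when rem is not NULL. */
uint32_t udiv_shift_subtract(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}
```

For the magnitudes of the slow example above, udiv_shift_subtract(190405196, 7, &r) returns 27200742 with a remainder of 2.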

Thoughts on Embedded Development Boards

An embedded development board is an off-the-shelf part that includes a microcontroller and some peripherals, mounted on a PCB. It tries to not get in the way of whatever you want to do, but instead makes it easy to start writing software and attaching hardware. Some boards even include a bit of prototyping space. The idea is that developers are then free to create a custom PCB in parallel with the main hardware and software development—if at all. Historically, many embedded development boards were not that cheap and worse, to use them you needed a proprietary compiler. GCC has changed this landscape dramatically.

In the embedded software class that I’ve taught for the last 10 years, we’ve used four different development boards as the basis for labs. Two of these boards had very small user communities. One of these was the Olimex LPC-H2129 board, which I chose because it gives easy access to every pin on the MCU and also because at the time (2004, I think) it was one of the cheapest ARM boards I could find that had all of the features I wanted. However, support is nonexistent and the grand total amount of documentation for this board is a 3-page PDF. I have not-fond memories of wasting a bunch of time getting the compiler, bootstrap loader, debugger, etc. working on both Windows and Linux. Another board that we used in class had much better documentation but again a tiny user community. I was also not a big fan of the CodeWarrior IDE for this board, particularly once it became based on Eclipse and started requiring increasingly arcane sequences of GUI actions to accomplish even simple tasks.

The biggest recent change in embedded development boards is due to the large user communities that have coalesced around platforms like Arduino (>>100,000 posts in the forum), Raspberry Pi (~100,000 posts) and mbed (~20,000 posts). These are incredibly valuable resources for novices since it means there are searchable answers to almost any common question and probably people who are willing to discuss more esoteric problems if asked nicely. I used to feel a bit sad that there didn’t seem to be a good modern equivalent of the great hobbyist platforms of my childhood (Timex Sinclair, Apple II, Trash-80, etc.). However, this situation has totally changed and things are awesome now. While there are plenty of things I want in a RPi-like board (ADCs, WiFi, a Cortex A9), at some level this seems like a nearly-mature platform: it already offers excellent prototyping capabilities while also serving as a usable UNIX machine.

High-Resolution Timing on the Raspberry Pi

Just to be clear, this post is about measuring the times at which events happen. Making things happen at specific times is a completely separate (and much harder) problem.

The clock_gettime() function (under Raspbian) gives results with only microsecond resolution and also requires almost a microsecond to execute. This isn’t very helpful when trying to measure short-duration events. What we want is access to the CPU’s cycle counter, which gives closer to nanosecond resolution and has low overhead. This is easy to accomplish:

static inline unsigned ccnt_read (void)
{
  unsigned cc;
  /* read the cycle count register through coprocessor 15 */
  asm volatile ("mrc p15, 0, %0, c15, c12, 1" : "=r" (cc));
  return cc;
}
The problem is that if you call this code from user mode, your process will die due to executing an illegal instruction. By default, user mode does not get to read the cycle count register. To change this, we need a silly little LKM:

#include <linux/module.h>
#include <linux/kernel.h>

/*
 * works for ARM1176JZ-F
 */

int init_module(void)
{
  /* write 1 to the CP15 register that gates user-mode access */
  asm volatile ("mcr p15,  0, %0, c15,  c9, 0\n" : : "r" (1));
  printk (KERN_INFO "User-level access to CCR has been turned on.\n");
  return 0;
}

void cleanup_module(void)
{
}
After the insmod call, the cycle count register will be accessible. For all I know there’s a good reason why this access is disabled by default, so please think twice before using this LKM on your security-critical Raspberry Pi. (UPDATE: See pm215’s comment below, but keep in mind that if a local user wants to DOS your RPi board, there are many other ways to accomplish this.)

Annoyingly, the Raspbian folks have not yet released a kernel headers package for the current kernel (3.2.27+). Also, an LKM compiled against older headers will fail to load. However, this thread contains a link to some updated headers.

Here’s a tarball containing the code from this post and also a compiled LKM for the 3.2.27+ Raspbian kernel.

I’m writing this up since cycle counters are awesome and also stupid problems like the missing kernel headers made it take an embarrassing amount of time for me to get this going. There’s a lot of ARM timer code out there but all of it that I found is either intended for kernel mode or else fails to work on an ARM11 chip. I actually had to read the manual to make this work.

The solution here is the best one that doesn’t involve modifying the kernel. A better approach would be to expose this functionality through /proc.
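If you want to check the clock_gettime() overhead on your own board, something like the following sketch should do. It is plain POSIX and needs no kernel module; the function names are made up, the choice of CLOCK_MONOTONIC is my assumption, and the number it produces includes a little loop overhead, so treat it as a rough estimate.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one clock_gettime() call, in nanoseconds. */
double clock_gettime_overhead_ns(int iterations)
{
    assert(iterations > 0);
    int64_t start = now_ns();
    for (int i = 0; i < iterations; i++)
        (void)now_ns();
    int64_t stop = now_ns();
    return (double)(stop - start) / iterations;
}
```

Calling clock_gettime_overhead_ns(100000) and printing the result gives a per-call figure to compare against the cost of reading the cycle counter.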

Android Projects Retrospective

The Android Projects class I ran this semester has finished up, with students demoing their projects last week. It was a lot of fun because:

  • I find mobile stuff to be interesting
  • the students were excited about mobile
  • the students were good (and taught me a ton of stuff I didn’t know)
  • the course had 13 people enrolled (I seldom get to teach a class that small)

The structure of the class was simple: I handed each student a Samsung Galaxy 10.1 tab and told them that the goal was for each of them to do something cool by the end of the semester. I didn’t do any real lectures, but rather we used the class periods for discussions, code reviews, and a short whole-class software project (a sudoku app). Over the last several years I’ve become convinced that large lecture classes are a poor way for students to learn, and teaching a not-large, not-lecture class only reinforced that impression. If we believe all the stuff about online education, it could easily be the case that big lecture courses are going to die soon — and that wouldn’t be a bad thing (though I hope I get to keep my job during the transition period).

The course projects were:

  • A sheet music reader, using OpenCV — for the demo it played Yankee Doodle and Jingle Bells for us
  • A tool for creating animated GIFs or sprites
  • A “where’s my car” app (for the mall or airport) using Google maps
  • Finishing and polishing the sudoku app that the whole class did (we had to drop it before completion since I wanted to get going with individual projects)
  • An automated dice roller for Risk battles
  • A generic board game engine
  • A time card app
  • A change counting app, using OpenCV
  • A bluetooth-based joystick; the demo was using it to control Quake 2
  • A cross-platform infrastructure (across Android and iOS) for games written in C++

The total number of projects is less than the number of students since there were a few students who worked in pairs. I can’t easily count lines of code since there was a lot of reuse, but from browsing our SVN repo it’s clear that the students wrote a ton of code, on average.

Overall, my impression that desktops and laptops are dying was reinforced by this course. Not only can tablets do most of what most people want from a computer, but also the synergy between APIs like Google maps and the GPS, camera, and other sensors is obvious and there are a lot of possibilities. Given sufficient compute power, our tabs and phones will end up looking at and listening to everything, all the time. OpenCV’s Android port is pretty rough and it needs a lot of optimization before it will work well in real-time on tabs and phones, but the potential is big.

Android has a lot of neat aspects, but it’s an awful lot of work to create a decent app and it’s depressingly hard to make an app that works across very many different devices. This is expected, I guess — portable code is never easy. The most unexpected result of the class for me was to discover how much I dislike Java — the boilerplate to functional code ratio is huge. This hadn’t seemed like that much of a problem when I used to write standalone Java, and I had a great time hacking up extensions for the Avrora simulator, but somehow interacting with the Android framework (which seems pretty well done) invariably resulted in a lot of painful glue code. The students didn’t seem to mind it that much. It’s possible that I’m exaggerating the problem but I prefer to believe they’re suffering from the Stockholm syndrome.

Hello Android

Some semesters I teach courses that just need to be taught. On the other hand, this Fall I get to teach a class that should be purely fun — an Android projects course for upper-level undergrads. I already promised an “A” to anyone who (legitimately) makes at least $100 using an application developed for the class. I plan to nudge students away from web applications and single-node games, which seem boring. Rather, I hope they’ll interface with the awesome peripherals that make phones and tablets special. Today in class we looked through the source code for a pedometer app — the most interesting part being the step detection logic. It turned out to be pretty hacky, but it more or less works.

Here’s a gratuitous hardware shot — my pile of Galaxy 10.1 tabs, which the students got to take home today. Half of the tabs were gifts, half were purchased using university funds acquired by Thomas Schmid who was awesome enough to loan them to me (or rather, to my students) for the Fall. The most interesting part of the class will be when I get some Tegra 3 development boards — these should be seriously awesome.

Why Verify Software?

People like me who work on software verification (I’m using the term broadly to encompass static analysis, model checking, and traditional formal verification, among others) like to give talks where we show pictures of exploding rockets, stalled vehicles, inoperable robots, and crashed medical devices. We imply that our work is helping, or at least could help, prevent very serious software-related problems. It’s not clear that this sort of claim stands up to a close examination.

What would it take to show a causal link between verification efforts and software safety? A direct demonstration using a controlled experiment would be expensive. An indirect argument would need several parts. First, we’d have to show that flaws revealed by verification efforts are of the kind that could compromise safety. Second, we’d need to show that these flaws would not have been found prior to deployment by traditional V&V — those bugs are merely expensive, not harmful. Third, we’d need to argue that a given dollar spent on verification adds more safety than that same dollar spent on other ways of improving software. Finally, we would need to argue that a successful verification effort won’t have unintended consequences such as emboldening management to substantially accelerate the schedule.

Of course, none of this criticizes software verification research, which I work on and very much believe in. We simply need to be clear about its purpose, which is to reduce overall cost. A top-level goal for software verification that fails to mention cost (for example “minimize damage caused by bugs in software intensive systems”) is untenable because obviously the best way to minimize such damage is to radically simplify, or even eliminate, the software. Of course, in practice we do not wish to radically simplify or eliminate the software because it brings so many benefits.

A more reasonable high-level goal for software verification might be “increase, to the largest possible extent given the methods available, total system utility.” “Total system utility” has both positive and negative components, and verification is mainly about mitigating some of the negative components, or costs, including not just development and manufacturing costs, but also maintenance and accident-related costs. In the next couple of days I’ll post a more specific example where the cost-based analysis of verification is much more useful than the feel-good analysis promoted by exploding rocket pictures.

Do Small-RAM Devices Have a Future?

Products built using microcontroller units (MCUs) often need to be small, cheap, and low-power. Since off-chip RAM eats dollars, power, and board space, most MCUs execute entirely out of on-chip RAM and flash, and in many cases don’t have an external memory bus at all. This piece is about small-RAM microcontrollers, by which I roughly mean parts that use only on-chip RAM and that cannot run a general-purpose operating system.

Although many small-RAM microcontrollers are based on antiquated architectures like Z80, 8051, PIC, and HCS12, the landscape is changing rapidly. More capable, compiler-friendly parts such as those based on ARM’s Cortex M3 now cost less than $1 and these are replacing old-style MCUs in some new designs. It is clear that this trend will continue: future MCUs will be faster, more tool-friendly, and have more storage for a given power and/or dollar budget. Today’s questions are:

Where does this trend end up? Will we always be programming devices with KB of RAM or will they disappear in 15, 30, or 45 years?

I’m generally interested in the answer to these questions because I like to think about the future of computing. I’m also specifically interested because I’ve done a few research projects (e.g. this and this and this) where the goal is to make life easier for people writing code for small-RAM MCUs. I don’t want to continue doing this kind of work if these devices have no long-term future.

Yet another reason to be interested in the future of on-chip RAM size is that the amount of RAM on a chip is perhaps the most important factor in determining what sort of software will run. Some interesting inflection points in the RAM spectrum are:

  • too small to target with a C compiler (< 16 bytes)
  • too small to run multiple threads (< 128 bytes)
  • too small to run a garbage collected language (< 128 KB)
  • too small to run a stripped-down general-purpose OS such as μClinux (< 1 MB)
  • too small to run a limited configuration of a full-fledged OS (< 32 MB)

These numbers are rough. It’s interesting that they span six orders of magnitude — a much wider range of RAM sizes than is seen in desktops, laptops, and servers.
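The inflection points can be written down as a tiny classifier, which makes the span of the thresholds easy to see: from 16 bytes to 32 MB is a factor of 2^21, or a bit more than six orders of magnitude. The enum and function names below are invented for illustration, and the thresholds are just the rough numbers from the list.

```c
/* Hypothetical tiers, one per inflection point in the list above. */
enum ram_tier {
    TIER_NO_C_COMPILER,  /* < 16 bytes  */
    TIER_NO_THREADS,     /* < 128 bytes */
    TIER_NO_GC,          /* < 128 KB    */
    TIER_NO_SMALL_OS,    /* < 1 MB      */
    TIER_NO_FULL_OS,     /* < 32 MB     */
    TIER_FULL_OS         /* >= 32 MB    */
};

/* Map an on-chip RAM size in bytes to the first limit it falls under. */
enum ram_tier classify_ram(unsigned long long ram_bytes)
{
    if (ram_bytes < 16ULL)               return TIER_NO_C_COMPILER;
    if (ram_bytes < 128ULL)              return TIER_NO_THREADS;
    if (ram_bytes < 128ULL * 1024)       return TIER_NO_GC;
    if (ram_bytes < 1024ULL * 1024)      return TIER_NO_SMALL_OS;
    if (ram_bytes < 32ULL * 1024 * 1024) return TIER_NO_FULL_OS;
    return TIER_FULL_OS;
}
```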

So, what’s going to happen to small-RAM chips? There seem to be several possibilities.

Scenario 1: The incremental costs of adding transistors (in terms of fabrication, effect on packaging, power, etc.) eventually become so low that small-RAM devices disappear. In this future, even the tiniest 8-pin package contains an MCU with many MB of RAM and is capable of supporting a real OS and applications written in PHP or Java or whatever. This future seems to correspond to Vinge’s A Deepness in the Sky, where the smallest computers, the Qeng Ho localizers, are “scarcely more powerful than a Dawn Age computer.”

Scenario 2: Small-RAM devices continue to exist but they become so deeply embedded and special-purpose that they play a role similar to that played by 4-bit MCUs today. In other words — neglecting a very limited number of specialized developers — they disappear from sight. This scenario ends up feeling very similar to the first.

Scenario 3: Small-RAM devices continue to exist into the indefinite future; they just keep getting smaller, cheaper, and lower-power until genuine physical limits are reached. Eventually the small-RAM processor is implemented using nanotechnology and it supports applications such as machines that roam around our bloodstreams, or even inside our cells, fixing things that go wrong there. As an aside, I’ve picked up a few books on nanotechnology to help understand this scenario. None has been very satisfying, and certainly none has gone into the kind of detail I want to see about the computational elements of nanotechnology. So far the best resource I’ve found is Chapter 10 of Freitas’s Nanomedicine Volume 1.

This third scenario is, I think, the most interesting case, not only because small-RAM devices are lovable, but also because any distant future in which they exist is pretty interesting. They will be very small and very numerous — bear in mind that we already manufacture more MCUs per year than there are people on Earth. What sensors and actuators will these devices be connected to? What will their peripherals and processing units look like? How will they communicate with each other and with the global network? How will we orchestrate their activities?