Some computer things change very slowly; for example, my newish desktop at home has a PS/2 port. Other things change rapidly: my 2010 iPad is kind of a stone-age relic now. This kind of differential progress creates some funny inversions. A couple of historical examples:
- Apparently at one point in the 80s or 90s (this isn't a firsthand story; I'd appreciate recollections or citations) the processor available in an Apple printer was so fast that people would offload numerical computations to their printers.
- I spent the summer of 1997 working for Myricom. Using the then-current Pentium Pro machines, you could move data between two computers faster than you could do a local memcpy(). I’m pretty sure there was something wrong with the chipset for these processors, causing especially poor memcpy() performance, but I’ve lost the details.
What are the modern examples? A few come to mind:
- Non-volatile storage used to be really slow because disk reads involve waiting for moving parts. Huge increases in the speed of non-volatile storage have led Intel to provide instruction set support for non-volatile memory (a sketch of what that support looks like follows this list).
- I realize the example may not be 100% serious, but placing a filesystem in video RAM represents some kind of serious inversion.
- A device that looks like a flash disk but includes a Linux machine that would have seemed pretty fast a few years ago.
- Running C++ in the browser by compiling to asm.js.
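To make the non-volatile memory item concrete, here is a minimal sketch (mine, not Intel's sample code) of what the instruction set support looks like: making a single store durable with CLWB. It assumes a CPU that supports CLWB (compile with gcc -mclwb) and a pointer that actually refers to persistent memory.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: make one 64-bit store durable, assuming *slot
   lives in memory that is actually persistent. */
void persist_store(uint64_t *slot, uint64_t value)
{
    *slot = value;   /* ordinary store; may linger in the cache */
    _mm_clwb(slot);  /* ask the CPU to write the cache line back */
    _mm_sfence();    /* order the write-back before later stores */
}
```

On chips without CLWB, CLFLUSHOPT or the older CLFLUSH fill a similar role.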
Anyhow, I enjoy computing inversions since they challenge our assumptions.
24 responses to “Inversions in Computing”
I’m not certain of the technical accuracy of this (and unfortunately can’t find an appropriate reference at the moment), but I’ve heard it said that for a brief period (mid-90s I’d guess?) the fastest x86 processor in the world was an emulator running on an Alpha.
http://en.wikipedia.org/wiki/LaserWriter#Hardware
The LaserWriter had a 12MHz 68000, which was faster than any Mac at the time. I’ve never heard of people using it for computations — unless you mean sending a PostScript program rather than prerendered data.
Regarding your printer tale, I used to work at Oxford uni with a professor who had written a Fast Fourier Transform in postscript. It was indeed faster to send the program to the printer than to run it on one of the NeXT workstations (it was an HP printer; NeXT printers used the postscript interpreter in the workstation).
I'm not sure it fits your pattern exactly, but the "specialize an OS down to a single application and run it in a VM" thing seems funny to me.
Zev, that's great; if you run across a reference I'd love to read it.
Graham and pdw, thanks for the details!
Sort of along these lines: command queueing in hard drives was invented because the drives were so slow; now it's useful because flash storage is so fast.
I’ve considered a few times what it would take to implement the US tax code in PostScript:
"What do you use to do your taxes?"
"My printer."
Here’s a reference to the Alpha claim: https://www.usenix.org/legacy/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf.
My recollection (which seems to be supported by the paper) is that, with binary translation, the Alpha was faster than a P6 on SPECfp, but worse on SPECint. It completely dominated both benchmarks when running native code, of course.
I think the Apple in question was an Apple II. The 6502 processor was quite weak, and there existed solutions such as external processor cards that gave the machine more power. One option was a card with the 68000 processor. I don't know about printers, but it does not seem implausible that there was some printer available for the Apple II that used the 68000, sometime around the mid-80s.
Network latency is another inversion.
It's intuitive to expect local operations to be faster than remote operations, but network latency has improved to the point that, with a bog-standard network, you can do an RPC that's faster than a spinning metal disk seek. With really fast networking (e.g., the stuff described in http://blog.cloudflare.com/a-tour-inside-cloudflares-latest-generation-servers/), you might be able to do an RPC more quickly than an SSD seek.
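To put rough rule-of-thumb numbers on it (the usual order-of-magnitude figures, not measurements): a spinning-disk seek costs on the order of 10 ms, a round trip within a datacenter around 0.5 ms, and an SSD random read around 0.1 ms. So a same-datacenter RPC beats the disk seek by more than an order of magnitude, and it takes the fancier networking to get under the SSD.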
Of course inversions will always flip back.
Not too long ago we started offloading CPU work to general-purpose GPUs. Now people are already starting to offload GPU ops back to the CPU.
Thin/thick clients: Mainframes => PC => cloud
A/sync: interrupts => threads => events => co-routines. All on the same stack!
John Carmack is revolted by one, too: https://twitter.com/id_aa_carmack/status/193480622533120001
@K.Haddock: The 6502 processor weak? Pfah. The C64 had a 6502 (well, a 6510, but let’s not quibble) and this RISC-like titan was so powerful that its disk drive was outfitted with, um, a second 6502 to handle all the complicated stuff.
That’s right, a C64 with disk drive was a dual core home computer before it was cool. Although I’ve never heard of people offloading non-disk calculations to the drive — this wouldn’t have been very practical, as the communication overhead would have probably killed most advantages.
@Jeroen Mostert: I recall reading that at least one chess program offloaded parts of its computation to the disk drive. According to a thread at http://www.lemon64.com/forum/viewtopic.php?t=27911&start=0, some demos seem to have offloaded 3D calculations to the disk drive as well.
John, w.r.t.:
> Using the then-current Pentium Pro machines, you could move data between two computers faster than you could do a local memcpy(). I’m pretty sure there was something wrong with the chipset for these processors, causing especially poor memcpy() performance, but I’ve lost the details.
Consider that when you're doing a memcpy(), you have both read and write traffic to RAM, whereas when you're sending the data over the network you have only read traffic on one machine and only write traffic on the other. In other words, when you're performing a non-local copy, each machine's memory bus does half the work, so in effect you have twice as much memory bandwidth! And if the memory is slow while the network is extremely efficient and fast, that would explain it.
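This is easy to check with a quick benchmark. The sketch below (mine; obviously not a Pentium Pro measurement, and the buffer size is arbitrary) times a large memcpy() and counts each byte twice, once read and once written:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    size_t n = 256 * 1024 * 1024;  /* 256 MiB */
    char *src = malloc(n), *dst = malloc(n);
    if (!src || !dst)
        return 1;
    memset(src, 1, n);  /* touch the pages so we time copying, */
    memset(dst, 1, n);  /* not page faults                     */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* the copy moves n bytes, but the bus carries 2*n of traffic */
    printf("copied %.2f GB/s; bus traffic %.2f GB/s\n",
           n / s / 1e9, 2 * n / s / 1e9);
    free(src);
    free(dst);
    return 0;
}
```

A network send of the same buffer only needs the read half on the sending machine, which is exactly the asymmetry above.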
The "Wheel of Reincarnation" has been around for a long time in graphics (co)processors [which tend to evolve in the direction of becoming smarter and faster, until they're full-fledged computers, ready to have their own initially-dumb-but-later-smarter coprocessors]:
@Article{Myer:1968:DDP,
author = “T. H. Myer and I. E. Sutherland”,
title = “On the Design of Display Processors”,
journal = “Communications of the ACM”,
year = “1968”,
volume = “11”,
number = “6”,
month = jun,
pages = “410–414”,
}
Alexander, definitely the case. I’ve just never seen this actually happen before (or since).
Thanks Jonathan, that’s definitely a classic.
Andreas, thanks for the link! Great to see people actually did this stuff.
That is a great example, Stefan. I wonder what platform he's talking about? It seems surprising.
Dan, great links, thanks!
John Carmack expands the display latency complaint here: http://superuser.com/questions/419070/transatlantic-ping-faster-than-sending-a-pixel-to-the-screen
Edwin, great link — there’s no substitute for measuring real systems. I used to work on latency and it’s really hard.