Some computer things change very slowly; for example, my newish desktop at home has a PS/2 port. Other things change rapidly: my 2010 iPad is kind of a stone-age relic now. This kind of differential progress creates some funny inversions. A couple of historical examples:
- Apparently at one point in the 80s or 90s (this isn't a firsthand story; I'd appreciate recollections or citations) the processor available in an Apple printer was so fast that people would offload numerical computations to their printers.
- I spent the summer of 1997 working for Myricom. Using the then-current Pentium Pro machines, you could move data between two computers faster than you could do a local memcpy(). I’m pretty sure there was something wrong with the chipset for these processors, causing especially poor memcpy() performance, but I’ve lost the details.
What are the modern examples? A few come to mind:
- Non-volatile storage used to be really slow because disk reads involve waiting for moving parts. Huge increases in the speed of non-volatile storage have led Intel to provide instruction set support for non-volatile memory (a sketch of what that support looks like follows this list).
- I realize the example may not be 100% serious, but placing a filesystem in video RAM represents some kind of serious inversion.
- A device that looks like a flash disk but includes a Linux machine that would have seemed pretty fast a few years ago.
- Running C++ in the browser by compiling to asm.js.
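To make the non-volatile memory item concrete, here is a minimal sketch (mine, not Intel's sample code) of what the instruction set support looks like: making a single store durable with CLWB. It assumes a CPU that supports CLWB (compile with gcc -mclwb) and a pointer that actually refers to persistent memory.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: make one 64-bit store durable, assuming *slot
   lives in memory that is actually persistent. */
void persist_store(uint64_t *slot, uint64_t value)
{
    *slot = value;   /* ordinary store; may linger in the cache */
    _mm_clwb(slot);  /* ask the CPU to write the cache line back */
    _mm_sfence();    /* order the write-back before later stores */
}
```

On chips without CLWB, CLFLUSHOPT or the older CLFLUSH fill a similar role.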
Anyhow, I enjoy computing inversions since they challenge our assumptions.
24 responses to “Inversions in Computing”
I’m not certain of the technical accuracy of this (and unfortunately can’t find an appropriate reference at the moment), but I’ve heard it said that for a brief period (mid-90s I’d guess?) the fastest x86 processor in the world was an emulator running on an Alpha.
http://en.wikipedia.org/wiki/LaserWriter#Hardware
The LaserWriter had a 12MHz 68000, which was faster than any Mac at the time. I’ve never heard of people using it for computations — unless you mean sending a PostScript program rather than prerendered data.
Regarding your printer tale, I used to work at Oxford uni with a professor who had written a Fast Fourier Transform in postscript. It was indeed faster to send the program to the printer than to run it on one of the NeXT workstations (it was an HP printer; NeXT printers used the postscript interpreter in the workstation).
I'm not sure it fits your pattern exactly, but the "specialize an OS down to a single application and run it in a VM" thing seems funny to me.
Zev, that's great; if you run across a reference I'd love to read it.
Graham and pdw, thanks for the details!
Sort of along these lines: command queueing in hard drives was invented because the drives were so slow; now it's useful because flash storage is so fast.
I’ve considered a few times what it would take to implement the US tax code in PostScript:
"What do you use to do your taxes?"
"My printer."
Here’s a reference to the Alpha claim: https://www.usenix.org/legacy/publications/library/proceedings/usenix-nt97/full_papers/chernoff/chernoff.pdf.
My recollection (which seems to be supported by the paper) is that, with binary translation, the Alpha was faster than a P6 on SPECfp, but worse on SPECint. It completely dominated both benchmarks when running native code, of course.
I think the Apple in question was an Apple II. The 6502 processor was quite weak, and there existed solutions such as external processor cards that gave the machine more power. One option was a card with the 68000 processor. I don't know about printers, but it does not seem implausible that there was some printer available for the Apple II that used the 68000, sometime around the mid-80s.
Network latency is another inversion.
It's intuitive to expect local operations to be faster than remote operations, but network latency has improved to the point that, with a bog-standard network, you can do an RPC that's faster than a spinning metal disk seek. With really fast networking (e.g., the stuff described in http://blog.cloudflare.com/a-tour-inside-cloudflares-latest-generation-servers/), you might be able to do an RPC more quickly than an SSD seek.
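To put rough rule-of-thumb numbers on it (the usual order-of-magnitude figures, not measurements): a spinning-disk seek costs on the order of 10 ms, a round trip within a datacenter around 0.5 ms, and an SSD random read around 0.1 ms. So a same-datacenter RPC beats the disk seek by more than an order of magnitude, and it takes the fancier networking to get under the SSD.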
Of course inversions will always flip back.
Not too long ago we started offloading CPU work to general-purpose GPUs. Now people are already starting to offload GPU ops back to the CPU.
Thin/thick clients: Mainframes => PC => cloud
A/sync: interrupts => threads => events => co-routines. All on the same stack!
John Carmack is revolted by one, too: https://twitter.com/id_aa_carmack/status/193480622533120001
@K.Haddock: The 6502 processor weak? Pfah. The C64 had a 6502 (well, a 6510, but let’s not quibble) and this RISC-like titan was so powerful that its disk drive was outfitted with, um, a second 6502 to handle all the complicated stuff.
That’s right, a C64 with disk drive was a dual core home computer before it was cool. Although I’ve never heard of people offloading non-disk calculations to the drive — this wouldn’t have been very practical, as the communication overhead would have probably killed most advantages.
@Jeroen Mostert: I recall reading that at least one chess program offloaded parts of its computation to the disk drive. According to a thread at http://www.lemon64.com/forum/viewtopic.php?t=27911&start=0, some demos seem to have offloaded 3D calculations to the disk drive as well.
John, w.r.t.:
> Using the then-current Pentium Pro machines, you could move data between two computers faster than you could do a local memcpy(). I’m pretty sure there was something wrong with the chipset for these processors, causing especially poor memcpy() performance, but I’ve lost the details.
Consider that when you're doing a memcpy(), you have both read and write traffic to RAM, whereas when you're sending the data over the network you have only read traffic on one machine and only write traffic on the other. In other words, when you're performing a non-local copy, each machine's memory bus does half the work, so in effect you have twice as much memory bandwidth! And if the memory is slow while the network is extremely efficient and fast, that would explain it.
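This is easy to check with a quick benchmark. The sketch below (mine; obviously not a Pentium Pro measurement, and the buffer size is arbitrary) times a large memcpy() and counts each byte twice, once read and once written:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    size_t n = 256 * 1024 * 1024;  /* 256 MiB */
    char *src = malloc(n), *dst = malloc(n);
    if (!src || !dst)
        return 1;
    memset(src, 1, n);  /* touch the pages so we time copying, */
    memset(dst, 1, n);  /* not page faults                     */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* the copy moves n bytes, but the bus carries 2*n of traffic */
    printf("copied %.2f GB/s; bus traffic %.2f GB/s\n",
           n / s / 1e9, 2 * n / s / 1e9);
    free(src);
    free(dst);
    return 0;
}
```

A network send of the same buffer only needs the read half on the sending machine, which is exactly the asymmetry above.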
The "Wheel of Reincarnation" has been around for a long time in graphics (co)processors [which tend to evolve in the direction of becoming smarter and faster, until they're full-fledged computers, ready to have their own initially-dumb-but-later-smarter coprocessors]:
@Article{Myer:1968:DDP,
author = “T. H. Myer and I. E. Sutherland”,
title = “On the Design of Display Processors”,
journal = “Communications of the ACM”,
year = “1968”,
volume = “11”,
number = “6”,
month = jun,
pages = “410–414”,
}
Alexander, definitely the case. I’ve just never seen this actually happen before (or since).
Thanks Jonathan, that’s definitely a classic.
Andreas, thanks for the link! Great to see people actually did this stuff.
That is a great example, Stefan. I wonder what platform he's talking about? It seems surprising.
Dan, great links, thanks!
John Carmack expands the display latency complaint here: http://superuser.com/questions/419070/transatlantic-ping-faster-than-sending-a-pixel-to-the-screen
Edwin, great link — there’s no substitute for measuring real systems. I used to work on latency and it’s really hard.