ARM processors are ubiquitous in mobile, but you won’t find a lot of developers using ARM-based boxes for their day-to-day work. Recently I needed some ARM machines for compiler work and got a few Raspberry Pi 3 boards, which seemed reasonable since these have four cores at 1.2 GHz. However, while these boards are far faster than the original Raspberry Pi, they’re still very painful for development work. The SD card interface seems to be a major bottleneck, and I would imagine that the 512 KB last-level cache doesn’t hold enough of the working set of a developer workload to be all that effective.
The obviously desirable ARM chip for a developer box is a ThunderX2, with up to 54 cores at 3 GHz, but these don’t seem to be available at all until later in 2017 and who knows when someone will get around to packaging these up in an affordable, developer-friendly box. The original ThunderX is available for servers but I didn’t find an inexpensive boxed-up version (there’s the Gigabyte R270-T61 that’s hard to even get a price on). Various fastish Qualcomm chips are also available on development boards but not as complete systems, as far as I could tell. I’ve done my time bringing up software environments on random development boards and am no longer interested in that.
Anyway, I settled on the SoftIron OverDrive 1000, which has pretty good specs: 8 GB of DDR4 RAM, 1 TB disk, and an AMD A1120 processor, an implementation of ARM’s Cortex-A57 design. The cores run at 1.7 GHz and each pair of cores shares a 1 MB L2 cache, with 8 MB of L3 cache shared across all four cores. The box is headless but has a USB serial console, gigabit Ethernet, SATA, and USB interfaces available. At $600 the price is right. Here’s the documentation.
The OverDrive 1000 ships with 64-bit openSUSE installed. Though I hadn’t used its zypper package manager before, it is very easy and pleasant to interact with. My one gripe with the default configuration so far is that it ships with under 1 GB of swap space, so when doing big compiles (and especially links, and even more especially links of debug builds) it’s easy to get zapped by the OOM killer. The disk is formatted with btrfs, which does not support swapfiles. SoftIron’s support was responsive but didn’t have a recipe for resizing the root partition without removing the hard drive and plugging it into a different machine, which I haven’t bothered to do yet.
Now let’s look at how fast this thing is. I’m not trying to be scientific here, just giving a general idea of what to expect from this box. Since I often sit around waiting for compilers, the benchmark is going to be LLVM 4.0 rc1 compiled using itself. The test input (program being compiled) is Crypto++ 5.6.5, which I chose since it compiles fairly quickly and doesn’t seem to have a lot of external dependencies. I compiled it with the -DCRYPTOPP_DISABLE_ASM flag to disable use of assembly language that might add compile-time differences across the platforms.
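For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness. It is not the script I used; the helper names `time_command` and `build_command` are made up for illustration, and passing `CXX` and `CXXFLAGS` on the `make` command line is an assumption about how the project's makefile picks up its compiler and flags (command-line variables may replace the makefile's default flags, so add optimization flags as needed).

```python
import subprocess
import sys
import time

def time_command(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command fails, so a broken
    # build is never silently recorded as a timing result.
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

def build_command(jobs, cxx="clang++", flags=("-DCRYPTOPP_DISABLE_ASM",)):
    """Assemble a make invocation with the given parallelism (hypothetical layout)."""
    # CXX/CXXFLAGS overrides are assumptions about the makefile, not a known recipe.
    return ["make", f"-j{jobs}", f"CXX={cxx}", "CXXFLAGS=" + " ".join(flags)]

# Example use (commented out because it needs a source tree and a compiler):
# for jobs in (1, 4):
#     subprocess.run(["make", "clean"], cwd="cryptopp", check=True)  # assumes a clean target
#     print(jobs, "jobs:", round(time_command(build_command(jobs), cwd="cryptopp")), "s")
```

Running each configuration from a clean tree, and more than once, helps separate real differences from disk cache effects.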
The processors I’m testing are just some that happened to be convenient. One machine is based on a Core i7-2600, a quad-core from 2011 running at 3.4 GHz, with hyperthreading turned off. Another is based on an i7-6950X, a 10-core from 2016 running at 3.0 GHz, with hyperthreading turned on. Finally there’s a Macbook Pro 2.2 GHz retina model from mid-2015 with a Core i7-4770HQ, a quad-core, also with hyperthreading turned on.
The i7-2600 and the i7-6950X run hot: they are in the 100 W range. The i7-4770HQ is rated at 47 W but this includes the GPU. I’ll speculate that perhaps the power used by the CPU part is not that different from the 25 W used by the AMD A1120 (please leave a comment if you know more about this) (update: see this comment).
First, build time using one core (i.e. make -j1):
Machine | Build time (1 core) |
OverDrive 1000 | 390 s |
Macbook Pro | 177 s |
i7-2600 | 113 s |
i7-6950X | 139 s |
So the ARM chip is about 3.5x slower than the fastest of the Intel chips.
Second, compile times using 4 cores:
Machine | Build time (4 cores) |
OverDrive 1000 | 137 s |
Macbook Pro | 57 s |
i7-2600 | 36 s |
i7-6950X | 41 s |
The OverDrive 1000 gets about a 2.8x speedup from four cores. Of course some of the non-linearity is due to sequential processing in the software build; when compiling other projects, I’ve seen more like a 3.5x speedup from using four cores.
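One rough way to read these scaling numbers is Amdahl's law, which is a simplifying model brought in here rather than something measured: if a fraction p of the build parallelizes perfectly across n cores, the speedup is 1 / ((1 - p) + p / n). The sketch below inverts that formula to estimate p from an observed speedup. It ignores memory bandwidth, disk, and cache contention, so treat the result as a back-of-the-envelope figure only.

```python
def parallel_fraction(speedup, cores):
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / cores), to solve for p."""
    return (1 - 1 / speedup) / (1 - 1 / cores)

def amdahl_speedup(p, cores):
    """Predicted speedup when a fraction p of the work parallelizes perfectly."""
    return 1 / ((1 - p) + p / cores)

# One-core time divided by four-core time on the OverDrive 1000.
crypto_speedup = 390 / 137
p_crypto = parallel_fraction(crypto_speedup, 4)

# The roughly 3.5x speedup seen when compiling other projects.
p_other = parallel_fraction(3.5, 4)

print(f"Crypto++ build: {crypto_speedup:.2f}x speedup, implied parallel fraction {p_crypto:.2f}")
print(f"Other projects: 3.50x speedup, implied parallel fraction {p_other:.2f}")
```

Under this model the Crypto++ build behaves as if a bit under 15% of its one-core time were sequential, versus about 5% for the projects that reach 3.5x.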
Finally, compile times using all cores:
Machine | Build time (all cores) |
OverDrive 1000 | 137 s |
Macbook Pro | 49 s |
i7-2600 | 36 s |
i7-6950X | 19 s |
So here’s the worst-case slowdown of the OverDrive 1000: it’s 7.2x slower than the big Intel chip, but it’s a pretty unfair comparison since that chip costs $1,600.
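The ratios quoted in this post can be re-derived from the three tables above. The sketch below just copies the measured times into a dictionary; the helper names `slowdown` and `scaling` are made up for illustration.

```python
# Wall-clock build times in seconds, copied from the three tables above.
TIMES = {
    "OverDrive 1000": {"1 core": 390, "4 cores": 137, "all cores": 137},
    "Macbook Pro":    {"1 core": 177, "4 cores": 57,  "all cores": 49},
    "i7-2600":        {"1 core": 113, "4 cores": 36,  "all cores": 36},
    "i7-6950X":       {"1 core": 139, "4 cores": 41,  "all cores": 19},
}

def slowdown(machine, config, times=TIMES):
    """How many times slower `machine` is than the fastest machine in `config`."""
    fastest = min(t[config] for t in times.values())
    return times[machine][config] / fastest

def scaling(machine, config, times=TIMES):
    """Speedup of `machine` in `config` relative to its own one-core time."""
    return times[machine]["1 core"] / times[machine][config]

for config in ("1 core", "4 cores", "all cores"):
    ratio = slowdown("OverDrive 1000", config)
    print(f"{config:>9}: OverDrive 1000 is {ratio:.1f}x slower than the fastest machine")

for machine in TIMES:
    print(f"{machine:>14}: {scaling(machine, 'all cores'):.1f}x speedup from 1 core to all cores")
```

The second loop also shows how much each machine gains from its extra cores and hyperthreads, which is where the i7-6950X pulls away.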
Overall, the OverDrive 1000 is an inexpensive and capable machine, but it isn’t going to compete performance-wise with Intel boxes at the same price point (for example, here are the PCs you can buy at NewEgg for between $500 and $600). If you buy an OverDrive 1000, buy it because you want an ARMv8 machine that shows up ready to use and that isn’t an embedded systems toy like the Raspberry Pi family.
7 responses to “A Quick Look at the SoftIron OverDrive 1000”
> I’ll speculate that perhaps the power used by the CPU part is not that different from the 25 W used by the AMD A1120
The per-component power limits aren’t static. If the GPU is idle and the CPU is active, the CPU can use basically all of the thermal budget of the whole package (and vice versa). So for the purposes of your benchmark, it was realistically a 47W CPU part.
I don’t have a fresh reference on this, but here’s one from 2012: http://www.intel.com/Assets/PDF/whitepaper/323324.pdf
Benchmarking code on this machine must be a lot easier than benchmarking on mobile devices.
Thanks Juho!
Daniel, yeah, this sort of a machine is a pleasure, and I didn’t even have to bother with the serial console since it booted properly the first time.
I know exactly where you’re coming from, but it still seems cruel to call a Pi3B a “toy”.
I’d love to bring one back to 1993 and say, here’s a credit-card sized computer that costs $21 (adjusted for inflation), draws 5W of power, has 1GB of RAM, would easily place in the top 90 of the Top500 supercomputers list… yet people dismiss it as a toy.
Vince, I just mean that the RPi machines are pretty unacceptable for development work. I agree that they’d have been dream machines not all that long ago.
yes, should have included a 😉
I spent winter break doing some Applesoft BASIC Apple II coding so my concept of a “toy” system got recalibrated a bit.
Your review is actually really valuable. I’ve been looking to get a real server-class ARM system for a while (I’ve been using a Jetson TX-1 for any high-end workloads).
I even have money set aside to get such a system. The problem is of course the purchasing department here. I just spent a year of constant fighting to get an IBM Power8 server, and that was from one of their pre-approved vendors. Their heads will probably explode if I try to get them to buy one of these ARM servers (and have to explain again why a low-end x86 desktop isn’t an acceptable substitute).