This is a misleading title. Brian builds Supercomputer Models, not Supercomputers. It would be like if I had a model of a Ferrari I put together at home and wrote an article about how I built sports cars in my spare time.
Yes, these are awesome, full-featured models, but the difference between this and a supercomputer, which costs tens of millions of dollars, requires high-density power and cooling, features multi-dimensional, low-diameter networks, and contains hundreds of thousands to millions of compute cores, is... quite vast.
I get what you are saying, but what is a supercomputer today is a pedestrian little home computer tomorrow.
The cylinder design he's using is inspired by the early Cray models. The Cray-1 had a peak performance of 80 MFLOPS, and the Cray X-MP reached 800 MFLOPS. The Cray-2 (which looked substantially different) reached 1.9 GFLOPS in 1985.
From 1993 to 1996 the Numerical Wind Tunnel, a 140-CPU vector computer, was at the top most of the time. It reached its all-time peak at around 235.8 GFLOPS.
Even ASCI Red, which held the top spot until the end of the 20th century, only reached 1.3 TFLOPS.
So unlike a model Ferrari built at home, this thing actually substantially outperforms the fastest supercomputers up until the mid-90s.
This is a little bit of a straw man. As Jack Dongarra will happily tell you, an iPad can outperform some of the supercomputers from the beginning of the Top500 benchmark. You don't get to be a supercomputer today by beating supercomputers from two decades ago; that's not how technology works. I'm taking umbrage at the linkbait title because I'm a grumpy old man, not because I think this project isn't cool (and admirable!)
My point is that the "supercomputer" term in itself is quite meaningless. It's pretty much just saying "we thought this thing was fast when it came out".
And while this thing doesn't really meet that label at its present scale, it is conceptually far closer to those early supercomputers than an iPad is: in how it's structured, its parallel nature (16 ARM cores; 128 Epiphany cores), the shared memory within each Parallella, etc.
So yes, you're being grumpy about a title where it takes about 10 seconds to figure out that this isn't actually about someone building stuff aiming for the Top500.
An iPad does resemble the supercomputers of yore, with its superscalar execution, vector processing (GPU), and multicore architecture.
The Parallella brings distributed-memory programming in, which is a very important development.
You and I strongly disagree on the meaningfulness or definition of the term supercomputer. Here's an easy definition: A supercomputer is any single, unified, computer system that is currently one of the fastest 500 in the world.
Another grumpy old man here. I disagree with your classification. By your analogy, he'd be building a classic Ferrari capable of a classic Ferrari's speeds, not a model of one. The fact that it can't measure up to today's supercomputers does not mean that it isn't constructed along some of the same lines and shares a lot of traits with them. Back when Beowulf clusters first came into vogue, all the 'real' supercomputer people were saying 'but that isn't a real supercomputer, you're using multiple CPUs', and we all know how that discussion ended.
Give the man a break; wait a couple of years and he'll give you a 4K-core supercomputer for little money. That needs to be encouraged, not talked down. It's early days.
I actually disagree with your definition of supercomputer; I'd define it as any compute system or cluster with more power than an average enterprise server or workstation. After all, they really exist to solve problems which can't be solved by traditional computing.
I think the metric of it being one of the fastest 500 in the world is a little disingenuous to all of the other supercomputers out there.
Or, "Brian builds distributed-memory computers". Despite the misleading title, I think such a project would be a great tool for a parallel computing class. Such a machine would be roughly the same cost as a textbook and would be much more rewarding than running MPI on a multicore laptop/desktop.
Having read through the documents, I see that it is capable of ~200 GFLOPS. Obviously, this outshines my MacBook's ~50 GFLOPS, but it is substantially less than my workstation's GPU, an Nvidia GTX 770, capable of ~3000 GFLOPS.
I suppose my question is why would I choose to build something like this over using a GPU?
The Parallella offers something halfway between a multi-CPU cluster (differing in terms of memory access) and a GPU (differing in that each core is a real core that can independently make calls, branch, etc.).
Another point to note is that every GPU architecture is different, and some support different degrees of control-flow parallelism vs. data parallelism.
Ultimately it depends on the kernel you're working with: if you're at the point where you've got strictly linear SIMD and you depend entirely on floating-point throughput or memory bandwidth, a GPU is clearly the better choice.
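To make that distinction concrete, here's a toy C sketch (mine, not from the thread) of the kind of data-dependent control flow that wrecks lockstep SIMD but suits independent cores:

    /* Each element needs a data-dependent number of iterations (Collatz
       step counts). On a GPU warp, every lane waits for the slowest
       element; independent MIMD cores each finish on their own. */
    #include <stdio.h>

    static int steps_to_one(unsigned n) {
        int steps = 0;
        while (n != 1) {
            n = (n % 2) ? 3 * n + 1 : n / 2;
            steps++;
        }
        return steps;
    }

    int main(void) {
        unsigned data[] = {16, 27, 6, 97};  /* 16 takes 4 steps, 27 takes 111 */
        for (unsigned i = 0; i < sizeof data / sizeof *data; i++)
            printf("%u -> %d steps\n", data[i], steps_to_one(data[i]));
        return 0;
    }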
Although if you have, or prefer writing, code suited to a multicore architecture, you would get better performance from a Xeon Phi or the forthcoming Knights Landing from Intel, both of which run x86_64 Linux.
Between ARM, OpenPower and Intel's Phi successors this is becoming a hyper-competitive space. Interesting times ahead!
The Parallella is a testbed for their chips, to let people experiment with the platform. The real value is for embedded use (of the chips, not the whole current design), and as they scale up the chips - their roadmap is for chips with thousands of cores.
Compared with your GPU, there's also another important factor: these are entirely independent cores. For stuff that your GPU does well, that is, doing the exact same manipulation of a large number of values in parallel, it's likely to keep beating the Epiphany chips. For stuff it doesn't do well, that is, where you need independent threads of execution for each data stream, the Epiphany chips may become a better fit.
Mainly for the experience of building it, I guess. The author also mentions machine learning and image processing, i.e. running multiple different tasks at once. I think that's possible with the latest CUDA and Nvidia gear, though.
It has 8 FPGAs on there too, useful for high-speed I/O and other tasks. There are also 8 HDMI outputs, which would make this a pretty interesting video-wall machine.
If it had the 64-core chips instead of the 16-core chips, this thing would be ~800 GFLOPS, I guess: four times the cores per chip, so roughly four times the throughput.
Isn't the architecture of real world supercomputers essentially dependent on their expected load?
The very notion of a 'general-purpose' supercomputer essentially conjures one of those modern data-center visions: a large array of identical consumer-grade hardware with its high price/performance ratio, accessioned, wired, tested, commissioned, allocated workloads, managed over time, and finally decommissioned by a combination of carefully developed human procedures and highly automated processes?
For instance, nobody in their right mind would install an OS on every such node by hand: it has to be PXE or similar (can you boot root-on-iSCSI direct from BIOS these days?).
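For what it's worth, a minimal pxelinux menu entry for a diskless NFS-root node looks something like this (the server address and paths here are made up for illustration):

    DEFAULT node
    LABEL node
        KERNEL vmlinuz-cluster
        APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.0.0.1:/srv/nodes ip=dhcp ro

As for root-on-iSCSI straight from firmware: many NICs' option ROMs and UEFI implementations can do iSCSI boot, though support varies by vendor.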
I'm curious how many fellow HN'ers out there are doing this.
A single 16-core Parallella can run with just passive cooling. It takes next to no cooling to bring both the ARM and Epiphany chips down to near room temperature.
While the 16-core Parallella can run with just passive cooling, you still need to ensure proper airflow over the unit.
Units have been known to overheat with passive cooling alone, and they even advise installing a fan with the official case (which they sell in their store and provided to backers), even though there is nowhere to screw a fan in.
That Netgear dumb switch? We've got 10-20 of those running 24/7 in user office spaces; they're all cool to the touch because they do practically nothing. Jam them into a tight, hidden space and safely forget about them forever.
The Zynq generates quite a bit of heat, but the Epiphany very little. I've got a case kit for mine with a 0.4-watt fan, and that fan is total overkill for a single Parallella.
Your LEDs use 20W of power? I would immediately scrap those; they serve no real purpose and use an insane amount of power compared to the compute hardware they sit next to.