Yes, agreed! We definitely got carried away with the marketing lingo and we apologize!
This was our thought process:
We have received a lot of negative feedback regarding this number so we want to explain the meaning and motivation. A single number can never characterize the performance of an architecture. The only thing that really matters is how many seconds and how many joules YOUR application consumes on a specific platform.
Still, we think multiplying the core frequency (700 MHz) by the number of cores (64) is as good a metric as any. As a comparison point, the theoretical peak GFLOPS number often quoted for GPUs is really only reachable if your application has significant data parallelism and limited branching. Other figures processors have been marketed on in the past include peak GFLOPS, MIPS, Dhrystone scores, CoreMark scores, SPEC scores, Linpack scores, etc. Taken by themselves, datasheet specs mean very little. We have published all of our data and manuals, and we hope it's clear what our architecture can do. If not, let us know how we can convince you.
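For concreteness, here is the arithmetic behind both kinds of headline numbers in a few lines of Python (the FLOPs-per-cycle factor below is purely illustrative, not a datasheet value):

    cores = 64
    freq_ghz = 0.7                    # 700 MHz per core
    print(cores * freq_ghz)           # 44.8 -- the "GHz" headline number

    # Peak GFLOPS adds a FLOPs-per-cycle factor on top of frequency x cores;
    # 2 (one multiply-add per cycle) is an assumption for illustration only.
    flops_per_cycle = 2
    print(cores * freq_ghz * flops_per_cycle)   # 89.6 GFLOPS "peak"

Both are simple products of datasheet quantities, which is exactly why neither tells you what your application will actually achieve.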
As one of the HN contributors who mocked the GHz performance spec in an earlier discussion thread, I welcome the fact that you're engaging in debate about this.
That said, I still think that the GHz stat is just about as BAD a metric as any (I suppose "pin count times # of cores" would be worse :-). About the only positive inference I can draw from this is that you have the thermal situation in your system under control.
But piling up cores and cooling them is, IMHO, one of the easiest parts of designing a massively parallel system. The interesting part of the design is the interconnections between the cores, and any metric that multiplies single core performance by number of cores tells me nothing about that.
So not only am I not learning a key part of your system's performance characteristics, but by omitting it, you make me wonder whether the ENGINEERING of the system might be as misguided on this point as the MARKETING seems to be (i.e. does marketing omit this aspect because it wasn't important to the engineers either?).
Linpack at least has benchmarks both for showing off the cores in nearly independent scenarios, and for showing the system when actual communication has to take place. Obviously, each parallel application is different, but you'd at least show ONE indication of performance in situations that are not embarrassingly parallel (http://en.wikipedia.org/wiki/Embarrassingly_parallel).
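To make the distinction concrete, here is a rough Python sketch (standard library only): the map step below is embarrassingly parallel, while the final reduction stands in for the communication step that real distributed benchmarks actually stress:

    from multiprocessing import Pool

    def independent_work(chunk):
        # Embarrassingly parallel: each worker touches only its own data,
        # so throughput scales with core count.
        return sum(v * v for v in chunk)

    if __name__ == "__main__":
        data = [list(range(i * 1000, (i + 1) * 1000)) for i in range(64)]
        with Pool() as pool:
            partials = pool.map(independent_work, data)
        # The reduction is where communication happens; in a real cluster
        # this exchange, not the map, is what limits scaling.
        print(sum(partials))

Any metric built from the map step alone tells you nothing about how the exchange performs.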
Thanks for posting a very valid concern! Does the following FFT-based image processing demo address the concern about communication bottlenecks in our approach?
Corner turns for 2D FFTs are usually quite challenging for GPUs and CPUs.[ref] Yaniv, our DSP guru, completed the corner-turn part of the algorithm with ease in a couple of days, and the on-chip data movement constitutes a very small portion of the total application wall time (complete source code is published as well, if you really want to dig).
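For anyone unfamiliar with the term: the corner turn is the transpose between the two 1D FFT passes of a 2D FFT, and it is all-to-all data movement, which is exactly what stresses the interconnect. A minimal NumPy sketch of the decomposition (this shows the shape of the algorithm, not our implementation):

    import numpy as np

    x = np.random.rand(64, 64) + 1j * np.random.rand(64, 64)

    rows = np.fft.fft(x, axis=1)           # 1D FFTs along rows: fully independent
    turned = np.ascontiguousarray(rows.T)  # corner turn: all-to-all data movement
    cols = np.fft.fft(turned, axis=1)      # second pass, again row-wise
    result = cols.T

    assert np.allclose(result, np.fft.fft2(x))

On a mesh of cores, each core owns a slice of rows, so the transpose forces every core to exchange data with every other core.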
It's hard to market FFT cycle counts to the general audience :-)
Yes, this looks like a much more appealing argument to a distributed computing audience. As for marketing to "the general audience", it's not clear to me that this is a realistic aspiration.
I suspect that the people who would be happiest buying something like this are going to be very technical, not just USING Linpack, FFTs, neural networks, or HMMs on a regular basis, but used to IMPLEMENTING them as well. This audience is definitely going to want red meat like the paper you're linking to.
With the Kickstarter campaign, you may also get customers who just think it's cool to own a supercomputer, but when they realize they can't run Crysis on it, they may be disappointed.