Thanks for posting a very valid concern! Does the following FFT-based image processing demo address your concern about communication bottlenecks in the approach?
Corner turns for 2D FFTs are usually quite challenging for GPUs and CPUs.[ref] Yaniv, our DSP guru, completed the corner-turn part of the algorithm with ease in a couple of days, and the on-chip data movement constitutes a very small portion of the total application wall time. (The source code is published as well, if you really want to dig.)
It's hard to market FFT cycle counts to the general audience :-)
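For readers who haven't met the term: a "corner turn" is the transpose step between the row pass and the column pass of a 2D FFT. Here is a minimal pure-Python sketch of that structure. It is not the code from the linked demo; the function names are made up for illustration, and a naive O(n^2) DFT stands in for a real FFT kernel.

```python
import cmath

def dft(row):
    """Naive 1D DFT; a stand-in for an optimized FFT kernel (assumption)."""
    n = len(row)
    return [sum(row[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def corner_turn(matrix):
    """Transpose the matrix so that columns become rows."""
    return [list(col) for col in zip(*matrix)]

def fft2_via_corner_turn(image):
    """2D transform as: row pass, corner turn, row pass, corner turn back."""
    rows_done = [dft(r) for r in image]      # transform along each row
    turned = corner_turn(rows_done)          # columns are now contiguous rows
    cols_done = [dft(r) for r in turned]     # transform along original columns
    return corner_turn(cols_done)            # restore the original orientation

def dft2_direct(image):
    """Direct 2D DFT from the definition, used only as a cross-check."""
    h, w = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]
```

On a multicore or distributed machine the two `corner_turn` calls are where every core has to exchange data with every other core, which is why that step tends to dominate the communication cost.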
Yes, this looks like a much more appealing argument to a distributed computing audience. As for marketing to "the general audience", it's not clear to me that this is a realistic aspiration.
I suspect that the people who would be happiest buying something like this are going to be very technical, not just USING Linpack, FFTs, neural networks, or HMMs on a regular basis, but used to IMPLEMENTING them as well. This audience is definitely going to want red meat like the paper you're linking to.
With the Kickstarter campaign, you may also get customers who just think it's cool to own a supercomputer, but when they realize they can't run Crysis on it, they may be disappointed.
http://www.adapteva.com/white-papers/using-a-scalable-parall...