I picked up a Raspberry Pi a few days ago. Initially, I was blown away by the low price point. Since then, I've been reflecting on what makes a computer useful.
For personal computers - desktops and laptops - I think we don't have a shortage of processor cycles. The minimal specs of the Raspberry Pi make it usable - 256MB of RAM, a 700 MHz CPU, a few GB of storage, and enough network bandwidth to saturate a home broadband connection. What is compelling about the best contemporary personal computing devices is form factor: how easy it is to provide input, how nice the screen is, and, for a mobile device, how heavy it is and whether the battery lasts long enough.
Does a personal parallel computer really help me? At first blush, I am having a hard time seeing how. Clearly, there are CPU-intensive workloads that people have mentioned in this discussion - ray tracing is one. The video mentions robotics and algorithms. I have mixed feelings about that, since I personally believe the future of robotics lies in computation off the physical robot itself - aka cloud robotics. A use case I personally would find beneficial is the ability to run dozens of VMs on the same machine. Heck ... each of my 50 open browser tabs could run inside a separate VM. I know lightweight container technology has been around for a while - e.g. jails, LXC. But what about hypervisor-based virtualization - e.g. VMware, Xen, etc.? While the parallelization offered by this tech would be awesome, what seems to be missing is the ability to address lots and lots of memory.
As the majority of robotics research in the US is paid for by the military, I think there's more of a market for "fast computation on board" than you'd think. Communication and networking is expensive and hard. As a practicing roboticist, I'd love to work with a few of these. =)
The real value is in pushing forward a general compute device with many cores. Overall our programs are still stuck in the 1-2 thread era, and there is a bit of a chicken/egg problem: without a very effective multicore processor, the payoff in writing parallel programs is small. GPGPU is still too expensive and not very practical due to memory constraints and the GPU/system memory bottleneck. This probably won't be the device to change all of that, but even failure is progress.
I used to feel exactly the way you do. In 2006, when I first encountered the Cell processor inside the PS3, my eyes popped. I found it extremely challenging to write useful software. The asymmetric architecture was a big culprit. I briefly looked into the dev environments offered by the likes of RapidMind but gave up. This didn't feel like general purpose computing.
Back in '06, I remember seeing fear in the eyes of some hardware and software engineers. In the next year, we were supposed to have 100 cores in our plain old desktops. How the heck were we going to program them? I found the situation a bit irrational. Every talk started with the death of Moore's Law because we couldn't shrink dies any further. Adding more cores was posited as the only solution. Except no one could code them for general purpose apps like Word, Excel, etc. In retrospect, I wonder why I don't have 100 cores in my desktop in 2012. I suspect it's because they aren't useful for the average Joe user.
P.S. Forgive my directionless rambling. I don't have a particularly strong opinion on this subject anymore.
Yeah, the average program will see no benefit - the one area where it will make a huge difference is games/graphics/simulation. There are so many problems in graphics that are fairly trivial to parallelize (especially with the rising interest in voxels and raytracing). The main thing is that it needs to be very easy to program and cross-platform (neither is an easy feat). Right now we are kind of stuck in a rut with the current graphics APIs - you can do a lot with them, but they are also very limiting compared to a general purpose CPU. Back in the days of DOOM, the Build Engine, etc., one could write a renderer from the ground up; now (for better or worse) we are limited to one way of pushing polygons onto the screen, and there are an order of magnitude fewer ways to be creative. Note that I used to be a huge GPU proponent, but after about a decade of working with them I am turning back to software rendering.
If by "expensive" you mean "time-consuming to implement algorithms on" then, yes, GPGPU is still expensive. The cards themselves, though, are stupidly cheap. For $140 you can get a card with 700+ general purpose shaders that run at around 800MHz.
Yes, they are relatively cheap on their own, the only problem is the added cost of a discrete card in addition to a cpu, and potentially a better power supply to support them both. In terms of many people's budgets it is nothing, but I am talking more about the average consumer. In the end, buying a desktop with a decent discrete GPU and CPU is probably going to run you at least $800; again, not a lot, but if this company could offer a competitive solution for $100, it could drive more "mainstream" adoption, although that is probably wishful thinking at this point.
Don't forget that nearly all Intel desktop CPUs come bundled with some kind of embedded GPU. It's nowhere near as powerful as a separate GPU chip, but it's still able to do OpenCL. The Intel HD4000 GPU may not be state of the art, but it out-performs the CPU for GPGPU-type operations.
This is an interesting project that deserves to reach its funding goal, but progress towards that is slow (I have been keeping an eye on it since it launched on Kickstarter).
I suspect the problem is that it has no compelling (and immediate) "use case". If they could communicate a set of application ideas, then I suspect a whole new raft of supporters would be happy to risk at least $99.
Also, their video is just mediocre - very slow and very elevator-music-like. I really want it to succeed; I backed it already and got some more friends to do the same, but they have to do more as well. Fortunately, they have made some progress lately by opening up the specs and adding more reward options.
Also, the $3 million stretch goal is just waaaaay too far off; it's too bad the better design is only offered at that level.
I have backed this project. This is an interesting startup with some good, solid technology behind it. They have managed to design and tape out a chip on just a $2M budget so far. The main draw of their architecture is not its peak performance but its efficiency, both in terms of perf/watt and perf/die area. You can look at their manuals on the site.
Hoping their funding drive succeeds. I like the fact that the ISA is being fully documented and that we will have a fully open-source toolchain to work with the system.
(Disclaimer: Not associated with Adapteva in any way).
I'm also a backer, and I've been completely surprised by the lack of interest. $99 to try what could represent the future of CPU design. I see it as a platform for finding out whether the new wave of concurrent languages really makes a difference on this kind of hardware.
Again, I don't know how (un)common that sort of thing is, but I wasn't expecting to see 64 cores in that tiny form factor. Does anyone here know how cutting-edge this thing is, if at all?
[Edit]
Also does anyone here want to address use cases for this thing?
Well, NVIDIA's Kepler GPUs have 1536 cores on something like 320mm^2. I can't really find the die size of the Adapteva product, but I'd say it comes out in a similar range.
Having looked at the data a bit more: I like their specs concerning system balance. 100 GFLOPS over 6.4GB/s gives you a system balance of 15.625 FLOPs per byte of memory bandwidth - about the same balance as a Westmere Xeon, which is pretty good for real world algorithms.
For comparison, NVIDIA Fermi has a system balance of about 20, meaning Fermi hits the memory bandwidth limit sooner - and bandwidth is very often the limiting factor in real world computations.
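For what it's worth, here is that arithmetic as a toy C sketch. The 100 GFLOPS and 6.4 GB/s figures are the ones quoted above; the roofline-style helper and its name are my own illustration, not anything from a vendor datasheet.

    #include <stdio.h>

    /* Roofline-style estimate: attainable GFLOPS is the lesser of the
       peak compute rate and what the memory system can feed at a given
       arithmetic intensity (FLOPs performed per byte moved). */
    static double attainable_gflops(double peak_gflops, double bw_gbs,
                                    double flops_per_byte)
    {
        double mem_ceiling = bw_gbs * flops_per_byte;
        return mem_ceiling < peak_gflops ? mem_ceiling : peak_gflops;
    }

    int main(void)
    {
        double peak = 100.0;  /* GFLOPS, figure quoted above */
        double bw   = 6.4;    /* GB/s, figure quoted above   */

        printf("machine balance: %.3f FLOPs/byte\n", peak / bw);  /* 15.625 */
        /* A kernel doing only 2 FLOPs per byte moved is bandwidth bound: */
        printf("attainable at 2 FLOPs/byte: %.1f GFLOPS\n",
               attainable_gflops(peak, bw, 2.0));                 /* 12.8 */
        return 0;
    }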
One thing though: High Performance Computing is all about software and tooling support. If this company comes out with OpenCL support in C (even better, Fortran 90+), then we're talking.
Edit: By similar 'range' I meant core per mm^2 ratio.
Prepare to be surprised. The die size estimate for the Epiphany IV is 10 mm^2 according to Adapteva. It is more appropriate to compare it to embedded GPUs than desktop GPUs in die size, power and performance.
For example, one particular embedded 40nm GPU design that I know about delivers about 25 GFLOPS in the same die area.
Some of that GPU die area is used for graphics features that compute programs don't need. But a lot of it is providing performance even though it's not providing FLOPS. GPUs have caches and multithreading for a reason; if you could get better performance with an ultra-simple architecture then ATI/Nvidia would have done that already.
GPUs are primarily designed to be good for graphics, which implies a completely different internal architecture. While GPUs have some graphics-oriented functional units, the main factor is that all these cores have to access a pretty large chunk of shared memory (textures, frame buffer...) and do so uniformly fast (and also support some weird addressing modes and access patterns). I suspect that a large part of the die area of a modern GPU is interconnect, and that there really are only a few very wide cores (something like VLIW + SIMD + possibly UltraSPARC IV-style hyperthreading, though that can be faked by the compiler given a sufficiently large register set) that are made to look like a large number of simple cores by magic in the compiler (which seems consistent with the CUDA programming model).
So: you can get large amounts of performance with a simple architecture, but only for some problems, and graphics is not one of them.
Sorry, but I have to make a few corrections here. Today's GPUs are:
- not simple SIMD. NVIDIA calls it SIMT (single instruction, multiple threads), mostly because you can branch on a subset of the lanes, so to the programmer it does feel somewhat like threads.
- not just optimized for graphics anymore. E.g. since Fermi, the Tesla cards have DP performance equal to 50% of SP, which was introduced specifically for HPC purposes. They have also steadily improved the schedulers for general purpose computing; e.g. Kepler 2 seems to support arbitrary call graphs on the device. Again, that's useless for graphics.
- suitable for pretty much all stencil computations. Even for heavily bandwidth-bound problems, GPUs are generally ahead of CPUs since they have very high memory bandwidth. The performance estimate I use for my master's thesis comes out at 5x for Fermi over a six-core Westmere Xeon for bandwidth-bound problems and 7.5x for compute-bound problems.
HPC is all about performance per dollar and performance per watt - and (sadly) sometimes Linpack results, because some institution wants to be near the top of some arbitrary list. In all of these aspects GPUs come out ahead of x86, which has been very dominant since the '90s. That is why GPUs are now in 4 of the top 20 systems - each of which represents hundreds of millions of dollars of investment. That wouldn't be done if they weren't suitable for most computational problems.
My point is that GPUs have a significantly different architecture from most of these "many cores on a chip" designs. The original reason was clearly that such an architecture was necessary for graphics; coincidentally, it also works better for many interesting HPC workloads. It's clear that manufacturers are introducing technologies that are not required for graphics, but they cannot be expected to make modifications that would render their GPUs unusable for graphics.
And as for SIMD/SIMT: I mentioned SIMD mostly in relation to operations on short vectors done by one thread, which is mostly irrelevant to the overall architecture of the core, as it can very well be implemented by pure combinational logic in one cycle given enough space. My mental model of how a modern GPU core (physical, not logical) actually works is essentially a simplistic RISC/VLIW design with a large number of registers, with the compiler and/or hardware interleaving instructions from multiple threads into one pipeline. That may or may not be how it actually works, but it looks probable to me.
In my opinion, most chips like the Epiphany IV or XMOS, in contrast to GPUs, are useful only for limited classes of workloads, as they tend to be memory-starved.
OK, so that's 6.4 cores per mm^2 while Kepler has 4.8. Not bad, considering NVIDIA has already shrunk the scheduling resources, register blocks and cache sizes per core to a bare minimum (something I don't agree with, btw).
Kepler has 8 "SMX" with 196 parallel sp threads each. For me the number of cores = the number of parallel threads, although on GPU they are not as independent, i.e. each "core" of an SMX either executes the same instruction on adjacent data or does nop. With dual issue do you mean a two stage pipeline or two threads in parallel, both performing FLOP?
"Dual issue" means "can issue two instructions per cycle", independent of pipeline depth or multithreading. In the case of the Epiphany, it can issue an ALU instruction and a floating point instruction each cycle.
Sounds pretty good, thanks for the heads up. Now I'm curious to see some benchmarks as soon as someone puts 20 or 30 of these on a board with lots of GDDR3 RAM :).
OpenMP support would be interesting, and should be possible by extending what we did for the OpenCL support. The basic machinery is very similar. Also, someone mentioned Fortran. There are Fortran bindings for the STDCL API that is built on top of OpenCL, so this could help interface to existing Fortran codes and provide a partial solution for Fortran programmers.
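For anyone who hasn't seen OpenCL: the kernels themselves are standard OpenCL C and nothing vendor-specific. A minimal sketch (my own toy example, not code from the Epiphany SDK) looks like this; the host-side setup can go through the raw OpenCL API or a wrapper like STDCL.

    /* vadd.cl -- elementwise vector add in standard OpenCL C.
       Each work-item handles one element; the host decides how the
       global work size maps onto the available cores. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c,
                       const unsigned int n)
    {
        size_t i = get_global_id(0);
        if (i < n)
            c[i] = a[i] + b[i];
    }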
Tilera did a very similar-looking 64 cores on a chip in 2007, which is the oldest instance I know of off the top of my head. Their devices cost (or at least used to cost) a few grand, though. Tilera has bumped it up to around 100 per chip these days. I don't know enough about either architecture to say how 64 1GHz Adapteva cores compare with 64 1.5GHz Tilera cores.
So it's not quite cutting edge, just an under-explored side channel.
I haven't looked at the specs, yet, but this is what I've had in mind: Roll one out to help with deep packet inspection of some of my network traffic. Spam filtering might also be offloaded to one of these guys.
Dedicated machines to host backend applications -- SQL servers, Apache, nginx, etc.
You'll need this if you want energy-efficiency when solving parallelizable problems. Use one chip in energy-limited systems, like battery-powered robots. Use multiple chips in power/heat-limited systems, like supercomputers.
That first picture shows 4 cores made of 4 sub-cores with 32 processing elements each. Now, Nvidia would claim each of those 32 processing elements is a core, but each of those "cores" cannot act independently. So it is more like a very wide, heavily hyper-threaded 16-core processor.
I think NVIDIA's definition of a 'core' has some merit. First of all, the cores have some independence in that you can introduce branches over a subset of them, so they're not just SIMD vector units. Secondly, their threaded programming model is pretty well suited to many computational tasks. Executing the same operations over a whole 2D or 3D region of data is a pretty common thing in computing, and if you can't parallelize your task that way, chances are it's not parallelizable on N x86 cores either. To compare fairly with x86, you'd have to count n cores times the SSE vector length on each core. GPUs still come out ahead for most heavy computational tasks, though - which is why Intel is now fighting back with their Xeon Phi stuff (which sounds very promising, btw; looking forward to playing with our prerelease model that's coming soon ;) ).
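A quick illustration of the branching point, as a toy OpenCL C kernel not tied to any particular GPU: every work-item appears to take its own branch, which is what makes SIMT feel like threads, but on a SIMT machine each group of lanes executes both paths with the inactive lanes masked off, so heavy divergence costs throughput even though it is perfectly legal.

    /* Each work-item branches on its own data; diverging lanes within a
       hardware group are masked rather than skipped, preserving
       correctness at some cost in throughput. */
    __kernel void clamp_sqrt(__global const float *in, __global float *out)
    {
        size_t i = get_global_id(0);
        if (in[i] > 0.0f)
            out[i] = sqrt(in[i]);
        else
            out[i] = 0.0f;
    }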
Yes, but modern GPUs also use upwards of 100W of power, and their cores operate mostly in lock-step, which means they aren't fast for all kinds of tasks.
I agree that people tend to abuse the term, but I think it is a performance designation. It's just a sliding performance target. A supercomputer is a computer that can achieve the upper limits of what has been achieved in performance.
Sure, you're limited by money. But if someone magically was able to produce a machine for $100 that was on par with Titan (http://en.wikipedia.org/wiki/Titan_(supercomputer)) I would still call it a "supercomputer" until it became ubiquitous.
Except that someone would buy $10 million of such machines and gang them together. And we would call that a supercomputer, not whatever you put under your desk.
Defining it on any architecture or performance metric is just pointless because the march of time renders such things utterly moot. Remember when PlayStation 2s were "supercomputers"? Please.
Note that I said it was a sliding performance target. That is, I agree that defining it as any particular architecture or absolute performance is pointless. It's a designation relative to what is currently possible.
Relative performance is how people who design and use supercomputers define them. Cost is a secondary effect because if you're going to shoot for the limits of what we can do, it's going to be expensive.
The major difficulty with your second list is that expensive computers don't need to be high performance. Consider the computers that go into satellites and spacecraft. They are extremely expensive, but not high performing.
If they only had one, it doesn't matter how much it cost them to make; it's worth much, much more than $100 to someone. If there were enough supply that it could be sold for only $100, then it would have to already be ubiquitous. I agree with jacques_chester that it's an economic distinction.
Change "able to produce" with "willing to sell for" and my point remains the same.
I agree that you're not going to get a "supercomputer" for less than about $100,000. But supercomputers are defined by what they can do. Their cost is secondary. Necessary in a world without magic, but secondary. I can spend $100,000 on a computer, but that alone does not make it a "supercomputer".
(though I see the more informative "50 GFLOPS/Watt" below... and I like the prospect of something that would make it cheap to play with large scale real time neural nets...)
Yes, agreed! We definitely got carried away with the marketing lingo and we apologize!
This was our thought process:
We have received a lot of negative feedback regarding this number, so we want to explain the meaning and the motivation behind it. A single number can never characterize the performance of an architecture. The only thing that really matters is how many seconds and how many joules YOUR application consumes on a specific platform.
Still, we think multiplying the core frequency (700MHz) by the number of cores (64) is as good a metric as any. As a comparison point, the theoretical peak GFLOPS number often quoted for GPUs is really only reachable if you have an application with significant data parallelism and limited branching. Other numbers used in the past to market processors include peak GFLOPS, MIPS, Dhrystone scores, CoreMark scores, SPEC scores, Linpack scores, etc. Taken by themselves, datasheet specs mean very little. We have published all of our data and manuals, and we hope it's clear what our architecture can do. If not, let us know how we can convince you.
As one of the HN contributors who mocked the GHz performance spec in an earlier discussion thread, I welcome the fact that you're engaging in debate about this.
That said, I still think that the GHz stat is just about as BAD a metric as any (I suppose "pin count times # of cores" would be worse :-). About the only positive inference I can draw from this is that you have the thermal situation in your system under control.
But piling up cores and cooling them is, IMHO, one of the easiest parts of designing a massively parallel system. The interesting part of the design is the interconnections between the cores, and any metric that multiplies single core performance by number of cores tells me nothing about that.
So not only am I not learning a key part of the performance characteristics of your system, but by omitting it, you make me wonder whether the ENGINEERING of the system might be similarly misguided on this aspect as the MARKETING seems to be (i.e. does marketing omit this aspect of the system because it was not important to the engineers either?).
Linpack at least has benchmarks both for showing off the cores in nearly independent scenarios, and for showing the system when actual communication has to take place. Obviously, each parallel application is different, but you'd at least show ONE indication of performance in situations that are not embarrassingly parallel (http://en.wikipedia.org/wiki/Embarrassingly_parallel).
Thanks for posting a very valid concern! Does the following FFT based image processing demo address the concern about communication bottlenecks in the approach?
Corner turns for 2D FFTs are usually quite challenging for GPUs and CPUs.[ref] Yaniv, our DSP guru, completed the corner turn part of the algorithm with ease in a couple of days, and the on-chip data movement constitutes a very small portion of the total application wall time (complete with source code published as well, if you really want to dig in).
It's hard to market FFT cycle counts to the general audience:-)
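For readers who haven't met the term: a corner turn is essentially a matrix transpose between the row-wise and column-wise passes of a 2D FFT. Below is a naive single-core C reference - my own sketch on plain floats for brevity (real FFT data would be complex pairs), not the published Epiphany demo, which distributes the transpose across cores and is where the on-chip data movement comes in.

    #include <stddef.h>

    /* Out-of-place corner turn: read row-major, write the transpose so
       the second FFT pass can again walk contiguous data. */
    void corner_turn(const float *in, float *out, int rows, int cols)
    {
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                out[(size_t)c * rows + r] = in[(size_t)r * cols + c];
    }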
Yes, this looks like a much more appealing argument to a distributed computing audience. As for marketing to "the general audience", it's not clear to me that this is a realistic aspiration.
I suspect that the people who would be happiest buying something like this are going to be very technical, not just USING Linpack, FFTs, neural networks, or HMMs on a regular basis, but used to IMPLEMENTING them as well. This audience is definitely going to want red meat like the paper you're linking to.
With the Kickstarter campaign, you may also get customers who just think it's cool to own a supercomputer, but when they realize they can't run Crysis on it, they may be disappointed.
I would enjoy making a ray-tracing GPU from one of these.
The fact that the cores don't run in lockstep could be shader heaven! I'm imagining using the cores in a pipeline with zoning, so one core 'owns' a tile of the screen and does z-buffering, another core does clipping of graphics primitives for each tile, and a sea of compute nodes between them chews up work and pushes it onwards.
Some kind of scheme using the cores as a spatial index, too - passing rays to other cores as they propagate beyond the AABB belonging to a core.
Doubtless it wouldn't work like that. And wouldn't work well. But it's fun thinking about it! :)
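Since we're daydreaming anyway, here's a purely hypothetical sketch of the tile-ownership part in C. Every name, the message layout, and the 8x8 core grid are invented for illustration; nothing here reflects how the actual chip's mesh or DMA works.

    /* A ray forwarded between cores when it leaves the region one core owns. */
    typedef struct {
        float origin[3];
        float dir[3];
        float t_max;   /* remaining ray extent */
        int   pixel;   /* where the eventual shading result goes */
    } ray_msg;

    /* Map a screen pixel to the core that owns its tile, assuming an 8x8
       grid of cores, each owning a tile_w x tile_h block of pixels. */
    static int owning_core(int x, int y, int tile_w, int tile_h)
    {
        return (y / tile_h) * 8 + (x / tile_w);
    }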
Parallel computing is limited by Amdahl's Law. Having more cores does not mean you get more speed, because it's not easy to use all those cores. Most imperative languages are not designed for running code on multiple cores, and few programmers are taught how to design their algorithms to use even a handful of cores.
I can see this platform being a good tool for students and researchers to experiment with algorithm speedups by making their sequential code parallel.
In my parallel programming class, our teacher had to rig together a computer lab, connecting the 12 quad-core computers to simulate a 64-core cluster. Then again, a 64-core cluster of Parallellas would cost something like $7,000. You can get the same 64-core setup by buying eight 8-core consumer desktop computers for under $3,000, which will still be more cost effective and probably have ten times more computing power because of the x86 architecture.
It is a more powerful expression of the benefit of scaling with parallelism. Essentially, instead of scaling speed with respect to a fixed problem size, you scale the problem size with respect to a fixed execution time (Gustafson's Law).
Having more cores means you (sometimes) can have more data. You still need those parallel programmers with their parallel algorithms though :-)
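If it helps to see the two scaling models side by side, here is a tiny C sketch; the 5% serial fraction and 64 cores are just example values.

    #include <stdio.h>

    /* Amdahl: fixed problem size; s = serial fraction, p = core count. */
    static double amdahl(double s, double p)
    {
        return 1.0 / (s + (1.0 - s) / p);
    }

    /* Gustafson: problem size grows with p, run time held fixed. */
    static double gustafson(double s, double p)
    {
        return p - s * (p - 1.0);
    }

    int main(void)
    {
        double s = 0.05, p = 64.0;  /* 5% serial work, 64 cores */
        printf("Amdahl:    %.1fx\n", amdahl(s, p));     /* about 15.4x */
        printf("Gustafson: %.1fx\n", gustafson(s, p));  /* about 60.8x */
        return 0;
    }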
I'm of the opposite opinion. Companies that already have financial backing, have already put significant time and effort into a project, and already have experience running their business are much more likely to follow through on their campaign than some kid who built a new chair in his bedroom and thinks he can deliver it a month after his $100k campaign is finished.
Sure, Adapteva has raised ~$1.5M in VC and another ~$850k in debt. Now they are raising $750k on Kickstarter, the "funding platform for creative projects." There's a big disconnect there. If well-funded companies like Adapteva are successful raising on Kickstarter, why wouldn't even bigger companies milk the Kickstarter sheep for R&D funds too? While it wouldn't violate the letter of Kickstarter's rules for Intel to run a campaign like this, it's certainly not in the spirit.
And yes I get that it's open source blah blah blah, but this project is certainly part of the plan for an institutionally-funded business to make money. Adapteva is a .com, not a .org.
Separately: if Adapteva is only 8 months from delivering completed product to users, shouldn't they be able to raise more funds through traditional channels? They clearly have/had VC buy-in and can raise through institutional channels. If they are just finishing the final debugging/SDKs/etc. at this point, it's not a good sign that they can't raise another $750k from their existing backers to cover final launch costs.
I don't have a horse in this race, but it doesn't feel quite right to me.
Let's correct some of your assertions:
1.) Adapteva raised $1.5M from a small board business (not a VC) because it couldn't get a VC investor.
2.) A "well funded" semiconductor is one that takes in $100M like Calxeda. Adapteva has done "more with less" than any chip company in history.
3.) Adapteva is a chip company. The Parallella project is not an "R&D effort"; it's about bringing down the cost of an open board product that developers clearly want and that the industry needs.
4.) Adapteva has talked to >50 large institutional investors. Mostly they are either afraid of going up against Intel and Nvidia, or they flat out don't invest in chips.
5.) Kickstarter is not just for non-profits.
First, I didn't mean to sound quite so negative on your project. I actually hope you succeed. If this is something the industry needs, I have no doubt you'll be a wild success.
I just take issue with raising money from unsophisticated unaccredited investors without even providing complete disclosure or binding contracts in return. I know a lot of companies do it, and I dislike it in those cases too. I also know I'm in the minority here and that it's only a matter of time before companies with huge VC backing & public companies are using Kickstarter to raise money. I think that's a bad thing, but others disagree.
Also, quickly:
- I'm not against for-profits using Kickstarter; most of the efforts there are for-profits. But companies that have raised millions of dollars probably should disclose that fact prominently in their campaigns.
- Similarly, the fact that you've been denied investment by >50 institutional investors is relevant when asking for money. It might be positive for some and negative for others. But it's likely not going to be a no-op for most.
- My figures come straight from Crunchbase, I'm not more connected than that.
I don't think a $2.35M round is really that much for a chip startup. The fact that they even think that they can plausibly launch a chip and some level of supporting documentation, software, etc for ~$3M total, that is a remarkably efficient use of funds. By the way, Kickstarter funds are "cheap" (no dilution & no debt) so anyone who can actually raise money on Kickstarter should do so.
I'm not qualified to evaluate how much money a modern chip startup needs to launch a product. But it's worth noting that the Kickstarter pitch doesn't mention the millions the company has raised so far (apologies if I missed that part).
>> Kickstarter funds are "cheap" (no dilution & no debt)
Of course they are, and that's kind of my point. Raising money from unsophisticated unaccredited investors without providing full disclosure or even a contract in exchange is obviously a great source of capital. However, I'm not convinced pitches like this would withstand scrutiny by the relevant regulators if they were not asleep at the switch.
I didn't mean to be overly critical of the project. I wish them all due success. That doesn't mean I can't dislike the Kickstarter campaign. (I would similarly dislike AMD raising funds on Kickstarter, even if I liked the project.)
Just like some startups raise VC for "validation" instead of money, companies are now using Kickstarter for PR. And if it's successful, people will pay you for your PR campaign.
Does anyone know what kind of cores these RISC cores will be? Will it be some lower end ARM version, or MIPS? Will it be something for which a wide array of tooling already exists, or will this have its own custom architecture which only works with their toolchain?
What's the point of having such a RAM/core ratio? By assigning 4 threads per core (which is fairly common for exploiting manycore architectures), you don't even have 4 MB of memory per thread.
I would totally agree that memory constraints are sort of inherent to manycore architectures, but in this case I find it pushed to the limit.
Since I thought I saw an Adapteva person posting here earlier:
If the Kickstarter falls through, what options could you still make available to hobbyists? Is there some version of your current prototype setup that you could sell, even if it's not one convenient board?
We would rather not think about that option :-) If the Kickstarter project fails we'll do our best, but it seems unlikely that we could support selling kits to hobbyists, and they would certainly cost thousands of dollars each due to the lack of volume.
If you don't reach it in time, collect pre-orders. Seriously. Getting an escrow setup in place that in effect gives you a payment mechanism similar to Kickstarter's (money handed over to you once the $750k is met; returned if the criteria are not met) does not need to be expensive. Even without escrow, I think that if you get close to the target, a substantial number of those of us who've committed on Kickstarter will be OK with taking the risk. And it'd let you set longer/more flexible terms to make reaching it easier.
We got into a big discussion about supercomputers (the definition), the meaning of what a core is, and a whole bunch of other issues... but the low power requirements of this thing are being completely ignored. As for applications: portable and/or remote devices and sensors that need parallel computing capability and where high energy usage is prohibitive are possible applications. But the greatest asset of this is to spark the next generation of app developers and programmers to fully embrace parallel programming and truly make software scalable.
Is it just me, or would Erlang sort of fit nicely with these cores' ideology of data processing? It seems that LD is like setting a constant, there are external STR commands, and you can have data loaded into registers from the code with MOV. I'm not an expert in Erlang, but it seems the two ideologies could be beneficial to one another.
And if so, would expecting an Erlang compiler be out of the question? :)