My point is that GPUs have a significantly different architecture from most of these "many cores on a chip" designs. The original reason for that was clearly that such an architecture was necessary for graphics; coincidentally, it also works well for many interesting HPC workloads. Manufacturers are clearly introducing technologies that are not required for graphics, but they cannot be expected to make modifications that would render their GPUs unusable for graphics.
And as for SIMD/SIMT: I mentioned SIMD mostly in relation to operations on short vectors done by a single thread, which is mostly irrelevant to the overall architecture of the core, since it can very well be implemented as pure combinational logic that completes in one cycle, given enough die area. My mental model of how a modern GPU core (the physical core, not the logical one) actually works is essentially a simplistic RISC/VLIW design with a large register file, where the compiler and/or hardware interleaves instructions from multiple threads into one pipeline. That may or may not be how it actually works, but it seems plausible to me.
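To make that mental model concrete, here's a toy C sketch of the "barrel processor" idea it implies: one in-order pipeline, many resident threads each with their own register set, and an issue stage that round-robins among ready threads every cycle so a stalled thread never blocks the pipeline. All names and numbers here are made up for illustration; real GPU schedulers are far more elaborate than this.

    #include <stdio.h>

    #define NUM_THREADS 4   /* resident hardware threads ("warps") */
    #define NUM_CYCLES  8

    typedef struct {
        int pc;             /* per-thread program counter */
        int regs[16];       /* large register file: one set per thread */
        int stalled;        /* e.g. waiting on a memory operand */
    } hw_thread;

    int main(void) {
        hw_thread t[NUM_THREADS] = {0};
        t[2].stalled = 1;   /* pretend thread 2 is waiting on memory */

        for (int cycle = 0; cycle < NUM_CYCLES; cycle++) {
            /* Issue stage: pick the next ready thread, round-robin. */
            for (int i = 0; i < NUM_THREADS; i++) {
                int id = (cycle + i) % NUM_THREADS;
                if (!t[id].stalled) {
                    printf("cycle %d: issue thread %d, pc=%d\n",
                           cycle, id, t[id].pc);
                    t[id].pc++;   /* "execute" one instruction */
                    break;
                }
            }
        }
        return 0;
    }

The point of the sketch is that the pipeline itself stays simple and in-order; throughput comes from having enough threads (and registers) resident to hide each thread's latency.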
In my opinion, most chips like the Epiphany IV or XMOS, in contrast to GPUs, are useful for only limited classes of workloads, as they tend to be memory starved.