> Qualcomm’s OpenCL runtime would unpredictably crash if a kernel ran for too long. Crash probability goes down if kernel runtimes stay well below a second. That’s why some of the graphs above are noisy. I wonder if Qualcomm’s Adreno 6xx command processor changes had something to do with it. They added a low priority compute queue, but I’m guessing OpenCL stuff doesn’t run as “low priority” because the screen will freeze if a kernel does manage to run for a while.
Very few (if any) mobile-class GPUs actually support true preemption. Rather, they are closer to pseudo-cooperative, with suspend checks in between work units on the GPU. Desktop GPUs only got instruction-level preemption not that long ago - Nvidia first added it with Pascal (GTX 10xx), so mobile still lacking this isn't surprising. It's a big cost to pay for a relatively niche problem.
So the "crash" was probably a watchdog firing because the GPU failed to make forward progress at a sufficient rate, which is also why the screen would freeze. The smallest work unit was "too big", so it never yielded to other tasks.
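If that's what is going on, the usual host-side workaround is to split one long dispatch into several short ones so the GPU reaches a yield point more often. A rough sketch of the planning step only, assuming you have some per-item time estimate; `plan_chunks`, `est_seconds_per_item` and `budget_seconds` are names I made up for illustration, not part of any driver or OpenCL API:

```python
def plan_chunks(total_items, est_seconds_per_item, budget_seconds=0.1):
    """Split a dispatch of total_items work items into (offset, count) ranges.

    Each range is sized so its *estimated* runtime stays under budget_seconds.
    The estimate is an assumption supplied by the caller (e.g. from a small
    timing run); nothing here measures the GPU.
    """
    if total_items <= 0:
        return []
    if est_seconds_per_item <= 0:
        raise ValueError("est_seconds_per_item must be positive")

    # At least one item per chunk, even if a single item exceeds the budget.
    per_chunk = max(1, int(budget_seconds / est_seconds_per_item))

    return [
        (offset, min(per_chunk, total_items - offset))
        for offset in range(0, total_items, per_chunk)
    ]


if __name__ == "__main__":
    # Hypothetical numbers: 10 items at ~30 ms each, 100 ms budget per launch.
    for offset, count in plan_chunks(10, 0.03, 0.1):
        print(f"launch offset={offset} count={count}")
```

Each `(offset, count)` pair would then become its own kernel enqueue (in OpenCL, via the global work offset and global work size arguments of `clEnqueueNDRangeKernel`). It only helps if the driver actually gets a chance to switch between launches, and a single work item that runs too long is still a problem.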
> Desktop GPUs only got instruction-level preemption not that long ago
And even then it's often limited to shaders already scheduled in the queue. Because things like the register files are statically allocated at task schedule time, you can't just "add" a task, and removing one is expensive: you need to suspend everything, store off the (often pretty large) register state and any shader local data (or similar) in use, stop that task, and deallocate the shared resources. It's avoided for good reason, and even where it's supported it's likely a rather untested, buggy path.
If you run an infinite loop on even the latest Nvidia GPU (with enough instances to saturate the hardware) you can still get "hangs", as it ends up blocking things like composition until the driver kills the task. It's still nowhere near the experience CPU task preemption gives you.