Inference is still a lot faster on CUDA than on CPU. Running it on CPU is fine at home or on your laptop for privacy, but if you're serving those models at any scale, you're going to be using GPUs with CUDA.
Inference is also a much smaller market right now, but it will likely overtake training eventually, since far more people will be using the models than competing to train the best one.