It's a bit early to compare directly to TensorRT because we don't have a full-blown equivalent yet.
Note that our focus is on being platform-agnostic, easy to deploy and integrate, with good all-around performance and ease of tweaking.
We use the same compiler as JAX, so our performance is on par.
But generally we believe we can gain on overall "tok/s/$" by having a shorter startup time, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.
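To make the "tok/s/$" idea concrete, here is a minimal illustrative sketch of picking the accelerator that maximizes tokens generated per dollar; the hardware names, throughputs, and hourly prices are made-up placeholders, not real benchmarks:

```python
# Illustrative only: all numbers below are hypothetical placeholders,
# not measurements of any real hardware.
candidates = {
    # accelerator: (tokens per second, price in $ per hour)
    "accel_a": (1500.0, 2.00),
    "accel_b": (2400.0, 3.50),
    "accel_c": (900.0, 1.10),
}

def tok_per_dollar(tok_per_s: float, dollars_per_hour: float) -> float:
    """Tokens generated per dollar spent: (tok/s) / ($/s)."""
    return tok_per_s / (dollars_per_hour / 3600.0)

# Pick the option with the best tokens-per-dollar ratio.
best = max(candidates, key=lambda name: tok_per_dollar(*candidates[name]))
for name, (tps, price) in candidates.items():
    print(f"{name}: {tok_per_dollar(tps, price):,.0f} tok/$")
print("best tok/s/$:", best)
```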