Hacker News

The problem with these open-weight LLMs hosted by these providers is that we don't know what precision the model is served at, and that makes a huge difference in speed and cost (compute).

I think Together recently introduced different price tiers based on precision, but otherwise providers usually keep it in the dark.
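To see why precision matters so much for compute cost, here's a rough back-of-the-envelope sketch (my own numbers, not from the thread): the weight memory of a model scales linearly with bytes per parameter, so serving Llama 3.1 405B at FP8 halves the footprint versus FP16.

```python
# Rough weight-memory footprint of a 405B-parameter model at different
# precisions (weights only; ignores KV cache, activations, and runtime
# overhead, which add on top of this).
PARAMS = 405e9  # Llama 3.1 405B

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision:>10}: ~{gb:,.0f} GB of weights")
```

At FP16 that's roughly 810 GB of weights, i.e. a multi-node deployment; FP8 halves it, which is exactly why a provider's choice of precision dominates their cost per token.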




OpenRouter gets close to that: https://openrouter.ai/models/meta-llama/llama-3.1-405b-instr...


Amazing, I just noticed the mention of precision there.


I noticed that in my benchmarks [1]: Llama 3 was significantly better than Llama 3.1, which was puzzling.

Then I realized that I had changed the provider, and the new one served Llama 3.1 quantized to FP8.

Then I tried Hyperbolic [2], because they offer the model in different quantizations. As a result, Llama 3.1 was better than Llama 3, or at least on par.

[1] https://github.com/s-macke/AdventureAI

[2] https://app.hyperbolic.xyz/models
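The quality gap described above comes from quantization round-off error. The snippet below is a generic illustration of where that error comes from, not the FP8 scheme any particular provider uses: it round-trips a synthetic weight tensor through symmetric per-tensor 8-bit integer quantization and measures the worst-case reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake weight tensor with a typical small standard deviation.
w = rng.normal(scale=0.02, size=100_000).astype(np.float32)

# Symmetric per-tensor 8-bit quantization:
# map [-max|w|, +max|w|] onto integer levels [-127, 127].
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize back to float and measure the round-trip error.
w_hat = q.astype(np.float32) * scale
max_err = np.abs(w - w_hat).max()
print(f"scale={scale:.6f}, worst-case round-trip error={max_err:.6f}")
```

The worst-case error is bounded by scale/2, so a single outlier weight inflates the scale and degrades every other weight in the tensor. That per-weight noise is invisible in a demo prompt but can show up in benchmarks, which is why the same model name can score differently across providers.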


Exactly, it's always best to rely on your own hardware. We need to collect more data from self-hosted models on different GPUs/clouds to compare.


We have 15+ clouds you can try on our platform if you're looking for a place to compare inference engines.

Email me at ed at shadeform dot ai if we can help.




