Show HN: Open-source LLM provider price comparison (github.com/arc53)
125 points by shelar1423 7 months ago | 32 comments
Looking for the cheapest place to deploy a Llama 3.1 model? Don't worry, we've done the searching so you don't have to.




Thanks, this saved me from scraping something.


Amazing


Thank you! Yeah, great source. Do they track throughput for open-source models and inference engines? That's the kind of data I want to find as well.


For throughput data, you need to actually run prompts to gather the numbers, which racks up costs fast, and performance can vary with input prompt length. The two sources I use are OpenRouter's provider breakdown [1] and Unify's runtime benchmarks [2].

[1]: https://openrouter.ai/models/meta-llama/llama-3.1-70b-instru...

[2]: https://unify.ai/benchmarks/llama-3.1-70b-chat
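
If you'd rather probe it yourself, here's a rough sketch against an OpenAI-compatible endpoint (the base URL, API key, model name, and prompt are placeholders, not a specific provider's values):

```python
# Rough throughput probe against an OpenAI-compatible endpoint.
# base_url, api_key, and model are placeholders; results vary with
# prompt length, so average over many prompts before trusting a number.
import time
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

start = time.perf_counter()
first_token = None
chunks = 0
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarize the rules of chess."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token is None:
            first_token = time.perf_counter()
        chunks += 1  # one chunk is roughly one token for most providers

elapsed = time.perf_counter() - start
print(f"time to first token: {first_token - start:.2f}s")
print(f"~{chunks / (elapsed - (first_token - start)):.1f} tokens/s after first token")
```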


The problem with these open-weights LLMs hosted by these providers is that we don't know the precision of the LLM, which makes a huge difference in speed and cost (compute).

I think Together recently introduced a different price tier based on precision, but otherwise it's usually opaque.


OpenRouter gets close to that: https://openrouter.ai/models/meta-llama/llama-3.1-405b-instr...


Amazing, I just noticed the mention of precision.


I noticed that in my benchmarks [1]: Llama 3 was significantly better than Llama 3.1, which was puzzling.

Then I realized that I had changed the provider, and the new one quantized Llama 3.1 to FP8.

Then I tried Hyperbolic [2], because they offer the model in different quantizations. As a result, Llama 3.1 was better than Llama 3, or at least on par.

[1] https://github.com/s-macke/AdventureAI

[2] https://app.hyperbolic.xyz/models


Exactly, it's always best to rely on your own hardware. We need to collect/add more data from self-hosted models on different GPUs/clouds to compare.


We have 15+ clouds you can try on our platform if you're looking for a place to compare inference engines.

Email me at ed at shadeform dot ai if we can help.


Nice, I built something similar: https://huggingface.co/spaces/Whiteshadow12/llm-pricing-calc...

I like your charting; many have taken on this task and then lost interest.

Other similar tools for inspiration: https://llmprices.dev/ and https://www.llmpricing.app/

What no one is doing is focusing on GPUs: what is the cost of running L3-8B on an A100 or H100 per second?
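
The per-second math itself is simple once you have a throughput number; here's a back-of-the-envelope sketch where the hourly rate and tokens/s are made-up placeholders, not measurements:

```python
# Back-of-the-envelope GPU cost per token (all numbers are placeholders).
gpu_hourly_usd = 2.50     # assumed on-demand H100 rate; varies widely by cloud
throughput_tok_s = 1500   # assumed batched L3-8B throughput; depends on engine/batch size

cost_per_second = gpu_hourly_usd / 3600
cost_per_token = cost_per_second / throughput_tok_s
print(f"${cost_per_second:.6f}/s -> ${cost_per_token * 1e6:.3f} per 1M tokens")
# At these assumed numbers: ~$0.000694/s and ~$0.46 per 1M tokens.
```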


Going to add my personal favorite: https://artificialanalysis.ai/models/llama-3-instruct-70b/pr...

What sets them apart is that they track speed and latency as well.


Thanks for bringing llmprices.dev to my attention. I also have a comparison page for models hosted on OpenRouter (https://minthemiddle.github.io/openrouter-model-comparison/). I do comparisons via regex (so "claude-3-haiku(?!:beta)|flash" will show you Haiku, but not Haiku beta, vs. Flash).
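
To illustrate how that pattern behaves (the model names below are just samples):

```python
import re

# The pattern from above: match "claude-3-haiku" unless followed by ":beta",
# or anything containing "flash".
pattern = re.compile(r"claude-3-haiku(?!:beta)|flash")

models = [
    "anthropic/claude-3-haiku",
    "anthropic/claude-3-haiku:beta",
    "google/gemini-flash-1.5",
]
print([m for m in models if pattern.search(m)])
# -> ['anthropic/claude-3-haiku', 'google/gemini-flash-1.5']
```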

I wish that OpenRouter would also expose the number of output tokens via the API, as this is also an important criterion.


Yeah, we want to do exactly this: benchmark and add more data from different GPUs/cloud providers. We would appreciate your help a lot! There are many inference engines that can be tested and compared to find the best inference methods.


Good luck, companies would love that. Don't get discouraged; unlike with my tool, I think you should charge, as that might keep you motivated to keep doing the work.

It's a lot of work. Your target users are companies that use RunPod and AWS/GCP/Azure, not Fireworks and Together; those are in the game of selling tokens, while you are selling the cost of GPU-seconds.


This is especially true if you are deploying custom or fine-tuned models. In fact, for my company I also ran benchmark tests where we measured cold starts, performance consistency, scalability, and cost-effectiveness for models like Llama 2 7B and Stable Diffusion across different providers: https://www.inferless.com/learn/the-state-of-serverless-gpus... It can save months of evaluation time. Do give it a read.

P.S: I am from Inferless.


Thank you!


We built the cheapest Llama 3.1 70B inference API, specialized for tasks that are not time-sensitive (e.g. batch processing jobs).

Without any quantization, our current price is 30 cents per million input tokens and 50 cents per million output tokens. [1]

1: https://withexxa.com/#pricing
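
At those rates, a hypothetical batch job prices out like this (the token counts are made up for illustration):

```python
# Cost of a hypothetical batch job at $0.30/M input and $0.50/M output tokens.
input_tokens = 200_000_000   # placeholder: 200M tokens in
output_tokens = 50_000_000   # placeholder: 50M tokens out
cost = input_tokens / 1e6 * 0.30 + output_tokens / 1e6 * 0.50
print(f"${cost:,.2f}")  # -> $85.00
```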


Amazing! Please don't hesitate to open an issue or a PR. We'll update our dataset and add it.



Indeed!


And it changes the dynamics of the generative AI space completely! Absolutely exciting to watch. I am bullish on generative AI even though I think scaling laws will yield diminishing returns going forward.


So overall it's not cheaper to run these yourself than to use GPT-4o mini.


I don't think it ever has been. The reason you run them yourself is privacy and customizability.


I think it's just dominating the industry with cheaper access to GPUs, probably even subsidized pricing.


I did something kind of similar (it needs a little updating, but I still find it useful) for price vs. performance on the LMSYS leaderboard. [0][1]

[0] https://chat.lmsys.org/?leaderboard

[1] https://llmcompare.net


Everyone should know that every model runs differently on every system, so deciding which is best requires the painstaking process of running inference with each provider and comparing the results. The price of inference alone is not sufficient to decide where to run your models.


Meta comment: he could make a GitHub Action that generates the images.


Made an Action to generate CSVs + JSON. Working on the MUI data table frontend right now.
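
For anyone wanting to replicate the data step, a minimal sketch of dumping one pricing table to both formats (the rows are illustrative, not the project's actual data):

```python
# Minimal sketch: write a pricing table to CSV and JSON (rows are illustrative).
import csv
import json

rows = [
    {"provider": "example-a", "model": "llama-3.1-70b",
     "input_usd_per_m": 0.30, "output_usd_per_m": 0.50},
    {"provider": "example-b", "model": "llama-3.1-70b",
     "input_usd_per_m": 0.90, "output_usd_per_m": 0.90},
]

with open("prices.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

with open("prices.json", "w") as f:
    json.dump(rows, f, indent=2)
```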


Nice, literally what I need for my recent project that roasts HN profiles.

How about adding more models and providers?

And making a sortable table?


Honestly, this is very fresh. I was tinkering with hosting some models and wanted to optimize costs, and tried a few inference engines. I just want to collaborate on organizing the data.

Agreed, we will add an MUI table very soon, plus some charts.

I genuinely want someone to roast the benchmark process I describe there. I want something good enough yet easy to run.





