Show HN: Open-source LLM provider price comparison

yujonglee · 2024-08-14T11:53:30 1723636410

Litellm maintains list of model infos here:

https://github.com/BerriAI/litellm/blob/main/model_prices_an...

Simple UI to search:

https://models.litellm.ai/?q=llama3

CuriouslyC · 2024-08-14T12:41:05 1723639265

Thanks, this saved me from scraping something.

aurareturn · 2024-08-14T13:52:22 1723643542

Amazing

alextttty · 2024-08-14T12:15:38 1723637738

Thank you! yeah great source. Do they track throughput for open source models? and inference engines Thats kind of data I want to find as well

Deathmax · 2024-08-14T15:46:45 1723650405

For throughput data, well, you need to actually run prompts to gather the data which racks up costs fast and performance can vary based on input prompt lengths. The two sources I use are OpenRouter's provider breakdown [1] and Unify's runtime benchmarks [2].

[1]: https://openrouter.ai/models/meta-llama/llama-3.1-70b-instru...

[2]: https://unify.ai/benchmarks/llama-3.1-70b-chat

amrrs · 2024-08-14T12:05:37 1723637137

The problem with these Open weights LLMs hosted by these provider is that we don't what's the precision of the LLM, that makes a huge difference in the speed and cost (compute).

I think Together recently introduced a different price tier based on precision but otherwise it is usually dark.

Whiteshadow12 · 2024-08-14T12:12:41 1723637561

Something Open Router get's close to that https://openrouter.ai/models/meta-llama/llama-3.1-405b-instr...

amrrs · 2024-08-14T12:30:08 1723638608

Amazing, just noticed the mention of precision

s-macke · 2024-08-14T15:09:32 1723648172

I have realized that in my benchmarks [1]. Llama 3 was significantly better than Llama 3.1, which was puzzling.

Then I realized, that I changed the provider. And the new one quantized Llama 3.1 with fp8.

Then I tried Hyperbolic [2], because they offer the model in different quantizations. As result, Llama 3.1 was better than Llama 3 or at least on par.

[1] https://github.com/s-macke/AdventureAI

[2] https://app.hyperbolic.xyz/models

alextttty · 2024-08-14T12:18:47 1723637927

Exactly, always best to rely on your hardware, we need to collect/add more data from self hosted models on different gpus/clouds to compare

edgoode · 2024-08-15T00:32:37 1723681957

We have 15+ clouds you can try on our platform if you're looking for a place to compare inference engines

Email me at ed at shadeform dot ai if we can help

Whiteshadow12 · 2024-08-14T11:58:40 1723636720

Nice I built something similar https://huggingface.co/spaces/Whiteshadow12/llm-pricing-calc...

I like your charting, many have taken this task and then lose interest.

similar other tools for inspiration https://llmprices.dev/ https://www.llmpricing.app/

What no one is doing is focusing on GPUs, what is the cost of running L3-8B on an A100 or H100 per second.

maeil · 2024-08-14T12:32:52 1723638772

Going to add my personal favorite: https://artificialanalysis.ai/models/llama-3-instruct-70b/pr...

What sets them apart is that they have speed and latency as well.

mcbetz · 2024-08-14T14:25:08 1723645508

Thanks for bringing llmprices.dev to my attention. I have also a comparison page for models hosted on OpenRouter (https://minthemiddle.github.io/openrouter-model-comparison/), I do comparison via regex (so "claude-3-haiku(?!:beta)|flash" will show you haiku, but not haiku-beta vs flash.

I wish that OpenRouter would also expose the amount of output tokens via API as this is also an important criteria.

alextttty · 2024-08-14T12:14:57 1723637697

Yeah we want to do exactly this, benchmark and add more data from differnt gpus/cloud providers, will appreciate your help a lot! There are many inference engines which can be tested and updated to find best inference methods

Whiteshadow12 · 2024-08-14T12:32:54 1723638774

Goodluck, companies would love that. Don't get depressed unlike my tool I think you should charge, that might keep you motivated to keep doing the work.

It's a lot of work, your target users is companies that use Runpod and AWS/GCP/Azure, not Fireworks and Together, they are in the game of selling tokens, you are selling the cost of running seconds on GPUs.

agcat · 2024-08-14T19:32:15 1723663935

This is true especially if you are deploying custom or fine-tuned models. Infact, for my company i also ran benchmark tests where we tested cold-starts, performance consistency, scalability, and cost-effectiveness for models like Llama2 7Bn & Stable Diffusion across different providers - https://www.inferless.com/learn/the-state-of-serverless-gpus... Can save months of evaluation time. Do give it a read.

P.S: I am from Inferless.

alextttty · 2024-08-14T12:43:28 1723639408

Thank you!

ebalit · 2024-08-14T13:10:21 1723641021

We built the cheapest Llama 3.1 70B inference API, specialized for tasks that are not time sensitive (ie. batch processing jobs for example).

Without any quantization our current price is 30cts ingest and 50cts output per million tokens. [1]

1: https://withexxa.com/#pricing

alextttty · 2024-08-14T13:27:37 1723642057

Amazing! Please dont hesitate to open an issue or a PR Will update our dataset and add it.

fsndz · 2024-08-14T14:06:02 1723644362

LLMs are really commodities now: https://www.lycee.ai/blog/why-large-language-models-are-a-co...

alextttty · 2024-08-14T14:32:35 1723645955

Indeed!

fsndz · 2024-08-14T15:24:58 1723649098

and it changes the dynamics of the generative AI space completely ! absolutely exciting to watch. I am bullish on generative AI even if I think scaling laws will generate diminishing returns going forward.

Kiro · 2024-08-14T12:33:53 1723638833

So overall not cheaper running these yourself compared to using GPT-4o mini.

KetoManx64 · 2024-08-14T14:53:53 1723647233

I don't think it ever has been. The reason you run them yourself is for the privacy and customizability

alextttty · 2024-08-14T12:44:56 1723639496

Think its just dominating the industry with cheaper access to gpu's probably even subsidised pricing

Ilasky · 2024-08-14T14:01:33 1723644093

Did something kind of similar (needs a little updating, but I find it still useful) for price vs performance on the LMSYS leaderboard.[0][1]

[0] https://chat.lmsys.org/?leaderboard

[1] https://llmcompare.net

Dowwie · 2024-08-14T15:32:54 1723649574

Everyone should know that every single model runs differently on every system and so to decide which is best literally requires you to go through the painstaking process of running inference with each provider and then deciding. The price of inference is not sufficient to decide where to run your models.

esafak · 2024-08-14T14:58:25 1723647505

Meta comment: he could make an Action that generates the images.

alextttty · 2024-08-14T15:01:46 1723647706

Made action to gen csvs + json Working on the mui data table frontend right now

namanyayg · 2024-08-14T11:31:29 1723635089

nice, literally what i need on my recent project that roasts HN profiles.

how about adding more models and providers?

and making a sortable table?

alextttty · 2024-08-14T12:18:24 1723637904

Honestly this is very very fresh, I was tinkering with hosting some models and wanted to optimize costs, tried few inference engines. Just want to collaborate on organizing data.

Agree, we will add a MUI table very soon. Also some charts.

I genuinely want someone to roast the way I did my benchmark process described there. Want something good enough yet easy to run.