We built Korvus, an open-source RAG (Retrieval-Augmented Generation) pipeline that consolidates the entire RAG workflow - from embedding generation to text generation - into a single SQL query, significantly reducing architectural complexity and latency.
Here are some of the highlights:
- Full RAG pipeline (embedding generation, vector search, reranking, and text generation) in one SQL query
- SDKs for Python, JavaScript, and Rust (more languages planned)
- Built on PostgreSQL, leveraging pgvector and pgml
- Open-source, with support for open models
- Designed for high performance and scalability
Korvus utilizes Postgres' advanced features to perform complex RAG operations natively within the database. We're also the developers of PostgresML, so we're big advocates of in-database machine learning. This approach eliminates the need for external services and API calls, potentially reducing latency by orders of magnitude compared to traditional microservice architectures. It's how our founding team built and scaled the ML platform at Instacart.
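For readers picturing the pattern, here's a rough sketch (not the Korvus SDK; the table, column, and model names are made up) of what embedding generation plus vector search inside Postgres looks like with the pgml and pgvector extensions, issued as one query from Python:

    import psycopg  # psycopg 3

    # Hypothetical schema: a "documents" table with a text "chunk" column and a
    # pgvector "embedding" column. pgml.embed() generates the query embedding
    # inside the database and pgvector's <=> operator does the similarity search,
    # so the application sends one query instead of calling separate services.
    RAG_RETRIEVAL = """
    WITH query_embedding AS (
        SELECT pgml.embed('intfloat/e5-small-v2', %(question)s)::vector AS e
    )
    SELECT chunk
    FROM documents, query_embedding
    ORDER BY documents.embedding <=> query_embedding.e
    LIMIT 5;
    """

    with psycopg.connect("postgresql://localhost:5432/postgres") as conn:
        rows = conn.execute(RAG_RETRIEVAL, {"question": "What is Korvus?"}).fetchall()
        context = "\n".join(chunk for (chunk,) in rows)

    # The same query could keep going and call pgml.transform() for generation;
    # this sketch stops at retrieval.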
We're eager to get feedback from the community and welcome contributions. Check out our GitHub repo for more details, and feel free to hit us up in our Discord!
Very cool! I assume you use Postgres' native full-text search capabilities? Any plans for BM25 or similar? This would make Korvus the end-game for open-source RAG IMO.
I’d start with something very simple such as Reciprocal Rank Fusion. I’d also want to make sure I really trusted the outputs of each search pipeline before worrying too much about the appropriate algorithm for combining the rankings.
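For reference, a minimal sketch of Reciprocal Rank Fusion: each result list contributes 1 / (k + rank) per document, and documents are re-sorted by the summed score (k=60 is the conventional constant):

    from collections import defaultdict

    def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
        # Sum 1 / (k + rank) for every list a document appears in, then sort.
        scores: dict[str, float] = defaultdict(float)
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. fuse a full-text (BM25-style) ranking with a vector-search ranking
    fused = reciprocal_rank_fusion([
        ["doc_a", "doc_b", "doc_c"],  # full-text results
        ["doc_b", "doc_d", "doc_a"],  # vector search results
    ])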
IMHO it would be much clearer if you just used the normal %s for the "outer" string and left the implicit f-string syntax as it is, e.g.
    {
        "role": "user",
        # this is not an f-string, is rather replaced by TODO FIXME
        "content": "Given the context\n:{CONTEXT}\nAnswer the question: %s" % query,
    },
The way the example (in both the readme and the docs) is written, it seems to imply I can put my own fields as siblings to the chat key and they, too, will be resolved.
Very cool. I see more languages planned in your comment. Are you looking for community help developing SDKs in other languages? After spending an entire Saturday running a RAG pipeline for a POC for a "fun" side project, I definitely would've loved to have been able to use this instead.
I spent too long reading Python docs because I haven't touched the language since 2019. Happy to help develop a Ruby SDK!
We would love help developing a Ruby SDK! We programmatically generate our Python, JavaScript, and C bindings from our Rust library. Check out the rust-bridge folder for more info on how we do that.
Does this work by running an LLM such as Llama directly on the database server? If so, does that mean that your database and the LLM are competing for the same CPU and memory resources?
I'm not sure if this is a good idea; just like pretending a network request is a function call, it hides a lot of elements that shouldn't be ignored. I still prefer to keep embedding, LLM generation, etc. explicit.
> I'm not sure if this is a good idea, just like pretending a network request is a function call
This was my first reaction, too.
Perhaps there's something about data locality that makes it good for certain use cases?
> I still prefer to keep embedding, LLM generation, etc. explicit.
The bit that I usually need to control is how the retrieved results are formatted in the prompt. In order to make the context as information-dense as possible, I might strip out certain words and/or symbols. But it depends on the query, so it can't be done at ingestion time.
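To make that concrete, here's a toy sketch of query-time compaction; the stopword list and formatting rules are just illustrative choices, not anything Korvus prescribes:

    import re

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "is"}

    def compact(chunk: str, keyword_query: bool) -> str:
        # Strip symbols, and for short keyword-style queries drop filler words
        # so the context packs more information into the same token budget.
        words = re.sub(r"[^\w\s]", " ", chunk).split()
        if keyword_query:
            words = [w for w in words if w.lower() not in STOPWORDS]
        return " ".join(words)

    retrieved_chunks = [
        "Korvus runs the whole RAG pipeline (retrieval, reranking, generation) in one SQL query!",
        "pgvector adds vector similarity search to Postgres.",
    ]
    query = "korvus latency"
    context = "\n".join(compact(c, keyword_query=True) for c in retrieved_chunks)
    prompt = f"Given the context:\n{context}\nAnswer the question: {query}"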
This sounds very promising, but let me ask an honest question: to me, it seems like databases are the hardest part to scale in your average IT infrastructure. How much load does it add to the database if you let it do all the ML-related work as well? How much work is saved by reducing the number of necessary queries?
Contrary to some of the sibling responses, my experience with pgvector specifically (with hundreds of millions or billions of vectors) is that the workload is quite different from your typical web-app workload, enough so that you really want them on separate databases. For example, you have to be really careful about how vacuum/autovacuum interacts with pgvector’s HNSW indices if you’re frequently updating data; you have to be aware that the tables and indices are huge and take up a ton of memory, which can have knock-on performance implications for other systems; etc.
This is a read workload that can be easily horizontally scaled. The reduction in dev and infrastructure complexity is well worth the slight increase in DB provisioning.
You can use PL/Python to make API calls outside of the database, you just don't need a separate service to interact with the DB to orchestrate all your ML stuff, only endpoints.
I was expecting to see something like a foreign table that managed the upload, chunking, embedding, everything in a transparent manner. But what I found in the examples was some Python code that looks a lot like what the other frameworks are doing.
What am I missing? Honest question. I want to like this :)
But let's take splitting as an example. Does it happen in the Python part or the Postgres part? Is it a feature of the Python SDK or is it a feature of pgml? I couldn't understand this from the docs.
This looks exciting! Will definitely be testing it out in the coming days.
I see you offer re-ranking using local models; will there be built-in support for making re-ranking calls to external services such as Cohere in the future?
Great question! Making calls to external services is not something we plan to support. The point of Korvus is to write SQL queries that take advantage of the pgml and pgvector extensions. Making calls to external services is something that could be done by users after retrieval.
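For example, a rough sketch of post-retrieval reranking with Cohere's Python client (the document list and query are made up, and the exact method and parameter names may differ across SDK versions, so check Cohere's docs):

    import cohere

    co = cohere.Client("YOUR_API_KEY")  # hypothetical key

    retrieved_chunks = [
        "Korvus runs the whole RAG pipeline in one SQL query.",
        "pgvector adds vector similarity search to Postgres.",
        "PostgresML runs ML models inside the database.",
    ]

    # Rerank the chunks returned by Korvus before building the final prompt.
    response = co.rerank(
        model="rerank-english-v3.0",
        query="How does Korvus reduce latency?",
        documents=retrieved_chunks,
        top_n=2,
    )
    reranked = [retrieved_chunks[r.index] for r in response.results]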
Retrieval-Augmented Generation uses text that is stored in a database to augment user prompts that are sent to a generative AI, like a large language model. The retrieval results are selected based on their similarity to the user input. The goal is to improve the output of the generative AI by providing more information in the input (user prompt + retrieval results). For example, we can provide the LLM details from an internal knowledge base so it can generate responses that are specific to an organization rather than based on general information. It may also reduce errors and improve the relevancy of the model output, depending on the context.
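A toy, self-contained illustration of that flow; the in-memory list stands in for the database, the word-overlap score stands in for embedding similarity, and a real system would send the resulting prompt to an LLM:

    KNOWLEDGE_BASE = [
        "Korvus runs the RAG pipeline inside Postgres.",
        "Support hours are 9am-5pm Eastern, Monday through Friday.",
        "Expense reports must be filed within 30 days of purchase.",
    ]

    def similarity(query: str, passage: str) -> float:
        # Crude stand-in for vector similarity: word overlap (Jaccard).
        q, p = set(query.lower().split()), set(passage.lower().split())
        return len(q & p) / len(q | p)

    def build_prompt(query: str, top_k: int = 2) -> str:
        retrieved = sorted(KNOWLEDGE_BASE, key=lambda p: similarity(query, p), reverse=True)[:top_k]
        return "Context:\n" + "\n".join(retrieved) + f"\n\nQuestion: {query}"

    print(build_prompt("When are support hours?"))  # this prompt would go to the LLM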