RAG: Hybrid Search Based on Two Indexes | by Jérôme DIAZ


The approach I describe in this article is one I have already implemented and am currently testing in a personal project.

Towards Data Science

In the context of RAG and vector databases, hybrid search means retrieving the document chunks that can help answer a question by combining a semantic search based on embeddings with a full-text search on the content of those chunks.
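To make the definition concrete, here is a minimal sketch of hybrid scoring in plain Python. It is an illustration under assumptions, not the implementation used in my project: the two-dimensional vectors are hand-written stand-ins for real embeddings, the keyword score is a simple term-overlap ratio rather than a real full-text ranking such as BM25, and the names `hybrid_search`, `keyword_score` and the `alpha` weight are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of distinct query terms that appear in the chunk text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(query, query_vec, chunks, alpha=0.5, top_k=2):
    """Rank chunks by a weighted sum of semantic and keyword scores."""
    scored = []
    for chunk in chunks:
        sem = cosine(query_vec, chunk["vec"])
        kw = keyword_score(query, chunk["text"])
        scored.append((alpha * sem + (1 - alpha) * kw, chunk["id"]))
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:top_k]]


# Toy data: the vectors are made up and stand in for real embeddings.
chunks = [
    {"id": "c1", "text": "reset your password from the login page", "vec": [0.9, 0.1]},
    {"id": "c2", "text": "credentials can be recovered by email", "vec": [0.8, 0.3]},
    {"id": "c3", "text": "quarterly revenue grew last year", "vec": [0.0, 1.0]},
]

results = hybrid_search("password reset", [1.0, 0.0], chunks)
print(results)
```

Note that both scores are computed on the same chunk: the text that is embedded is also the text that is searched for keywords.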

The limitations

While hybrid search should give better results than a pure semantic approach, because it favors chunks that contain keywords present in the query, there is still room for improvement.

Because keywords are searched in the same text that was used to compute the embedding, what happens when chunk A of a document contains the keywords while chunk B of the same document is semantically close to the query and so should help answer it?

We would like chunk B to be among the documents returned by the retriever, but with a standard hybrid search that might not be the case.
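The scenario can be sketched with a few made-up numbers. Everything below is hypothetical: the scores, the `alpha` weight and the `rank` helper are assumptions chosen only to illustrate the problem. Chunks A and B belong to the same document, while chunk C comes from another document that never mentions the keywords.

```python
# Hypothetical per-chunk scores for a single query; all numbers are made up.
chunks = [
    {"id": "A", "doc": "doc1", "semantic": 0.40, "keyword": 1.00},  # has the keywords
    {"id": "B", "doc": "doc1", "semantic": 0.80, "keyword": 0.00},  # semantically close
    {"id": "C", "doc": "doc2", "semantic": 0.82, "keyword": 0.00},  # other document
]


def rank(chunks, alpha=0.5):
    """Sort chunks by a weighted sum of their own semantic and keyword scores."""
    return sorted(
        chunks,
        key=lambda c: alpha * c["semantic"] + (1 - alpha) * c["keyword"],
        reverse=True,
    )


top_2 = [c["id"] for c in rank(chunks)[:2]]
print(top_2)
```

With these numbers the retriever returns A and C. The keyword match found in chunk A tells us that doc1 is on topic, yet chunk B gets no benefit from it because each chunk is scored only on its own text.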

Self-querying retriever

This kind of retriever is based on metadata filtering. Key information that might help to filter the vector…
