Building an advanced local LLM RAG pipeline by combining dense embeddings with BM25
The basic Retrieval-Augmented Generation (RAG) pipeline uses an encoder model to retrieve the documents most similar to a given query.
This is also called semantic search, because the encoder maps each text into a high-dimensional vector representation (called an embedding), and in this embedding space semantically similar texts end up close together.
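As a quick illustration of what that looks like in code, here is a minimal semantic-search sketch using the sentence-transformers library. The model name and the toy documents are placeholders for illustration, not necessarily what we will use later in the article.

```python
# Minimal semantic search sketch with sentence-transformers.
# The model name below is an assumed example, not the article's final choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example embedding model

documents = [
    "The cat sat on the mat.",
    "BM25 is a ranking function used by search engines.",
    "Retrieval-Augmented Generation grounds LLM answers in retrieved documents.",
]
query = "How does RAG use retrieved context?"

# Encode documents and query into dense embeddings.
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query embedding.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(documents[best], scores[best].item())
```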
Before we had Large Language Models (LLMs) and embedding models to create these vector representations, the BM25 algorithm was a very popular choice for search. BM25 scores documents by exact keyword matches with the query, giving more weight to rare, informative terms. This approach is called keyword search.
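For comparison, here is a minimal keyword-search sketch using the rank_bm25 package (pip install rank-bm25). The toy corpus and the simple whitespace tokenization are illustrative assumptions, not the article's final implementation.

```python
# Minimal keyword search sketch with BM25 via the rank_bm25 package.
from rank_bm25 import BM25Okapi

documents = [
    "The cat sat on the mat.",
    "BM25 is a ranking function used by search engines.",
    "Retrieval-Augmented Generation grounds LLM answers in retrieved documents.",
]

# BM25 works on tokenized text; a lowercase whitespace split is enough for this sketch.
tokenized_corpus = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_corpus)

query = "bm25 ranking function"
tokenized_query = query.lower().split()

# Score every document against the query and fetch the best matches.
scores = bm25.get_scores(tokenized_query)
top_docs = bm25.get_top_n(tokenized_query, documents, n=2)
print(scores)
print(top_docs)
```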
If you want to take your RAG pipeline to the next level, you might want to try hybrid search. Hybrid search combines keyword search and semantic search, so you get the benefits of both and better retrieval quality.
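As a rough preview of the idea (the full implementation follows later in the article), one common way to build a hybrid score is to normalize the BM25 scores and the embedding similarities to the same range and blend them with a weighting factor. The min-max normalization and the `alpha` weight below are assumptions for illustration, not the article's exact method.

```python
# Rough sketch of score fusion for hybrid search:
# min-max normalize each score list, then take a weighted sum.
import numpy as np

def min_max_normalize(scores: np.ndarray) -> np.ndarray:
    """Scale scores into [0, 1] so BM25 and cosine scores become comparable."""
    if scores.max() == scores.min():
        return np.zeros_like(scores)
    return (scores - scores.min()) / (scores.max() - scores.min())

def hybrid_scores(bm25_scores, dense_scores, alpha: float = 0.5) -> np.ndarray:
    """alpha=1.0 -> pure semantic search, alpha=0.0 -> pure keyword search."""
    bm25_norm = min_max_normalize(np.asarray(bm25_scores, dtype=float))
    dense_norm = min_max_normalize(np.asarray(dense_scores, dtype=float))
    return alpha * dense_norm + (1 - alpha) * bm25_norm

# Example: fuse the scores of three documents from the two retrievers above.
combined = hybrid_scores(bm25_scores=[2.1, 0.0, 1.3], dense_scores=[0.32, 0.75, 0.58])
print(combined.argsort()[::-1])  # document indices, best match first
```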
In this article, we will cover the theory and implement all three search approaches in Python.
Table of Contents
· RAG Retrieval
∘ Keyword Search With BM25
∘ Semantic Search With Dense Embeddings
∘ Semantic Search or Hybrid Search?
∘ Hybrid Search
∘ Putting It All Together
·…