Position-based Chunking Leads to Poor Performance in RAGs


How to implement semantic chunking and gain better results.


Language models come with a context limit. For newer OpenAI models, this is around 128k tokens, roughly 80k English words. This may sound big enough for most use cases. Still, large production-grade applications often need to refer to more than 80k words, not to mention images, tables, and other unstructured information.
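To get a feel for how quickly text consumes the context window, you can count tokens yourself. Below is a minimal sketch, assuming the tiktoken package is installed; the encoding name is an assumption, so pick the one that matches your model.

```python
# Minimal sketch: count how many tokens a piece of text consumes.
# "cl100k_base" is an assumed encoding; use the one for your model.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Language models come with a context limit."
num_tokens = len(encoding.encode(text))
print(num_tokens)  # tokens this text occupies in the context window
```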

Even if everything fits within the context window, stuffing it with irrelevant information significantly degrades LLM performance.

This is where RAG helps. RAG retrieves the relevant information from an embedded source and passes it to the LLM as context. To retrieve that relevant information, we must first divide the documents into chunks. Thus, chunking plays a vital role in a RAG pipeline.
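To make the retrieval step concrete, here is a minimal sketch, assuming the sentence-transformers library and a tiny illustrative corpus (the model name and the chunks are assumptions, not the article's setup): chunks are embedded once, and the top-k most similar chunks are returned for a query using cosine similarity.

```python
# Minimal retrieval sketch: embed chunks, then return the top-k most
# similar chunks for a query. Model name and corpus are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "RAG retrieves relevant text and passes it to the LLM as context.",
    "Chunking splits large documents into retrievable pieces.",
    "The context window of newer OpenAI models is around 128k tokens.",
]

# Normalized embeddings make the dot product equal to cosine similarity.
chunk_embeddings = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_embeddings @ query_embedding
    top_indices = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top_indices]

print(retrieve("Why do we split documents into chunks?"))
```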

Chunking lets the RAG system retrieve specific pieces of a large document. However, small changes in the chunking strategy can significantly affect the responses the LLM produces.
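To see why boundaries matter, here is a minimal sketch of the position-based (fixed-size) chunking the title refers to; the sizes and sample text are assumptions for illustration. Shifting the window size or overlap relocates every boundary, which can separate a sentence from the context that explains it.

```python
# Minimal sketch of position-based (fixed-size) chunking: the text is cut
# every `chunk_size` characters with some overlap, regardless of meaning.
# Parameter values and the sample document are illustrative only.
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = (
    "RAG retrieves relevant information from an embedded source. "
    "Chunking decides which pieces of the document can be retrieved. "
    "A boundary placed mid-sentence can separate a claim from its evidence."
)

# Changing chunk_size or overlap moves every boundary, which changes what the
# retriever can return and therefore what the LLM sees as context.
for chunk in fixed_size_chunks(document, chunk_size=80, overlap=20):
    print(repr(chunk))
```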
