Position-based Chunking Leads to Poor Performance in RAGs


How to implement semantic chunking and gain better results.


Language models come with a context limit. For newer OpenAI models, this is around 128k tokens, roughly 80k English words. This may sound big enough for most use cases. Still, large production-grade applications often need to refer to more than 80k words, not to mention images, tables, and other unstructured information.
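To get a feel for how quickly text consumes the context window, you can count tokens yourself. Below is a minimal sketch, assuming the tiktoken package is installed; the encoding name is an assumption, so pick the one that matches your model.

```python
# Minimal sketch: count how many tokens a piece of text consumes.
# "cl100k_base" is an assumed encoding; use the one for your model.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Language models come with a context limit."
num_tokens = len(encoding.encode(text))
print(num_tokens)  # tokens this text occupies in the context window
```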

Even if everything fits within the context window, stuffing it with irrelevant information significantly degrades LLM performance.

This is where RAG helps. RAG retrieves the relevant information from an embedded source and passes it to the LLM as context. To retrieve that relevant information, we must first divide the documents into chunks. Thus, chunking plays a vital role in a RAG pipeline.
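To make the retrieval step concrete, here is a minimal sketch, assuming the sentence-transformers library and a tiny illustrative corpus (the model name and the chunks are assumptions, not the article's setup): chunks are embedded once, and the top-k most similar chunks are returned for a query using cosine similarity.

```python
# Minimal retrieval sketch: embed chunks, then return the top-k most
# similar chunks for a query. Model name and corpus are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "RAG retrieves relevant text and passes it to the LLM as context.",
    "Chunking splits large documents into retrievable pieces.",
    "The context window of newer OpenAI models is around 128k tokens.",
]

# Normalized embeddings make the dot product equal to cosine similarity.
chunk_embeddings = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_embeddings @ query_embedding
    top_indices = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top_indices]

print(retrieve("Why do we split documents into chunks?"))
```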

Chunking lets the RAG system retrieve specific pieces of a large document. However, small changes in the chunking strategy can significantly affect the responses the LLM produces.
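To see why boundaries matter, here is a minimal sketch of the position-based (fixed-size) chunking the title refers to; the sizes and sample text are assumptions for illustration. Shifting the window size or overlap relocates every boundary, which can separate a sentence from the context that explains it.

```python
# Minimal sketch of position-based (fixed-size) chunking: the text is cut
# every `chunk_size` characters with some overlap, regardless of meaning.
# Parameter values and the sample document are illustrative only.
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = (
    "RAG retrieves relevant information from an embedded source. "
    "Chunking decides which pieces of the document can be retrieved. "
    "A boundary placed mid-sentence can separate a claim from its evidence."
)

# Changing chunk_size or overlap moves every boundary, which changes what the
# retriever can return and therefore what the LLM sees as context.
for chunk in fixed_size_chunks(document, chunk_size=80, overlap=20):
    print(repr(chunk))
```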
