Introduction
Large language models, or LLMs for short, have been a sensational topic in the past few months, with new models being released at a rapid pace. These models, trained on a vast amount of data, have a remarkable ability to understand and generate human-like text, making them invaluable for various tasks and even complex decision-making.
They’ve eased previously daunting challenges, whether breaking language barriers with seamless translations or helping businesses deliver personalized customer support at lightning speed. Beyond just convenience, LLMs have opened doors to innovation in ways that were hard to imagine a decade ago.
However, LLMs are not all-knowing oracles. Their knowledge comes from vast amounts of text data used during training, but that data has its limits—both in scope and freshness. Imagine trying to fill an encyclopedia with everything about the world but stopping before today’s latest news. Naturally, gaps will emerge, leading to blind spots, and the LLM will struggle to provide meaningful answers.
These gaps are particularly noticeable when we ask questions that rely on current events, very specific domain expertise, or experiences outside the LLM’s dataset. For example, ChatGPT might stumble when asked about the latest advancements in AI post-2023. It’s not that the model is “ignorant” in the human sense; rather, it’s like asking someone who hasn’t read the newest book or studied a rare subject; they simply don’t have that information to draw from.
To solve this problem, LLMs need up-to-date training and supplemental sources, like connected databases or search capabilities, to bridge these gaps. This led to the advent of Retrieval-Augmented Generation (RAG), an approach that pairs a pre-trained language model with a retrieval system to provide more accurate, context-aware answers.
Imagine you’re using a large language model to find details about a niche topic, but the model doesn’t have all the up-to-date knowledge you need. Instead of relying solely on its pre-trained knowledge, RAG retrieves relevant information from an external database or documents and then uses that to craft a well-informed, coherent response. This blend of retrieval and generation ensures that responses are not only creative but also grounded in real-time data or specific sources.
Do you know what’s more exciting about RAG? It has the potential to adapt in real-time. Instead of working with static knowledge, it can fetch the most up-to-date or specific information needed for a task from trusted sources while reducing the likelihood of hallucination.
There are three important stages in RAG: retrieval, augmentation, and generation. Let’s break these down one by one.
Retrieval Process
This is the first step, where we dive into the search. Imagine you’re looking for answers in a vast library: before jumping to conclusions, you first need to gather the right books. In RAG, this means querying a large database or knowledge base to fetch relevant pieces of information. These could be documents, articles, or any other form of data that can shed light on the question or task at hand. The goal here is to retrieve the most pertinent information that will help you build a solid foundation for the next steps.
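To make the retrieval step concrete, here is a minimal sketch that ranks a few documents against a query using cosine similarity over word counts. It is only an illustration: real RAG systems typically use learned embeddings and a document store, and the names `tokenize`, `cosine`, and `retrieve` are hypothetical helpers, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with a non-zero similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


documents = [
    "Leonardo da Vinci painted the Mona Lisa.",
    "Haystack is an open-source framework for LLM applications.",
    "The Last Supper is a mural by Leonardo da Vinci.",
]
top = retrieve("Who painted the Mona Lisa?", documents, top_k=1)
print(top)
```

Word counts are used here only because they need no external dependencies; swapping in embedding vectors would keep the same rank-and-select structure.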
Augmentation Process
Once we have the relevant information, it’s time to enhance it. This means refining the data we retrieved, reformatting it, or even combining multiple sources to form a more comprehensive and detailed answer. The retrieved data isn’t always perfect on its own, so it may need some tweaks or additional context to be truly useful.
The augmentation process can involve filtering out irrelevant information, merging facts, or rephrasing to make the data more digestible. This is where we start shaping the raw information into something more meaningful.
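As a rough sketch of augmentation, the function below cleans up retrieved passages, drops empty and duplicate ones, and merges the rest into a single prompt. The function name `build_prompt`, the prompt wording, and the `max_chars` limit are all assumptions made for illustration.

```python
def build_prompt(query, passages, max_chars=500):
    """Filter and merge retrieved passages into one prompt string."""
    seen = set()
    kept = []
    for passage in passages:
        cleaned = " ".join(passage.split())  # collapse stray whitespace
        if not cleaned or cleaned.lower() in seen:
            continue  # skip empty passages and case-insensitive duplicates
        seen.add(cleaned.lower())
        kept.append(cleaned[:max_chars])  # truncate overly long passages
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(kept, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_prompt(
    "Who painted the Mona Lisa?",
    [
        "Leonardo da Vinci painted  the Mona Lisa.",
        "leonardo da vinci painted the mona lisa.",
        "",
    ],
)
print(prompt)
```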
Generation Process
Now that we have the enhanced data, it’s time to generate the final output. With the augmented information, we use language models (like GPT) to craft a response or an output that directly answers the query or solves the problem. This step ensures the answer is coherent, human-like, and relevant.
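The sketch below shows how generation fits behind the other two stages. The retriever and the language model are passed in as plain functions, and both are stand-ins here (`fake_retrieve` and `fake_generate` are hypothetical stubs); in a real system the `generate` callable would call an actual LLM.

```python
from typing import Callable, List


def answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve passages, build a prompt from them, and generate a reply."""
    passages = retrieve(query)
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)


def fake_retrieve(query: str) -> List[str]:
    # Stand-in for a real retriever.
    return ["Leonardo da Vinci painted the Mona Lisa."]


def fake_generate(prompt: str) -> str:
    # Stand-in for a real LLM call: it only checks the prompt for a name.
    return "Leonardo da Vinci" if "Leonardo da Vinci" in prompt else "I don't know."


reply = answer("Who painted the Mona Lisa?", fake_retrieve, fake_generate)
print(reply)
```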
In the upcoming sections of this article, we will get hands-on in building a RAG system using a very popular tool called Haystack.
What is Haystack?
Haystack, built by Deepset AI, is an open-source framework for building production-ready LLM applications, retrieval-augmented generative pipelines, and state-of-the-art search systems that work intelligently over large document collections.
Its use cases span multimodal AI, conversational AI, content generation, agentic pipelines, and advanced RAG. Haystack is modular by design, so you can mix technologies from OpenAI, Chroma, Marqo, and open-source projects like Hugging Face’s Transformers or Elasticsearch.
With Haystack, you can use current LLMs and NLP models to create personalized search experiences and allow your users to query in natural language. The recent release of Haystack 2.0 has brought a major update to the design of Haystack components, Document Stores, and pipelines.
Preparation
Prerequisites
- Python 3.8 or higher
- Haystack 2.0
- An OpenAI API key
Installation
We can install Haystack via either conda or pip.
Using pip:
pip install haystack-ai
Using conda:
conda config --add channels conda-forge/label/haystack-ai_rc
conda install haystack-ai
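After installing, it can help to confirm which version ended up in your environment. The helper below is a small sketch using the standard library; `installed_version` is a hypothetical name, and `haystack-ai` is the distribution name used in the install commands above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if haystack-ai is installed, otherwise None.
print(installed_version("haystack-ai"))
```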
Import Libraries
Before diving into the code, we import the necessary libraries and modules: os for environment variables, Pipeline and PredefinedPipeline from Haystack to create and use pipelines, and urllib.request to handle file downloads.
import os
from haystack import Pipeline, PredefinedPipeline
import urllib.request
Set the API Key & Download the Data
In this step, the OpenAI API key is set as an environment variable, and a sample text file (containing information about Leonardo da Vinci) is downloaded to serve as input data for indexing.
- os.environ["OPENAI_API_KEY"] sets up authentication for the LLM used in the pipeline
- urllib.request.urlretrieve downloads the file davinci.txt from an online source and saves it locally
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
urllib.request.urlretrieve("https://archive.org/stream/leonardodavinci00brocrich/leonardodavinci00brocrich_djvu.txt","davinci.txt")
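Hard-coding a key in a script makes it easy to leak, and re-downloading the file on every run is unnecessary. The sketch below shows one possible alternative; `ensure_api_key` and `download_once` are hypothetical helpers, and prompting with `getpass` is just one assumption about how you might supply the key.

```python
import os
import urllib.request
from getpass import getpass


def ensure_api_key(env=os.environ, prompt_fn=getpass):
    """Use OPENAI_API_KEY if already set; otherwise ask for it interactively."""
    if not env.get("OPENAI_API_KEY"):
        env["OPENAI_API_KEY"] = prompt_fn("OpenAI API key: ")
    return env["OPENAI_API_KEY"]


def download_once(url, path, fetch=urllib.request.urlretrieve):
    """Download url to path unless the file already exists. Returns True if fetched."""
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True


# Example usage (left commented out so nothing prompts or downloads on import):
# ensure_api_key()
# download_once(
#     "https://archive.org/stream/leonardodavinci00brocrich/leonardodavinci00brocrich_djvu.txt",
#     "davinci.txt",
# )
```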
Creating Our RAG System
Create and Run an Indexing Pipeline
Here, a predefined indexing pipeline is created and executed. The indexing pipeline processes the davinci.txt file, making its content searchable for future queries.
- Pipeline.from_template(PredefinedPipeline.INDEXING) initializes a pipeline for indexing data
- run(data={"sources": ["davinci.txt"]}) processes the input text file to index its content
indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})
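Indexing generally involves breaking a long document into smaller chunks before they are embedded and stored. The sketch below illustrates only that splitting idea with overlapping word windows. It is not the predefined pipeline's implementation, and `split_into_chunks` and its default parameters are assumptions chosen for the example.

```python
def split_into_chunks(text, words_per_chunk=200, overlap=20):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= words_per_chunk:
        raise ValueError("overlap must be smaller than words_per_chunk")
    words = text.split()
    step = words_per_chunk - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + words_per_chunk]))
        if start + words_per_chunk >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = split_into_chunks(sample, words_per_chunk=4, overlap=1)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.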
Create the RAG Pipeline
This step initializes the RAG pipeline, which is designed to retrieve relevant information from the indexed data and generate a response using an LLM.
rag_pipeline = Pipeline.from_template(PredefinedPipeline.RAG)
Query the Pipeline and Generate a Response
A query is passed to the RAG pipeline, which retrieves relevant information and generates a response.
- query stores the question you want to answer
- rag_pipeline.run(data={"prompt_builder": {"query":query}, "text_embedder": {"text": query}}) sends the query through the pipeline
- prompt_builder specifies the query to be answered
- text_embedder helps create embeddings for the input query
- result["llm"]["replies"][0] extracts the LLM-generated answer, which is then printed
query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query":query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])
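Indexing straight into result["llm"]["replies"][0] raises an error if the reply list is empty or the key is missing. A small defensive helper, sketched below, avoids that. It assumes only the result shape used above (a dict with an "llm" entry holding a "replies" list); `first_reply` is a hypothetical name and `fake_result` is made-up sample data.

```python
def first_reply(result, component="llm"):
    """Return the first reply from a pipeline result dict, or None if absent."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Made-up result with the same shape as the pipeline output shown above.
fake_result = {"llm": {"replies": ["example answer"]}}
print(first_reply(fake_result))
print(first_reply({}))
```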
Full code:
import os
import urllib.request
from haystack import Pipeline, PredefinedPipeline
# Authenticate with OpenAI (replace the placeholder with your own key)
os.environ["OPENAI_API_KEY"] = "Your OpenAI Key"
# Download the sample text about Leonardo da Vinci
urllib.request.urlretrieve(
    "https://archive.org/stream/leonardodavinci00brocrich/leonardodavinci00brocrich_djvu.txt",
    "davinci.txt",
)
# Index the downloaded file so its content can be searched later
indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})
# Build the RAG pipeline and send the question through it
rag_pipeline = Pipeline.from_template(PredefinedPipeline.RAG)
query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query": query}, "text_embedder": {"text": query}})
# Print the first reply generated by the LLM
print(result["llm"]["replies"][0])
Output:
Leonardo da Vinci was born in 1452 and died on May 2, 1519. Therefore, he was 66 years old when he passed away.
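Generated answers deserve a quick sanity check. The sketch below assumes the commonly cited birth date of 15 April 1452 (the output above gives only the year) together with the death date of 2 May 1519 from the output. Under that assumption the arithmetic gives 67 rather than 66, so treat the printed answer as something to verify; your own run may also word its reply differently. `age_on` is a hypothetical helper.

```python
from datetime import date


def age_on(born, on):
    """Completed years between a birth date and a later date."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached in the final year
    return years


born = date(1452, 4, 15)  # assumed birth date, not stated in the output above
died = date(1519, 5, 2)   # death date as given in the output above
age = age_on(born, died)
print(age)
```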
Conclusion
In this article, we explored step-by-step how to build a Retrieval-Augmented Generation (RAG) pipeline using Haystack. We started by importing essential libraries and setting up the environment, including the OpenAI API key for the language model integration. Next, we demonstrated how to download a text file containing information about Leonardo da Vinci, which served as the data source for our pipeline.
The walkthrough then covered the creation and execution of an indexing pipeline to process and store the text data, enabling it to be searched efficiently. We followed that with the setup of a RAG pipeline designed to combine retrieval and language generation seamlessly. Finally, we showed how to query the RAG pipeline with a question about Leonardo da Vinci’s age at the time of his death and retrieved the answer—66 years old.
This hands-on guide not only explained how the RAG process works but also walked you through practical steps to implement it.
Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.