Getting Started with Building RAG Systems Using Haystack




 

Introduction

 
Large language models, or LLMs for short, have been a sensational topic in recent months, with new models being built and deployed daily. These models, trained on vast amounts of text data, have a remarkable ability to understand and generate human-like text, making them invaluable for a wide range of tasks and even complex decision-making.

They’ve eased previously daunting challenges, whether breaking language barriers with seamless translations or helping businesses deliver personalized customer support at lightning speed. Beyond just convenience, LLMs have opened doors to innovation in ways that were hard to imagine a decade ago.

However, LLMs are not all-knowing oracles. Their knowledge comes from vast amounts of text data used during training, but that data has its limits—both in scope and freshness. Imagine trying to fill an encyclopedia with everything about the world but stopping before today’s latest news. Naturally, gaps will emerge, leading to blind spots, and the LLM will struggle to provide meaningful answers.

These gaps are particularly noticeable when we ask questions that rely on current events, very specific domain expertise, or experiences outside the LLM’s dataset. For example, ChatGPT might stumble when asked about the latest advancements in AI post-2023. It’s not that the model is “ignorant” in the human sense; rather, it’s like asking someone who hasn’t read the newest book or studied a rare subject: they simply don’t have that information to draw from.

To solve this problem, LLMs need up-to-date training and supplemental sources, like connected databases or search capabilities, to bridge these gaps. This led to the advent of Retrieval-Augmented Generation (RAG), an approach that pairs a pre-trained language model with a retrieval system so it can draw on external information and provide more accurate, context-aware answers.

Imagine you’re using a large language model to find details about a niche topic, but the model doesn’t have all the up-to-date knowledge you need. Instead of relying solely on its pre-trained knowledge, RAG retrieves relevant information from an external database or documents and then uses that to craft a well-informed, coherent response. This blend of retrieval and generation ensures that responses are not only creative but also grounded in real-time data or specific sources.

Do you know what’s more exciting about RAG? It has the potential to adapt in real-time. Instead of working with static knowledge, it can fetch the most up-to-date or specific information needed for a task from trusted sources while reducing the likelihood of hallucination.

There are three important stages in RAG: retrieval, augmentation, and generation. Let’s break these down one by one.

 

Retrieval Process

This is the first step where we dive into the search. Imagine you’re looking for answers in a vast library—before jumping to conclusions, you first need to gather the right books. In RAG, this means querying a large database or knowledge base to fetch relevant pieces of information. These could be documents, articles, or any other form of data that can shed light on the question or task at hand. The goal here is to retrieve the most pertinent information that will help you build a solid foundation for the next steps.
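
To make the idea concrete, here is a minimal, library-free sketch of retrieval. The documents, the word-overlap scoring, and the query below are all invented for illustration; real systems typically rank documents with embeddings and a vector store instead.

# Toy retrieval: rank a handful of documents by word overlap with the query
documents = [
    "Leonardo da Vinci was an Italian polymath of the High Renaissance.",
    "The Great Wall of China stretches for thousands of kilometres.",
    "Da Vinci painted the Mona Lisa and the Last Supper.",
]

def retrieve(query, docs, top_k=2):
    query_words = set(query.lower().split())
    # Score each document by how many query words it shares
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

retrieved = retrieve("Who painted the Mona Lisa?", documents)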

 

Augmentation Process

Once we have the relevant information, it’s time to enhance it. This means refining the data we retrieved, reformatting it, or even combining multiple sources to form a more comprehensive and detailed answer. The retrieved data isn’t always perfect on its own, so it may need some tweaks or additional context to be truly useful.
The augmentation process can involve filtering out irrelevant information, merging facts, or rephrasing to make the data more digestible. This is where we start shaping the raw information into something more meaningful.
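
Continuing the toy example above, augmentation can be as simple as stitching the retrieved snippets into a single prompt. The template wording and the hard-coded snippets here are illustrative choices, not a fixed recipe.

def build_prompt(query, retrieved_docs):
    # Merge the retrieved snippets into one context block
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    # Wrap context and question in an instruction the model can follow
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Snippets as returned by the retrieval sketch above
retrieved = [
    "Da Vinci painted the Mona Lisa and the Last Supper.",
    "Leonardo da Vinci was an Italian polymath of the High Renaissance.",
]
prompt = build_prompt("Who painted the Mona Lisa?", retrieved)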

 

Generation Process

Now that we have the enhanced data, it’s time to generate the final output. With the augmented information, we use language models (like GPT) to craft a response or an output that directly answers the query or solves the problem. This step ensures the answer is coherent, human-like, and relevant.
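
Finally, the augmented prompt is handed to a generator. Below is a minimal sketch using the OpenAI Python client; the model name is a placeholder assumption, and the OPENAI_API_KEY environment variable is assumed to be set.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 'prompt' is the augmented prompt built in the previous sketch
prompt = "Answer the question using only the context below. ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-completion model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)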

In the upcoming sections of this article, we will get hands-on in building a RAG system using a very popular tool called Haystack.

 

What is Haystack?

 
Haystack, built by Deepset AI, is an open-source framework for building production-ready LLM applications, retrieval-augmented generative pipelines, and state-of-the-art search systems that work intelligently over large document collections.

 


Haystack webpage (Image by Author)

 

With use cases spanning multimodal AI, conversational AI, content generation, agentic pipelines, and advanced RAG, Haystack is modularly designed so that you can mix and match the best technologies from OpenAI, Chroma, Marqo, and other open-source projects like Hugging Face’s Transformers or Elasticsearch.

With Haystack, you can use cutting-edge LLMs and NLP models to create personalized search experiences and let your users query in natural language. The recent release of Haystack 2.0 brought a major update to the design of Haystack components, Document Stores, and pipelines.

 

Preparation

 

Prerequisites

  • Python 3.8 or higher
  • Haystack 2.0
  • An OpenAI API key

 

Installation

We can install Haystack via either conda or pip.

Using pip:

pip install haystack-ai

Using conda:

conda config --add channels conda-forge/label/haystack-ai_rc
conda install haystack-ai

 

Import Libraries

Before diving into the code, necessary libraries and modules are imported. These include os for environment variables, Pipeline and PredefinedPipeline from Haystack to create and use pipelines, and urllib.request to handle file downloads.

import os
from haystack import Pipeline, PredefinedPipeline
import urllib.request

 

Set the API Key & Download the Data

In this step, the OpenAI API key is set as an environment variable, and a sample text file (containing information about Leonardo da Vinci) is downloaded to serve as input data for indexing.

  • os.environ["OPENAI_API_KEY"] sets up authentication for the LLM used in the pipeline
  • urllib.request.urlretrieve downloads the file davinci.txt from an online source and saves it locally
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"
urllib.request.urlretrieve("https://archive.org/stream/leonardodavinci00brocrich/leonardodavinci00brocrich_djvu.txt","davinci.txt")
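
Hardcoding secrets in a script is easy to leak. As a minimal alternative (assuming you run the script interactively), you can prompt for the key at runtime with Python’s built-in getpass module instead:

import os
from getpass import getpass

# Ask for the key at runtime rather than storing it in the source file
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")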

 

Creating Our RAG System

 

Create and Run an Indexing Pipeline

Here, a predefined indexing pipeline is created and executed. The indexing pipeline processes the davinci.txt file, making its content searchable for future queries.

  • Pipeline.from_template(PredefinedPipeline.INDEXING) initializes a pipeline for indexing data
  • .run(data={"sources": ["davinci.txt"]}) processes the input text file to index its content
indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})
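
If you are curious about what such a template wires up internally, here is a rough, hand-built equivalent using Haystack 2.x components. The specific components, their default settings, and the connections shown are assumptions based on the Haystack 2.x component catalog; the predefined template may make different choices.

from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()

manual_indexing = Pipeline()
manual_indexing.add_component("converter", TextFileToDocument())      # read the .txt file into Documents
manual_indexing.add_component("splitter", DocumentSplitter())         # chunk long documents into smaller pieces
manual_indexing.add_component("embedder", OpenAIDocumentEmbedder())   # embed each chunk with an OpenAI model
manual_indexing.add_component("writer", DocumentWriter(document_store=document_store))  # store chunks and embeddings

manual_indexing.connect("converter", "splitter")
manual_indexing.connect("splitter", "embedder")
manual_indexing.connect("embedder", "writer")

manual_indexing.run({"converter": {"sources": ["davinci.txt"]}})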

 

Create the RAG Pipeline

This step initializes the RAG pipeline, which is designed to retrieve relevant information from the indexed data and generate a response using an LLM.

  • Pipeline.from_template(PredefinedPipeline.RAG) initializes a pipeline that pairs retrieval over the indexed documents with LLM-based answer generation
rag_pipeline = Pipeline.from_template(PredefinedPipeline.RAG)
 

Query the Pipeline and Generate a Response

A query is passed to the RAG pipeline, which retrieves relevant information and generates a response.

  • query stores the question you want to answer
  • rag_pipeline.run(data={"prompt_builder": {"query":query}, "text_embedder": {"text": query}}) sends the query through the pipeline
    • prompt_builder specifies the query to be answered
    • text_embedder helps create embeddings for the input query
  • result["llm"]["replies"][0] extracts and prints the LLM-generated answer
query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query":query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])
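
The keys prompt_builder and text_embedder refer to named components inside the predefined RAG template. For intuition, here is a rough hand-built equivalent; the component names and connections are assumptions based on the Haystack 2.x component catalog, and it reuses the document_store and query variables defined above.

from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """Answer the question using the context below.
Context:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ query }}
Answer:"""

manual_rag = Pipeline()
manual_rag.add_component("text_embedder", OpenAITextEmbedder())   # embed the incoming query
manual_rag.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))  # fetch similar chunks
manual_rag.add_component("prompt_builder", PromptBuilder(template=template))  # fill the prompt template
manual_rag.add_component("llm", OpenAIGenerator())                # generate the final answer

manual_rag.connect("text_embedder.embedding", "retriever.query_embedding")
manual_rag.connect("retriever.documents", "prompt_builder.documents")
manual_rag.connect("prompt_builder", "llm")

result = manual_rag.run(data={"prompt_builder": {"query": query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])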

 

Full code:

import os
import urllib.request

from haystack import Pipeline, PredefinedPipeline

# Authentication for the OpenAI models used by the pipelines
os.environ["OPENAI_API_KEY"] = "Your OpenAI API Key"

# Download the sample text about Leonardo da Vinci
urllib.request.urlretrieve(
    "https://archive.org/stream/leonardodavinci00brocrich/leonardodavinci00brocrich_djvu.txt",
    "davinci.txt")

# Index the document so it can be retrieved later
indexing_pipeline = Pipeline.from_template(PredefinedPipeline.INDEXING)
indexing_pipeline.run(data={"sources": ["davinci.txt"]})

# Build the RAG pipeline and ask a question
rag_pipeline = Pipeline.from_template(PredefinedPipeline.RAG)

query = "How old was he when he died?"
result = rag_pipeline.run(data={"prompt_builder": {"query": query}, "text_embedder": {"text": query}})
print(result["llm"]["replies"][0])

 

Output:

Leonardo da Vinci was born in 1452 and died on May 2, 1519. Therefore, he was 66 years old when he passed away.

 

Conclusion

 
In this article, we explored step-by-step how to build a Retrieval-Augmented Generation (RAG) pipeline using Haystack. We started by importing essential libraries and setting up the environment, including the OpenAI API key for the language model integration. Next, we demonstrated how to download a text file containing information about Leonardo da Vinci, which served as the data source for our pipeline.

The walkthrough then covered the creation and execution of an indexing pipeline to process and store the text data, enabling it to be searched efficiently. We followed that with the setup of a RAG pipeline designed to combine retrieval and language generation seamlessly. Finally, we showed how to query the RAG pipeline with a question about Leonardo da Vinci’s age at the time of his death and retrieved the answer—66 years old.

This hands-on guide not only explained how the RAG process works but also walked you through practical steps to implement it.

 


Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.


