RAG App Development: From Start to Finish | by Shreya Sri | Aug, 2024


In our cozy office, the team is now ready to roll up their sleeves and dive into setting up their RAG-powered app. It’s time to prepare the workspace and gather the right tools for our project. Here’s how we do it:

Jupyter Notebook: The Interactive Kitchen

Think of Jupyter notebooks as an interactive kitchen, where we mix ingredients, test recipes, and troubleshoot any hiccups that come our way. They’re perfect for learning how to work with large language models (LLMs) because they let us experiment and see results in real time. If something goes awry — unexpected output, an API error, or some other snag — Jupyter notebooks are our safety net, helping us understand and fix issues on the fly.

To get started, you’ll want to set up Jupyter notebooks. They’re the ideal environment for this guide and many others in our documentation. Detailed installation instructions are available in the official Jupyter documentation.
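
If you don’t already have Jupyter installed, one quick way to get it running locally (a minimal sketch, assuming a working Python environment) is:

# Install the classic Jupyter Notebook package and launch it
pip install notebook
jupyter notebook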

Installation: Gathering Our Tools

Just like a chef needs the right utensils, our tutorial requires specific tools to make our RAG-powered app come to life. These tools are the LangChain dependencies, which are essential for our project.

Here’s a quick list of what you need to install:

  • LangChain dependencies: These are the key ingredients that will help us build and run our app smoothly.

As we gather these components, we’re setting the stage for a successful development process. With our interactive kitchen ready and our tools in place, we’re prepared to cook up something amazing with RAG technology.

Let’s get started on setting up and get ready to bring our app to life!

pip install langchain langchain_community langchain_chroma langchain_openai

For more details, check this Installation guide.

LangSmith🦜🛠️: Our Trusted Sous-Chef

As our team continues to cook up their RAG-powered app, they know that complex recipes require a careful eye on every step. Imagine LangSmith🦜🛠️ as our trusted sous-chef, helping us keep track of every detail and ensuring everything is cooking perfectly.

In the world of LangChain, many applications involve multiple steps and several calls to LLMs. Just like in a sophisticated kitchen, where each ingredient and step must be monitored closely to ensure a delicious outcome, our applications can become intricate with various processes and interactions.

LangSmith🦜🛠️ steps in as our sous-chef, allowing us to inspect and understand exactly what’s happening inside our application’s chain or agent. It helps us keep an eye on the details, track the flow of information, and troubleshoot any issues that arise.

By using LangSmith🦜🛠️, we gain the ability to see how our app is working under the hood, making it easier to manage and refine as it grows in complexity. With LangSmith🦜🛠️ on our team, we can ensure that every step of our RAG-powered app is perfectly executed, leading to a successful and reliable application.

Let’s continue our journey with LangSmith🦜🛠️ as our expert sous-chef, making sure our RAG-powered app is crafted to perfection!

In our cozy office, it’s time to set up our virtual kitchen by preparing the environment for our RAG-powered app. Think of this step as gathering the essential spices and tools before we start cooking.

Here’s how we prepare the environment:

import getpass
import os
# Enable LangChain tracing to monitor and debug our app
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Set up the API key for LangChain
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
  1. Enable LangChain Tracing: Just like setting the oven to the right temperature, we enable tracing to monitor and debug our application. Setting "LANGCHAIN_TRACING_V2" to "true" helps us keep an eye on what’s happening inside our app.
  2. Set Up the API Key: To access LangChain’s features, we need an API key. Using getpass.getpass(), we securely input our API key without displaying it on the screen—like ensuring we have the right secret ingredient without revealing it to everyone.

With these steps, our virtual kitchen is prepped and ready. We’re all set to dive into the exciting world of building our RAG-powered app, with everything in place to ensure a smooth cooking process.

Indexing: Loading Our Data

Loading Data

In our cozy office, we’re getting ready to load the content for our RAG-powered app. This is like preparing our main ingredients before cooking. Here’s how we do it:

Loading the Blog Post Contents

To start, we need to load the contents of a blog post. We’ll use something called `DocumentLoaders`, which help us fetch data from a source and organize it into a list of `Documents`. Each `Document` includes the page content and some metadata.

For this task, we’ll use the `WebBaseLoader`, which is like a digital chef’s tool that grabs HTML from web URLs and converts it into text using BeautifulSoup. We can customize how the HTML is parsed by specifying parameters for BeautifulSoup. In our case, we only want to keep HTML tags with specific classes like “post-content”, “post-title”, or “post-header” and ignore everything else.

Here’s the code for loading the content:


import bs4
from langchain_community.document_loaders import WebBaseLoader
# Set up a filter to keep only relevant HTML tags
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
# Create the loader to fetch and parse the web content
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
# Load the documents
docs = loader.load()
# Check the length of the page content
len(docs[0].page_content)
# Print the first 500 characters of the page content to see what we have
print(docs[0].page_content[:500])

What This Does

1. Filtering HTML: We use `bs4.SoupStrainer` to focus on HTML tags that contain the blog’s title, headers, and content. This is like selecting only the freshest ingredients for our dish.

2. Loading the Content: The `WebBaseLoader` fetches the HTML from the given URL and uses BeautifulSoup to parse it according to our filters. It then converts it into a format we can work with.

3. Previewing the Content: Finally, we check the length of the content and print the first 500 characters to get a glimpse of what we’re working with.

With our data loaded and ready, we’re all set to move on to the next step in building our RAG-powered app. Let’s continue preparing our ingredients and get ready to create something amazing!

Explore More:

DocumentLoader: Object that loads data from a source as a list of Documents.

  • Docs: Detailed documentation on how to use DocumentLoaders.
  • Integrations: 160+ integrations to choose from.
  • Interface: API reference for the base interface.

Indexing

As our team continues in the cozy office kitchen, we’re now ready to prepare our document for the final stages of cooking. Since our document is quite long — over 42,000 characters — it’s too hefty for most models to handle in one go. Think of it as trying to fit a whole roast into a single pan. Instead, we need to break it down into smaller, more manageable chunks.

Splitting the Document

To make sure our document is in the right size for our models, we’ll split it into chunks. This helps us manage large amounts of text and ensures that our models can efficiently process and retrieve relevant information.

Here’s how we do it:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Set up the text splitter with chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)

# Split the document into chunks
all_splits = text_splitter.split_documents(docs)

# Check the number of chunks created
len(all_splits)

# Check the length of the first chunk
len(all_splits[0].page_content)

# View metadata of a chunk
all_splits[10].metadata

What This Does

  1. Splitting with Overlap: We use RecursiveCharacterTextSplitter to break the document into chunks of 1000 characters, with 200 characters of overlap between chunks. This overlap helps keep important context together and ensures we don’t lose any crucial information. It’s like cutting our ingredients into pieces that fit perfectly in our recipe.
  2. Preserving Start Index: By setting add_start_index=True, we keep track of where each chunk starts within the original document. This helps us maintain context and reference points, much like keeping track of where each ingredient was added in our cooking process.
  3. Checking the Results: We check the number of chunks created and review the length of one chunk to ensure everything split correctly. We also look at the metadata to see where each chunk originated, as in the quick check below.
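
To see the overlap and start indexes in action, here’s a small optional check (it assumes the all_splits list created above):

# Print where each of the first few chunks starts in the original document
# and how long it is; because of the 200-character overlap, consecutive
# start indexes are less than chunk_size (1000) apart.
for split in all_splits[:3]:
    print(split.metadata["start_index"], len(split.page_content))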

With our document neatly sliced into manageable chunks, we’re ready for the next steps in our RAG-powered app development. Let’s continue to refine and build our application, ensuring it can handle and retrieve information efficiently!

Explore More:

TextSplitter: Object that splits a list of Documents into smaller chunks. Subclass of DocumentTransformers.

DocumentTransformer: Object that performs a transformation on a list of Document objects.

Storing

As our team continues in the cozy office kitchen, it’s time to store our prepared ingredients — our document chunks — so they’re ready for use when needed. This step is like organizing our prepped ingredients into storage containers, making it easy to grab what we need when it’s time to cook.

Storing the Chunks

To efficiently search and retrieve our document chunks at runtime, we’ll index them by embedding their contents and storing these embeddings in a vector database, or vector store. This process helps us quickly find and retrieve relevant chunks based on similarity to a search query.

Here’s how we set it up:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Set up the vector store with our document splits and embeddings model
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

What This Does

  1. Embedding Chunks: We use OpenAIEmbeddings to convert each document chunk into a high-dimensional vector representation, or embedding. This process is akin to turning our ingredients into a format that’s easy to store and retrieve.
  2. Storing Embeddings: These embeddings are then stored in a vector store called Chroma. Think of Chroma as our pantry where all the prepped ingredients (embeddings) are neatly organized and ready to be accessed when needed.
  3. Similarity Search: When we search, we embed the search query in the same way and run a similarity search to find the chunks whose embeddings are closest to the query embedding. Similarity is scored with a distance measure such as cosine similarity, which compares the angle between vectors. It’s like finding ingredients in our pantry that best match our recipe requirements, and you can try it directly with the quick sketch below.
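
As a quick sanity check, you can also query the vector store directly; this is a minimal sketch using the similarity_search method (the Retriever we build in the next section wraps this same capability):

# Embed the query and return the chunks whose embeddings are closest to it
results = vectorstore.similarity_search("What is Task Decomposition?", k=2)
for doc in results:
    print(doc.page_content[:200])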

With our document chunks embedded and stored, we’re ready to efficiently search and retrieve relevant information at runtime. Let’s move on to the next step and continue building our smart and responsive RAG-powered app!

Explore More

Embeddings: Wrapper around a text embedding model, used for converting text to embeddings.

  • Docs: Detailed documentation on how to use embeddings.
  • Integrations: 30+ integrations to choose from.
  • Interface: API reference for the base interface.

VectorStore: Wrapper around a vector database, used for storing and querying embeddings.

  • Docs: Detailed documentation on how to use vector stores.
  • Integrations: 40+ integrations to choose from.
  • Interface: API reference for the base interface.

Retrieving Information

With our document chunks neatly stored in the vector store, it’s time to put our application into action. Imagine we’re at the heart of our cozy office kitchen, where we’re about to serve up the perfect dish based on a specific request. This step involves creating the logic to search through our stored chunks, retrieve the most relevant ones, and then generate a well-crafted response.

Retrieving Relevant Documents

We’ll start by defining the logic for searching through our indexed documents. LangChain provides a Retriever interface, which allows us to search and fetch relevant documents based on a user’s query.

Here’s how we set up the retrieval process:

# Set up the retriever to use similarity search
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

# Perform a search to find relevant documents
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

# Check the number of documents retrieved
len(retrieved_docs)

# Print the content of the first retrieved document
print(retrieved_docs[0].page_content)

What This Does

  1. Create a Retriever: We convert our Chroma vector store into a Retriever using the as_retriever() method. This allows us to use the vector store’s similarity search capabilities to find relevant documents. It’s like setting up our kitchen tools to find the best ingredients for our dish.
  2. Perform a Search: We use the retriever.invoke() method to search for documents related to a specific query. For example, asking “What are the approaches to Task Decomposition?” is like placing an order for a particular dish.
  3. Review Retrieved Documents: We check how many documents were retrieved and examine the content of the first one. This helps us understand what relevant information was found and how well it matches the query.

With this retrieval process in place, we’re ready to move on to the next step — generating responses based on the retrieved documents. Our application is now set to handle user queries efficiently and provide meaningful answers!


Generating Responses

With our documents now indexed and retrievable, it’s time to combine all the pieces into a smooth, working chain that can take a user’s question, pull relevant information, and generate a coherent response. Let’s think of it as putting together a final dish after gathering all our ingredients.

Setting Up the Model

We’ll use OpenAI’s gpt-4o-mini model to generate our answers, but you can choose from various language models depending on your needs. Here’s how to set it up:

import getpass
import os

# Setting the API key for OpenAI
os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

# Initialize the chat model
llm = ChatOpenAI(model="gpt-4o-mini")

Creating the Prompt

We’ll use a pre-defined prompt from the LangChain prompt hub. This prompt guides the model on how to use the retrieved context to answer the question. It’s like setting up our recipe to ensure the dish turns out just right.

from langchain import hub
# Pull the prompt template from the hub
prompt = hub.pull("rlm/rag-prompt")
# Test the prompt with example inputs
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
print(example_messages[0].content)

Building the Chain

Now, we’ll put everything together in a chain. This chain will take a question, retrieve the relevant documents, format them, generate a response, and parse the output. It’s like cooking where each step builds on the previous one to create a complete dish.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Define the chain: retrieve, format, prompt, generate, parse
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Use the chain to answer a question, streaming the output
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Understanding the Chain

  1. Retrieve Documents: The retriever fetches the most relevant document chunks based on the user’s question. It’s like selecting the best ingredients for our recipe.
  2. Format Documents: The format_docs function prepares the retrieved content into a format suitable for the model, ensuring all necessary details are included.
  3. Generate Response: The model generates a response based on the formatted prompt. This is where the magic happens, turning raw data into a meaningful answer.
  4. Parse Output: The StrOutputParser extracts the final answer from the model’s output, ready to be served to the user.

By following these steps, you now have a complete setup for a RAG application that can retrieve relevant information and generate precise answers based on user queries.
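
If you don’t need streaming, the same chain can be called with invoke(), which returns the complete answer as a single string:

# Non-streaming variant: wait for the full answer, then print it
answer = rag_chain.invoke("What is Task Decomposition?")
print(answer)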

You’ve now built a robust Q&A application that efficiently retrieves and generates answers based on indexed data. But the world of RAG applications is rich with features and possibilities. Here’s how you can take your application to the next level:

1. Return Sources

Enhance transparency and trustworthiness by returning source documents along with answers. This feature allows users to see which documents were used to generate the answer, adding context and credibility.

  • Explore: Learn how to modify your existing chains to include source documents in the response. This can be particularly useful for applications where source verification is crucial.
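
As an illustration (one common pattern, not the only one), retrieval can be run as a parallel branch so the retrieved documents are carried through to the output alongside the generated answer. This sketch reuses the retriever, prompt, llm, and format_docs defined earlier:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Answer-generation branch: formats the already-retrieved documents,
# then prompts the model and parses the output to a string
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Retrieval branch runs in parallel; its result stays in the final output
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

result = rag_chain_with_source.invoke("What is Task Decomposition?")
print(result["answer"])        # the generated answer
print(len(result["context"]))  # the source documents it was based on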

2. Streaming

Improve user experience by streaming outputs and intermediate steps. Streaming allows users to see results as they are being processed, which can be especially useful for long or complex queries.

  • Explore: Implement streaming capabilities in your chain to provide real-time updates and feedback during the question-answering process.
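
The chain built earlier already streams its final output via stream(); in an async application the same thing can be done with astream(), for example:

import asyncio

# Async streaming variant: print each chunk as soon as it arrives
async def main():
    async for chunk in rag_chain.astream("What is Task Decomposition?"):
        print(chunk, end="", flush=True)

asyncio.run(main())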

3. Add Chat History

Incorporate chat history to make your application more interactive and context-aware. This feature allows the application to remember past interactions and provide more relevant responses based on the conversation’s history.

  • Explore: Integrate mechanisms to store and manage chat history, and modify your prompts to take past interactions into account.
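
One way to do this (a sketch built on LangChain’s create_history_aware_retriever and create_retrieval_chain helpers, reusing the llm and retriever from earlier) looks roughly like this:

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Step 1: rewrite the incoming question so it stands on its own, given
# the chat history (e.g. resolving "What about its drawbacks?")
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history, rewrite the latest question so it can be understood without the history."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_prompt)

# Step 2: answer the rewritten question using the retrieved context
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using the following context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
conversational_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Invoking it looks like conversational_rag_chain.invoke({"input": "...", "chat_history": [...]}), where chat_history is the list of earlier messages you choose to keep.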

4. Retrieval Conceptual Guide

Deepen your understanding of retrieval techniques by exploring high-level overviews of different retrieval strategies. This will help you optimize the retrieval process and choose the best approach for your application’s needs.

  • Explore: Study various retrieval techniques, such as sparse vs. dense retrieval, and their implications for performance and accuracy.
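
For example, a keyword-based (sparse) retriever such as BM25 can be built over the same splits and compared against the dense vector-store retriever used above. This sketch assumes the optional rank_bm25 package is installed:

from langchain_community.retrievers import BM25Retriever

# Sparse retrieval scores chunks by keyword overlap rather than
# embedding similarity; useful as a baseline or in hybrid setups
bm25_retriever = BM25Retriever.from_documents(all_splits)
bm25_docs = bm25_retriever.invoke("What are the approaches to Task Decomposition?")
print(bm25_docs[0].page_content[:200])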

5. Build a Local RAG Application

For enhanced control and privacy, consider building a local RAG application using all local components. This approach allows you to deploy and run your application entirely on your own infrastructure, bypassing external dependencies.

  • Explore: Set up local instances of your data storage, retrieval, and generation components, and test your application in a fully local environment.
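
As a rough sketch of what “all local” might look like (assuming an Ollama server is running on your machine with a model such as llama3 pulled), the OpenAI components from earlier could be swapped for local ones:

from langchain_chroma import Chroma
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

# Local chat model and local embeddings, both served by Ollama
local_llm = ChatOllama(model="llama3")
local_embeddings = OllamaEmbeddings(model="llama3")

# The rest of the pipeline is unchanged: embed the same splits locally
local_vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)
local_retriever = local_vectorstore.as_retriever()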
