Generative AI and large language models (LLMs) are helping organizations across diverse sectors enhance the customer experience in ways that would traditionally have taken years to achieve. Every organization has data stored in data stores, either on premises or in cloud providers.
You can embrace generative AI and enhance customer experience by converting your existing data into an index that generative AI can search. When you ask an open source LLM a question, you get publicly available information as a response. Although this is helpful, generative AI can also help you understand your own data, along with the additional context an LLM provides. This is achieved through Retrieval Augmented Generation (RAG).
RAG retrieves data from a preexisting knowledge base (your data), combines it with the LLM’s knowledge, and generates responses in more human-like language. However, for generative AI to understand your data, some amount of data preparation is required, which involves a steep learning curve.
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud. Aurora combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.
In this post, we walk you through how to convert your existing Aurora data into an index, without additional data preparation, so Amazon Kendra can perform a data search, and how to implement RAG that combines your data with LLM knowledge to produce accurate responses.
Solution overview
In this solution, you use your existing data (in Aurora) as a data source, create an intelligent search service by connecting and syncing your data source to an Amazon Kendra index, and perform a generative AI data search that uses RAG to produce accurate responses by combining your data with the LLM’s knowledge. For this post, we use Anthropic’s Claude on Amazon Bedrock as our LLM.
The following are the high-level steps for the solution:

- Create an Aurora PostgreSQL cluster.
- Ingest your data into Aurora PostgreSQL-Compatible.
- Create an Amazon Kendra index.
- Set up the Amazon Kendra Aurora PostgreSQL connector and sync the data source.
- Invoke the RAG application.
The following diagram illustrates the solution architecture.
Prerequisites
To follow this post, the following prerequisites are required:

- An AWS account with permissions to create Aurora, Amazon Kendra, and AWS Secrets Manager resources
- The AWS CLI installed and configured
- The pgAdmin client, to connect to the Aurora PostgreSQL instance
- Access to Anthropic’s Claude on Amazon Bedrock
- A Python environment with Boto3 and LangChain installed
Create an Aurora PostgreSQL cluster
Run the following AWS CLI commands to create an Aurora PostgreSQL Serverless v2 cluster:
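The CLI commands themselves are not reproduced in this extract. As a hedged alternative sketch, the same provisioning step can be scripted with Boto3; all identifiers, credentials, the region, and the Serverless v2 scaling values below are placeholder assumptions, not the post's exact CLI arguments.

```python
def cluster_request(cluster_id: str, username: str, password: str) -> dict:
    """Parameters for rds.create_db_cluster (Aurora PostgreSQL, Serverless v2)."""
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-postgresql",
        "MasterUsername": username,
        "MasterUserPassword": password,
        "ServerlessV2ScalingConfiguration": {"MinCapacity": 0.5, "MaxCapacity": 2.0},
    }

def instance_request(cluster_id: str, instance_id: str) -> dict:
    """Parameters for rds.create_db_instance (a Serverless v2 writer instance)."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBClusterIdentifier": cluster_id,
        "DBInstanceClass": "db.serverless",
        "Engine": "aurora-postgresql",
    }

def provision(cluster_id: str, instance_id: str, username: str, password: str,
              region: str = "us-east-2") -> None:
    """Create the cluster and its writer instance (requires AWS credentials)."""
    import boto3  # imported lazily so the request builders above run anywhere
    rds = boto3.client("rds", region_name=region)
    rds.create_db_cluster(**cluster_request(cluster_id, username, password))
    rds.create_db_instance(**instance_request(cluster_id, instance_id))
```

Calling `provision("genai-cluster", "genai-instance-1", "postgres", "<your-password>")` mirrors the two CLI calls (`create-db-cluster` then `create-db-instance`).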
The following screenshot shows the created instance.
Ingest data to Aurora PostgreSQL-Compatible
Connect to the Aurora instance using the pgAdmin tool. Refer to Connecting to a DB instance running the PostgreSQL database engine for more information. To ingest your data, complete the following steps:
- Run the following PostgreSQL statements in pgAdmin to create the database, schema, and table:
- In your pgAdmin Aurora PostgreSQL connection, navigate to Databases, genai, Schemas, employees, Tables.
- Choose (right-click) Tables and choose PSQL Tool to open a PSQL client connection.
- Place the CSV file in your pgAdmin working directory and run the following command:
- Run the following PSQL query to verify the number of records copied:
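The SQL statements from the steps above are not included in this extract. As a hedged sketch, the equivalent ingestion can be scripted with psycopg2; the table's column list is an assumption inferred from the sync settings used later (`pk`, `reviews_title`, `reviews_text`), and `CREATE DATABASE genai;` must be run separately first (for example, from pgAdmin), because it cannot run inside a transaction block.

```python
CREATE_SCHEMA = "CREATE SCHEMA IF NOT EXISTS employees;"

# Column list is an assumption based on the fields mapped during the Kendra sync.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS employees.amazon_review (
    pk            integer PRIMARY KEY,
    reviews_title text,
    reviews_text  text
);
"""

COPY_SQL = "COPY employees.amazon_review FROM STDIN WITH (FORMAT csv, HEADER true)"
COUNT_SQL = "SELECT count(*) FROM employees.amazon_review;"

def ingest(dsn: str, csv_path: str) -> int:
    """Create the schema and table, load the CSV, and return the row count."""
    import psycopg2  # imported lazily; requires the psycopg2 package
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(CREATE_SCHEMA)
        cur.execute(CREATE_TABLE)
        with open(csv_path) as f:
            cur.copy_expert(COPY_SQL, f)  # client-side equivalent of psql's \copy
        cur.execute(COUNT_SQL)
        return cur.fetchone()[0]
```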
Create an Amazon Kendra index
The Amazon Kendra index holds the contents of your documents and is structured in a way to make the documents searchable. It has three index types:
- Generative AI Enterprise Edition index – Offers the highest accuracy for the Retrieve API operation and for RAG use cases (recommended)
- Enterprise Edition index – Provides semantic search capabilities and offers a high-availability service that is suitable for production workloads
- Developer Edition index – Provides semantic search capabilities for you to test your use cases
To create an Amazon Kendra index, complete the following steps:
- On the Amazon Kendra console, choose Indexes in the navigation pane.
- Choose Create an index.
- On the Specify index details page, provide the following information:
  - For Index name, enter a name (for example, `genai-kendra-index`).
  - For IAM role, choose Create a new role (Recommended).
  - For Role name, enter an IAM role name (for example, `genai-kendra`). Your role name will be prefixed with `AmazonKendra-<region>-` (for example, `AmazonKendra-us-east-2-genai-kendra`).
- Choose Next.
- On the Add additional capacity page, select Developer edition (for this demo) and choose Next.
- On the Configure user access control page, provide the following information:
- Under Access control settings, select No.
- Under User-group expansion, select None.
- Choose Next.
- On the Review and create page, verify the details and choose Create.
It might take some time for the index to be created. Check the list of indexes to watch the progress. When the status of the index is ACTIVE, your index is ready to use.
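The console steps above can also be performed programmatically. The following is a hedged Boto3 sketch, not the post's method; the index name and IAM role ARN are placeholders, and the Developer Edition setting matches this demo.

```python
import time

def index_request(name: str, role_arn: str) -> dict:
    """Parameters for kendra.create_index (Developer Edition, as in this demo)."""
    return {"Name": name, "Edition": "DEVELOPER_EDITION", "RoleArn": role_arn}

def create_index_and_wait(name: str, role_arn: str, region: str = "us-east-2") -> str:
    """Create the index and poll until it leaves the CREATING state."""
    import boto3  # imported lazily so index_request stays importable without boto3
    kendra = boto3.client("kendra", region_name=region)
    index_id = kendra.create_index(**index_request(name, role_arn))["Id"]
    while kendra.describe_index(Id=index_id)["Status"] == "CREATING":
        time.sleep(30)  # index creation can take a while
    return index_id
```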
Set up the Amazon Kendra Aurora PostgreSQL connector
Complete the following steps to set up your data source connector:
- On the Amazon Kendra console, choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Aurora PostgreSQL connector as the data source type.
- On the Specify data source details page, provide the following information:
  - For Data source name, enter a name (for example, `data_source_genai_kendra_postgresql`).
  - For Default language, choose English (en).
- Choose Next.
- On the Define access and security page, under Source, provide the following information:
  - For Host, enter the host name of the PostgreSQL instance (`cvgupdj47zsh.us-east-2.rds.amazonaws.com`).
  - For Port, enter the port number of the PostgreSQL instance (`5432`).
  - For Instance, enter the database name of the PostgreSQL instance (`genai`).
- Under Authentication, if you already have credentials stored in AWS Secrets Manager, choose them on the dropdown menu. Otherwise, choose Create and add new secret.
- In the Create an AWS Secrets Manager secret pop-up window, provide the following information:
  - For Secret name, enter a name (for example, `AmazonKendra-Aurora-PostgreSQL-genai-kendra-secret`).
  - For Database user name, enter the name of your database user.
  - For Password, enter the user password.
- Choose Add Secret.
- Under Configure VPC and security group, provide the following information:
- For Virtual Private Cloud, choose your virtual private cloud (VPC).
- For Subnet, choose your subnet.
- For VPC security groups, choose the VPC security group to allow access to your data source.
- Under IAM role, if you have an existing role, choose it on the dropdown menu. Otherwise, choose Create a new role.
- On the Configure sync settings page, under Sync scope, provide the following information:
  - For SQL query, enter the SQL query: `select * from employees.amazon_review`.
  - For Primary key, enter the primary key column (`pk`).
  - For Title, enter the title column that provides the name of the document title within your database table (`reviews_title`).
  - For Body, enter the body column on which your Amazon Kendra search will happen (`reviews_text`).
- Under Sync mode, select Full sync to convert the entire table data into a searchable index.
After the sync completes successfully, your Amazon Kendra index will contain the data from the specified Aurora PostgreSQL table. You can then use this index for intelligent search and RAG applications.
- Under Sync run schedule, choose Run on demand.
- Choose Next.
- On the Set field mappings page, leave the default settings and choose Next.
- Review your settings and choose Add data source.
Your data source will appear on the Data sources page after the data source has been created successfully.
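Because the schedule above is Run on demand, you start the sync yourself. Besides using the console's Sync now button, you can trigger and monitor the job with Boto3, sketched below; the index and data source IDs are placeholders you retrieve from the console.

```python
def start_sync(index_id: str, data_source_id: str, region: str = "us-east-2") -> str:
    """Start an on-demand sync job and return its execution ID."""
    import boto3  # imported lazily; requires AWS credentials
    kendra = boto3.client("kendra", region_name=region)
    job = kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return job["ExecutionId"]

def sync_statuses(index_id: str, data_source_id: str, region: str = "us-east-2") -> list:
    """Return the statuses of recent sync jobs (e.g., SYNCING, SUCCEEDED, FAILED)."""
    import boto3
    kendra = boto3.client("kendra", region_name=region)
    jobs = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    return [job["Status"] for job in jobs["History"]]
```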
Invoke the RAG application
The Amazon Kendra index sync can take minutes to hours depending on the volume of your data. When the sync completes without error, you are ready to develop your RAG solution in your preferred IDE. Complete the following steps:
- Configure your AWS credentials to allow Boto3 to interact with AWS services. You can do this by setting the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables or by using the `~/.aws/credentials` file.
- Import LangChain and the necessary components:
- Create an instance of the LLM (Anthropic’s Claude):
- Create your prompt template, which provides instructions for the LLM:
- Initialize the `KendraRetriever` with the Amazon Kendra client and your Amazon Kendra index ID, replacing `Kendra_index_id` with the ID of the index you created earlier:
- Combine Anthropic’s Claude and the Amazon Kendra retriever into a RetrievalQA chain:
- Invoke the chain with your own query:
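The code for the steps above is not included in this extract. The following is a hedged end-to-end sketch under stated assumptions: package paths follow the langchain / langchain-community / langchain-aws split and may differ in your LangChain version, the model ID is illustrative, and the prompt wording is our own.

```python
# Prompt template assumed for illustration; {context} is filled by the retriever
# and {question} by your query.
PROMPT_TEMPLATE = """
Human: Use the following pieces of context to answer the question at the end.
If you don't know the answer, say that you don't know; don't make one up.

{context}

Question: {question}
Assistant:"""

def build_chain(kendra_index_id: str, region: str = "us-east-2"):
    """Wire Claude on Amazon Bedrock and the Kendra retriever into a RetrievalQA chain."""
    # Imported lazily; requires langchain, langchain-aws, and langchain-community.
    from langchain.chains import RetrievalQA
    from langchain.prompts import PromptTemplate
    from langchain_aws import ChatBedrock
    from langchain_community.retrievers import AmazonKendraRetriever

    llm = ChatBedrock(model_id="anthropic.claude-v2", region_name=region)
    retriever = AmazonKendraRetriever(index_id=kendra_index_id, top_k=3)
    prompt = PromptTemplate(template=PROMPT_TEMPLATE,
                            input_variables=["context", "question"])
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        chain_type_kwargs={"prompt": prompt},
    )

# Usage (requires AWS credentials, Bedrock model access, and a synced index):
#   chain = build_chain("<your-kendra-index-id>")
#   print(chain.invoke({"query": "What do customers say about battery life?"}))
```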
Clean up
To avoid incurring future charges, delete the resources you created as part of this post:
- Delete the Aurora DB cluster and DB instance.
- Delete the Amazon Kendra index.
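If you scripted the setup, the deletions can be scripted too. A hedged Boto3 sketch follows; the identifiers are the same placeholders used in the earlier sketches, and `SkipFinalSnapshot=True` discards the data permanently.

```python
def cleanup(cluster_id: str, instance_id: str, index_id: str,
            region: str = "us-east-2") -> None:
    """Delete the Aurora DB instance and cluster, then the Amazon Kendra index."""
    import boto3  # imported lazily; requires AWS credentials with delete permissions
    rds = boto3.client("rds", region_name=region)
    rds.delete_db_instance(DBInstanceIdentifier=instance_id, SkipFinalSnapshot=True)
    rds.delete_db_cluster(DBClusterIdentifier=cluster_id, SkipFinalSnapshot=True)
    kendra = boto3.client("kendra", region_name=region)
    kendra.delete_index(Id=index_id)
```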
Conclusion
In this post, we discussed how to convert your existing Aurora data into an Amazon Kendra index and implement a RAG-based solution for data search. This solution drastically reduces the data preparation needed for Amazon Kendra search. It also speeds up generative AI application development by reducing the learning curve of data preparation.
Try out the solution, and if you have any comments or questions, leave them in the comments section.
About the Authors
Aravind Hariharaputran is a Data Consultant with the Professional Services team at Amazon Web Services. He is passionate about data and AI/ML in general, with extensive experience managing database technologies. He helps customers transform legacy databases and applications into modern data platforms and generative AI applications. He enjoys spending time with family and playing cricket.
Ivan Cui is a Data Science Lead with AWS Professional Services, where he helps customers build and deploy solutions using ML and generative AI on AWS. He has worked with customers across diverse industries, including software, finance, pharmaceutical, healthcare, IoT, and entertainment and media. In his free time, he enjoys reading, spending time with his family, and traveling.