Amazon Bedrock announces the preview launch of Session Management APIs, a new capability that enables developers to simplify state and context management for generative AI applications built with popular open source frameworks such as LangGraph and LlamaIndex. Session Management APIs provide an out-of-the-box solution that enables developers to securely manage state and conversation context across multi-step generative AI workflows, alleviating the need to build, maintain, or scale custom backend solutions. In this post, we discuss the new Session Management APIs and how to handle session state in your generative AI applications.
By preserving session state between interactions, Session Management APIs enhance workflow continuity, enabling generative AI applications, such as virtual assistants and multi-agent research workflows, that require persistent context across extended interactions. Developers can use this capability to checkpoint workflow stages, save intermediate states, and resume tasks from points of failure or interruption. Additionally, they can pause and replay sessions and use detailed traces to debug and enhance their generative AI applications. By treating sessions as a first-class resource, this capability enables developers to enforce granular access control through AWS Identity and Access Management (IAM) and encrypt data using AWS Key Management Service (AWS KMS), making sure that data from different user sessions is securely isolated and supporting multi-tenant applications with strong privacy protections.
Building generative AI applications requires more than model API calls. Your applications must handle conversation history, user preferences, state tracking, and contextual shifts. As these applications grow in complexity, robust state management becomes crucial. Key reasons include:
- Contextual coherence – Maintaining state makes sure that the application can track the flow of information, leading to more coherent and contextually relevant outputs.
- User interaction tracking – In interactive applications, state management allows the system to remember user inputs and preferences, facilitating personalized experiences.
- Resource optimization – Efficient state management helps in allocating computational resources effectively, making sure that the application runs smoothly without unnecessary redundancy.
- Error handling and recovery – Developers can use this capability to checkpoint workflow stages, save intermediate states, and resume tasks from points of failure or interruption.
Background
State persistence in generative AI applications refers to the ability to maintain and recall information across multiple interactions. This is crucial for creating coherent and contextually relevant experiences. Some of the information that you might need to persist includes:
- User information – Basic details about the user, such as ID, preferences, or history
- Conversation history – A record of previous interactions within the current session
- Context markers – Indicators of the current topic, intent, or stage in a multi-turn conversation
- Application state – The current status of ongoing processes or workflows
Effective use of session attributes enables personalization by tailoring responses based on the ongoing conversation, continuity by allowing conversations to pick up where they left off even after interruptions, and complex task handling by managing multi-step processes or decision trees effectively. These capabilities enhance the user experience and the overall functionality of generative AI applications.
Challenges
Implementing robust state management in generative AI applications presents several interconnected challenges. The system must handle state persistence and retrieval in milliseconds to maintain fluid conversations. As traffic grows and contextual data expands, state management also needs to efficiently scale.
When you build your own state management system, you need to implement backend services and infrastructure that handle persistence, checkpointing, and retrieval operations. For this post, we consider LangGraph to discuss the concepts of short-term memory and available options. Short-term memory stores information within a single conversation thread, which is managed as part of the agent’s state and persisted using thread-scoped checkpoints. You can persist short-term memory in a database like PostgreSQL using either a synchronous or asynchronous connection. However, you need to set up the infrastructure, implement data governance, and enable security and monitoring.
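As a sketch of this do-it-yourself approach, LangGraph's PostgreSQL checkpointer (from the `langgraph-checkpoint-postgres` package) persists thread-scoped checkpoints in a database you operate; the connection string and the `workflow` graph here are illustrative:

```python
from langgraph.checkpoint.postgres import PostgresSaver

# Illustrative connection string; you own this database, its schema,
# scaling, backups, security, and monitoring.
DB_URI = "postgresql://user:password@localhost:5432/checkpoints"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    graph = workflow.compile(checkpointer=checkpointer)  # workflow: your StateGraph
    # All turns under this thread_id share short-term memory through
    # thread-scoped checkpoints stored in PostgreSQL.
    config = {"configurable": {"thread_id": "conversation-1"}}
    graph.invoke({"messages": [("user", "Hi there!")]}, config)
```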
Solution overview
The Session Management APIs in Amazon Bedrock offer a comprehensive solution that streamlines the development and deployment of generative AI applications by alleviating the need for custom infrastructure setup and maintenance. This capability not only minimizes the complexities of handling data persistence, retrieval, and checkpointing, but also provides enterprise-grade security features with built-in tenant isolation capabilities. You can offload the heavy lifting of managing state and context in your do-it-yourself generative AI solutions to the Session Management APIs, while still using your preferred open source tool. This accelerates your path to deploying secure and scalable generative AI solutions.
The Session Management APIs also support human-in-the-loop scenarios, where manual intervention is required within automated workflows. Additionally, they provide comprehensive debugging and traceability features, maintaining detailed execution logs for troubleshooting and compliance purposes. The ability to quickly retrieve and analyze session data empowers developers to optimize their applications based on actual usage patterns and performance metrics.
To understand how Session Management APIs integrate with LangGraph applications, let’s look at the following high-level flow.
Example use case
To demonstrate the power and simplicity of the Session Management APIs, let's walk through a practical example of building a shoe shopping assistant. We show how BedrockSessionSaver provides a custom checkpointing solution backed by the Session Management APIs. The complete code for this example is available in the AWS Samples GitHub repository.
First, let’s understand how Session Management APIs work with our application, as illustrated in the following diagram.
This process flow shows how each user interaction creates a new invocation in the session, maintains conversation context, and automatically persists state while the LangGraph application focuses on business logic. The seamless integration between these components enables sophisticated, stateful conversations without the complexity of managing infrastructure for state and context persistence.
Prerequisites
To follow along with this post, you need an AWS account with the appropriate permissions.
Set up the environment
We use the following code to set up the environment:
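The original listing is not reproduced here; a minimal setup, assuming the packages used later in this walkthrough, might look like the following (the region is an assumption):

```python
# In your environment:
#   pip install boto3 langchain-aws langgraph langgraph-checkpoint-aws

import boto3

# Use a region where Amazon Bedrock and your chosen model are available.
region_name = "us-west-2"
boto_session = boto3.Session(region_name=region_name)
bedrock_runtime = boto_session.client("bedrock-runtime")
```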
Initialize the model
For our large language model (LLM), we use Anthropic's Claude 3 Sonnet on Amazon Bedrock:
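A minimal initialization sketch using the `langchain-aws` integration (the model ID and region are assumptions; use a model enabled in your account):

```python
from langchain_aws import ChatBedrockConverse

# Model ID and region are illustrative; pick a Claude model you have
# access to in your account.
llm = ChatBedrockConverse(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="us-west-2",
    temperature=0,
)
```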
Implement tools
Our assistant needs tools to search the product database and manage the shopping cart. These tools can use the information saved in the user session:
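The sample's tool code is not shown here; as a self-contained sketch, the tools can be plain Python functions over a made-up in-memory catalog (in the repository they are registered with the graph through LangChain's tool interface and read real product data):

```python
# Hypothetical in-memory product catalog; the real sample reads product data
# from a data store.
SHOES = [
    {"id": "S1", "name": "Trail Runner", "category": "running", "price": 120},
    {"id": "S2", "name": "Court Classic", "category": "tennis", "price": 90},
]

cart: list[dict] = []  # per-session cart, persisted via the session state

def search_shoes(category: str) -> list[dict]:
    """Return shoes matching the requested category."""
    return [s for s in SHOES if s["category"] == category.lower()]

def add_to_cart(shoe_id: str) -> str:
    """Add a shoe to the session's cart by ID."""
    for shoe in SHOES:
        if shoe["id"] == shoe_id:
            cart.append(shoe)
            return f"Added {shoe['name']} to cart."
    return "Shoe not found."
```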
Set up Session Management APIs
We use the following code to integrate the Session Management APIs:
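A sketch of the integration, assuming the `langgraph-checkpoint-aws` package and a `workflow` StateGraph built as in the sample (the attribute path used to create the session follows that package's client wrapper):

```python
from langgraph_checkpoint_aws.saver import BedrockSessionSaver

# Checkpoints are stored in an Amazon Bedrock session instead of a
# self-managed database.
session_saver = BedrockSessionSaver(region_name="us-west-2")
graph = workflow.compile(checkpointer=session_saver)  # workflow: your StateGraph

# Create a Bedrock session and use its ID as the LangGraph thread ID.
session_id = session_saver.session_client.client.create_session()["sessionId"]
config = {"configurable": {"thread_id": session_id}}
```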
Run the conversation
Now we can run our stateful conversation:
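A sketch of the conversation loop, assuming the compiled `graph` and session-scoped `config` from the sample's setup:

```python
def chat(user_input: str) -> None:
    # Each turn is recorded in the session and checkpointed automatically.
    for event in graph.stream(
        {"messages": [("user", user_input)]},
        config,  # reusing the same thread_id preserves conversation context
        stream_mode="values",
    ):
        event["messages"][-1].pretty_print()

chat("I'm looking for running shoes.")
chat("Add the first one to my cart.")  # the assistant still sees the prior turn
```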
Access session history
You can quickly retrieve the conversation history using the graph instance:
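For example, assuming the same `graph` and `config`:

```python
# Latest checkpointed state for this thread
state = graph.get_state(config)
for message in state.values["messages"]:
    message.pretty_print()

# Or walk the full checkpoint history, newest first
for snapshot in graph.get_state_history(config):
    print(
        snapshot.config["configurable"]["checkpoint_id"],
        len(snapshot.values["messages"]),
    )
```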
Although it’s simple to access data using BedrockSessionSaver in LangGraph, there might be instances where you need to access session data directly, whether for auditing purposes or external processing. The Session Management APIs provide this functionality, but the retrieved data is in a serialized format. To work with this data meaningfully, you need to deserialize it first:
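A sketch using the `bedrock-agent-runtime` client (`session_id` comes from the sample's setup; the response field names follow the Session Management API reference, and the exact payload layout should be verified against the SDK documentation):

```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

# Walk the session's invocations and their stored steps.
for inv in client.list_invocations(sessionIdentifier=session_id)["invocationSummaries"]:
    steps = client.list_invocation_steps(
        sessionIdentifier=session_id,
        invocationIdentifier=inv["invocationId"],
    )["invocationStepSummaries"]
    for step in steps:
        detail = client.get_invocation_step(
            sessionIdentifier=session_id,
            invocationIdentifier=inv["invocationId"],
            invocationStepId=step["invocationStepId"],
        )
        # The step payload holds serialized checkpoint data; deserialize it
        # (the sample uses LangGraph's JsonPlusSerializer) before use.
        print(detail["invocationStep"]["payload"])
```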
Replay and fork actions
You might want to analyze the steps to understand the reasoning, debug, or try out different paths. You can invoke the graph with a checkpoint to replay specific actions from that point:
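For example, selecting an earlier checkpoint from the state history and invoking the graph with it (a sketch assuming the `graph` and `config` from the sample):

```python
# Pick an earlier checkpoint from this thread's history (newest first).
to_replay = list(graph.get_state_history(config))[2]
replay_config = to_replay.config  # carries both thread_id and checkpoint_id

# Passing None as input replays from the checkpoint instead of adding a turn.
for event in graph.stream(None, replay_config, stream_mode="values"):
    event["messages"][-1].pretty_print()
```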
The graph replays the previously executed steps up to the provided checkpoint_id and executes the steps after it.
You can also try forking to revisit an agent’s past actions and explore alternative paths within the graph:
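A forking sketch: updating state at a past checkpoint makes LangGraph branch rather than overwrite, and the returned config points at the new branch (again assuming `graph` and a past checkpoint `to_replay` selected from the state history):

```python
# update_state at a past checkpoint creates a new branch (new checkpoint_id)
# instead of rewriting the original history.
fork_config = graph.update_state(
    to_replay.config,  # a past checkpoint from get_state_history
    {"messages": [("user", "Actually, show me tennis shoes instead.")]},
)

# Continue the conversation down the forked branch.
for event in graph.stream(None, fork_config, stream_mode="values"):
    event["messages"][-1].pretty_print()
```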
Human-in-the-loop
Human-in-the-loop (HITL) interaction patterns allow the graph to stop at specific steps and seek human approval before proceeding. This is important if you have to review specific tool calls. In LangGraph, breakpoints are built on checkpoints, which save the graph’s state after each node execution. You can use the Session Management APIs to effectively implement HITL in your graph.
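A sketch using LangGraph's interrupt_before breakpoints on top of the Bedrock-backed checkpointer (the node name "tools" is an assumption about the sample graph):

```python
# Pause the graph before the tool node executes.
graph = workflow.compile(
    checkpointer=session_saver,
    interrupt_before=["tools"],  # node name in this graph; yours may differ
)

graph.invoke({"messages": [("user", "Add those shoes to my cart.")]}, config)

# The run is paused at a checkpoint; inspect what would execute next...
pending = graph.get_state(config)
print(pending.next)  # e.g., ('tools',)

# ...and resume from the same checkpoint once a human approves.
graph.invoke(None, config)
```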
This example demonstrates how Session Management APIs seamlessly integrate with LangGraph to create a stateful conversation that maintains context across interactions. The Session Management APIs handle the complexity of state persistence, allowing you to focus on building the conversation logic.
The complete code is available in the AWS Samples GitHub repository. Feel free to clone it and experiment with your own modifications.
Clean up
To avoid incurring ongoing charges, clean up the resources you created as part of this solution.
Considerations and best practices
When implementing the Session Management APIs, consider these key practices for optimal results:
- Session lifecycle management – Plan your session lifecycles carefully, from creation to termination. Initialize sessions using CreateSession at the start of conversations and properly close them with EndSession when complete. This approach promotes efficient resource utilization and maintains clean state boundaries between interactions.
- Security and compliance – For applications handling sensitive information, implement appropriate data protection measures using the Session Management APIs’ built-in security features. By default, AWS managed keys are used for session encryption. For additional security requirements, you can encrypt session data with a customer managed key. Use the service’s data retention and deletion capabilities to maintain compliance with relevant regulations while maintaining proper data governance.
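The lifecycle and encryption practices above map onto the bedrock-agent-runtime API; a sketch (the KMS key ARN is a placeholder):

```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

# CreateSession at the start of a conversation; omit encryptionKeyArn to
# encrypt with the default AWS managed key.
session = client.create_session(
    # encryptionKeyArn="arn:aws:kms:us-west-2:111122223333:key/EXAMPLE",
)
session_id = session["sessionId"]

# ... conversation turns run against this session ...

# EndSession when the conversation completes; DeleteSession once the data
# no longer needs to be retained.
client.end_session(sessionIdentifier=session_id)
client.delete_session(sessionIdentifier=session_id)
```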
Conclusion
The Session Management APIs in Amazon Bedrock offer a powerful solution for handling state in generative AI applications. By using this fully managed capability, developers can focus on creating innovative AI experiences without getting caught up in the complexities of infrastructure management. The seamless integration with LangGraph enhances its utility, allowing for rapid development and deployment of sophisticated, stateful AI applications.
As the field of generative AI continues to evolve, robust state management will become increasingly crucial. The Session Management APIs provide the scalability, security, and flexibility needed to help meet these growing demands, enabling developers to build more contextually aware, personalized, and reliable AI-powered applications.
By adopting the Session Management APIs, developers can accelerate their path to production, provide better user experiences through consistent and coherent interactions, and focus their efforts on the unique value propositions of their AI applications rather than the underlying infrastructure challenges.
Try out the Session Management APIs for your own use case, and share your feedback in the comments.
About the authors
Jagdeep Singh Soni is a Senior Partner Solutions Architect at AWS based in the Netherlands. He uses his passion for Generative AI to help customers and partners build GenAI applications using AWS services. Jagdeep has 15 years of experience in innovation, experience engineering, digital transformation, cloud architecture and ML applications.
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.
Rupinder Grewal is a Tech Lead Gen AI Specialist. He enjoys playing tennis and biking on mountain trails.
Krishna Gourishetti is a Senior Software Engineer for the Bedrock Agents team in AWS. He is passionate about building scalable software solutions that solve customer problems. In his free time, Krishna loves to go on hikes.
Aniketh Manjunath is a Software Development Engineer at Amazon Bedrock. He is passionate about distributed machine learning systems. Outside of work, he enjoys hiking, watching movies, and playing cricket.
Sarthak Handa serves as a Principal Product Manager at Amazon Web Services (AWS) AI/ML in Seattle, Washington, where his primary focus is on developing AI services that facilitate advancements in the healthcare industry. Prior to his work at AWS, Sarthak spent several years as a startup founder, building technology solutions for the healthcare and disaster relief sectors.