Large language models (LLMs) have changed how people work. Given a text prompt, an LLM can do many things, such as answering questions, summarizing text, planning events, and more.
However, there are times when the output from an LLM is not up to our standard. For example, the generated text could be thoroughly wrong and need further direction. This is where an LLM output parser can help.
By standardizing the output with a LangChain output parser, we gain some control over it. So, how does it work? Let’s get into it.
Preparation
In this article, we rely on the LangChain packages, so we need to install them in the environment. To do that, you can use the following command.
pip install langchain langchain_core langchain_community langchain_openai python-dotenv
We also use an OpenAI GPT model for text generation, so make sure you have API access to it. You can get an API key from the OpenAI platform.
I work in the Visual Studio Code IDE, but you can use any IDE you prefer. Create a file called .env within your project folder and put the OpenAI API key inside. It should look like this.
OPENAI_API_KEY = sk-XXXXXXXXXX
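Before calling the model, it can help to confirm that the key is actually visible to your script. The snippet below is a minimal sketch using only the standard library; the helper name `get_openai_key` is hypothetical and not part of LangChain or python-dotenv.

```python
import os


def get_openai_key(env=os.environ):
    """Return the OpenAI key from the given mapping, or raise a clear error."""
    # Treat a missing key and a blank key the same way.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
    return key
```

In a real script you would call `get_openai_key()` right after `load_dotenv()`, so a missing key fails early with a readable message instead of failing later inside the API call.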
Once everything is ready, we will move on to the central part of the article.
Output Parser
LangChain provides many types of output parsers that we can use to standardize LLM output. We will try several of them to understand output parsers better.
First, we will try the Pydantic parser. It’s an output parser that we can use to control and validate the generated text. Let’s see how it works with an example. Create a Python script in your IDE and copy the code below into it.
from typing import List

from dotenv import load_dotenv
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from the .env file
load_dotenv()


class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    year: int = Field(description="The year the movie was released")
    genre: List[str] = Field(description="The main genres of the movie")
    rating: float = Field(description="Rating out of 10")
    summary: str = Field(description="Brief summary of the movie plot")
    review: str = Field(description="Critical review of the movie")

    @validator("year")
    def valid_year(cls, val):
        # The lower bound was lost from the original listing; 1888 is an assumed value
        if val < 1888 or val > 2025:
            raise ValueError("Must be a valid movie year")
        return val

    @validator("rating")
    def valid_rating(cls, val):
        if val < 0 or val > 10:
            raise ValueError("Rating must be between 0 and 10")
        return val


# The parser turns the model's text into a validated MovieReview object
parser = PydanticOutputParser(pydantic_object=MovieReview)

prompt = PromptTemplate(
    template="Generate a movie review for the following movie:\n{movie_title}\n\n{format_instructions}",
    input_variables=["movie_title"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

model = ChatOpenAI(temperature=0)

# Pipe the prompt into the model, then the model output into the parser
chain = prompt | model | parser

movie_title = "The Matrix"
review = chain.invoke({"movie_title": movie_title})
print(review)
In the code above, we first import the packages and load the OpenAI key with load_dotenv. After that, we create a class called MovieReview, which describes every field we want in the output: the title, year, genre, rating, summary, and review. For each field, we provide a description of the value we want.
We also create validators for the year and rating to ensure the result is what we want and to reject values outside the allowed range. You can add more validation mechanisms if required.
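To make the idea of "parse, then validate" concrete, here is a minimal sketch that uses only the standard library. It is not how LangChain or Pydantic implement it; the class `MovieReviewSketch`, the function `parse_review`, and the sample JSON string are all hypothetical and written for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class MovieReviewSketch:
    title: str
    year: int
    genre: List[str]
    rating: float

    def __post_init__(self):
        # Same kind of range check as the rating validator in the article.
        if not 0 <= self.rating <= 10:
            raise ValueError("Rating must be between 0 and 10")


def parse_review(raw_json: str) -> MovieReviewSketch:
    """Decode the model's JSON text and build a validated object."""
    return MovieReviewSketch(**json.loads(raw_json))


# Hypothetical model response, written by hand for illustration.
raw = '{"title": "The Matrix", "year": 1999, "genre": ["Action", "Sci-Fi"], "rating": 9.0}'
review = parse_review(raw)
print(review.title, review.rating)
```

If the model returned a rating of 11, `parse_review` would raise a `ValueError` instead of silently passing a bad value downstream, which is the behavior we want from a validating parser.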
Then we create the prompt template, which accepts our query input and the format instructions that the output should follow.
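The template works by substituting named placeholders. The sketch below imitates that with plain Python string formatting: one placeholder is fixed ahead of time (like a partial variable) and the other is supplied per call. The `format_instructions` text here is a hypothetical stand-in, not the real output of `parser.get_format_instructions()`.

```python
from functools import partial

# Hypothetical stand-in for the instructions a parser would generate.
format_instructions = "Return a JSON object with the keys: title, year, rating."

template = (
    "Generate a movie review for the following movie:\n"
    "{movie_title}\n\n{format_instructions}"
)

# Fix the format instructions once, leaving movie_title to be filled per call.
build_prompt = partial(template.format, format_instructions=format_instructions)

prompt_text = build_prompt(movie_title="The Matrix")
print(prompt_text)
```

Fixing the instructions up front means the caller only has to supply the part that changes between requests.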
The last thing we do is create the model chain and pass the query to get our result. Note that the chain variable above is built with the “|” operator, which is LangChain’s syntax for piping the output of one component into the next.
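If the “|” syntax looks unusual, the toy example below shows one way such chaining can be built in plain Python by overloading the operator. It is a simplified illustration, not LangChain's actual implementation; the `Step` class and the canned model reply are hypothetical.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Build a new step that runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three toy stages standing in for prompt, model, and parser.
prompt = Step(lambda topic: f"List three {topic}, comma-separated.")
fake_model = Step(lambda text: "red, green, blue")  # hypothetical canned reply
parser = Step(lambda reply: [item.strip() for item in reply.split(",")])

chain = prompt | fake_model | parser
print(chain.invoke("colors"))  # prints ['red', 'green', 'blue']
```

Each stage only needs to know about its own input and output, which is why a parser can be swapped in at the end of a chain without touching the prompt or the model.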
Overall, the result is similar to the one below.
Output:
title='The Matrix' year=1999 genre=['Action', 'Sci-Fi'] rating=9.0 summary='A computer hacker learns about the true nature of reality and his role in the war against its controllers.' review="The Matrix is a groundbreaking film that revolutionized the action genre with its innovative special effects and thought-provoking storyline. Keanu Reeves delivers a memorable performance as Neo, the chosen one who must navigate the simulated reality of the Matrix to save humanity. The film's blend of martial arts, philosophy, and dystopian themes make it a must-watch for any movie enthusiast."
As you can see, the output follows the format we want, and the result passes our validation.
The Pydantic parser is the standard output parser we can use. We can use other output parsers if we already have a specific format in mind. For example, we can use the CSV parser if we want the result only as comma-separated items.
from dotenv import load_dotenv
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from the .env file
load_dotenv()

output_parser = CommaSeparatedListOutputParser()

# Instructions that tell the model to answer as comma-separated values
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List six {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | output_parser

print(chain.invoke({"subject": "Programming Language"}))
Output:
['Java', 'Python', 'C++', 'JavaScript', 'Ruby', 'Swift']
The result is a Python list built from the comma-separated values in the model’s reply. You can expand the template in any way you like, as long as the reply stays comma-separated.
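Conceptually, this parser turns one line of comma-separated text into a Python list. The sketch below approximates that step with the standard `csv` module; it is an illustration under that assumption rather than LangChain's own code, and the function name `parse_comma_separated` is hypothetical. It expects a single-line reply.

```python
import csv
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split one line of comma-separated text into stripped items."""
    # csv.reader handles quoted items that themselves contain commas.
    rows = list(csv.reader([text.strip()], skipinitialspace=True))
    return [item.strip() for item in rows[0]] if rows else []


print(parse_comma_separated("Java, Python, C++, JavaScript, Ruby, Swift"))
```

Using `csv` rather than a bare `split(",")` keeps an item such as `"Hello, World"` in one piece when the model wraps it in quotes.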
It’s also possible to get the output in datetime format. By changing the parser and the prompt, we can get the result we want.
from dotenv import load_dotenv
from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from the .env file
load_dotenv()

output_parser = DatetimeOutputParser()

# Instructions that tell the model which datetime pattern to use
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="""Answer the users question:

{question}

{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | output_parser

print(chain.invoke({"question": "When was the Python Programming Language invented?"}))
Output:
You can see that the result is in the datetime format.
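Under the hood, a datetime parser has to convert the model's reply from text into a `datetime` object using a fixed pattern. The sketch below shows that conversion with the standard library only. The pattern in `DATETIME_FORMAT` is assumed to match the parser's default, so verify it against your installed version; the function name and the sample reply are hypothetical.

```python
from datetime import datetime

# Assumed to match DatetimeOutputParser's default pattern; verify in your version.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime_reply(text: str) -> datetime:
    """Convert the model's reply into a datetime, or raise ValueError."""
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


# Hypothetical reply text, written by hand for illustration.
parsed = parse_datetime_reply("1991-02-20T00:00:00.000000Z")
print(parsed)  # prints 1991-02-20 00:00:00
```

A reply that does not match the pattern, such as free text like "February 1991", raises a `ValueError`, which is why the format instructions in the prompt matter.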
That’s all about the LangChain LLM output parsers. You can visit the LangChain documentation to find the output parser you need, or use Pydantic to structure the output yourself.
Conclusion
In this article, we have learned about LangChain output parsers, which standardize the text generated by an LLM. We can use the Pydantic parser to structure the LLM output and validate that it matches what we want. Many other output parsers from LangChain could suit your situation, such as the CSV parser and the datetime parser.