In Part 1 of this tutorial series, we introduced AI Agents, autonomous programs that perform tasks, make decisions, and communicate with others.
In Part 2 of this tutorial series, we saw how to make an Agent try and retry until the task is completed, using Iterations and Chains.
A single Agent can usually operate effectively with one tool, but it can be less effective when it has to use many tools simultaneously. One way to tackle complicated tasks is through a “divide-and-conquer” approach: create a specialized Agent for each task and have them work together as a Multi-Agent System (MAS).
In a MAS, multiple agents collaborate to achieve common goals, often tackling challenges that are too difficult for a single Agent to handle alone. There are two main ways they can interact:
- Sequential flow – The Agents do their work in a specific order, one after the other. For example, Agent 1 finishes its task, and then Agent 2 uses the result to do its task. This is useful when tasks depend on each other and must be done step-by-step.
- Hierarchical flow – Usually, one higher-level Agent manages the whole process and gives instructions to lower-level Agents, each of which focuses on a specific task. This is useful when the final output requires some back-and-forth (a minimal sketch of both flows follows this list).
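Before building the real thing, here is a minimal, hypothetical sketch of the two flows (agent_1, agent_2, and manager are just placeholders standing in for real LLM calls, not part of this tutorial’s code):
## Sequential flow: Agent 2 consumes the output of Agent 1
def agent_1(task: str) -> str:
    return f"description of {task}"          #--> placeholder for a real LLM call

def agent_2(previous_output: str) -> str:
    return f"analysis of {previous_output}"  #--> placeholder for a real LLM call

print( agent_2(agent_1("image.jpeg")) )

## Hierarchical flow: a manager Agent decides which worker to invoke
def manager(request: str) -> str:
    worker = agent_1 if "describe" in request else agent_2
    return worker(request)

print( manager("describe image.jpeg") )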
In this tutorial, I’m going to show how to build from scratch different types of Multi-Agent Systems, from simple to more advanced. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article).
Setup
Please refer to Part 1 for the setup of Ollama and the main LLM.
import ollama
llm = "qwen2.5"
In this example, I will ask the model to process images, therefore I’m also going to need a Vision LLM: a specialized version of a Large Language Model that integrates natural language processing with computer vision, and is designed to understand visual inputs, such as images and videos, in addition to text.
LLaVA is an efficient choice, as it can also run without a GPU.
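If you haven’t downloaded it yet, you can pull the model directly from Python (the llava tag is the standard one on the Ollama registry):
import ollama
ollama.pull("llava")  ## downloads the model if it's not already available locally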
After the download is completed, you can move on to Python and start writing code. Let’s load an image so that we can try out the Vision LLM.
from matplotlib import image as pltimg, pyplot as plt
image_file = "draghi.jpeg"
plt.imshow(pltimg.imread(image_file))
plt.show()
In order to test the Vision LLM, you can just pass the image as an input:
import ollama
ollama.generate(model="llava",
prompt="describe the image",
images=[image_file])["response"]
Sequential Multi-Agent System
I shall build two Agents that will work in a sequential flow, one after the other, where the second takes the output of the first as an input, just like a Chain.
- The first Agent must process an image provided by the user and return a verbal description of what it sees.
- The second Agent will search the internet and try to understand where and when the picture was taken, based on the description provided by the first Agent.
Both Agents shall use one Tool each. The first Agent will have the Vision LLM as a Tool. Please remember that with Ollama, in order to use a Tool, the function must be described in a dictionary.
def process_image(path: str) -> str:
    return ollama.generate(model="llava", prompt="describe the image", images=[path])["response"]

tool_process_image = {'type':'function', 'function':{
    'name': 'process_image',
    'description': 'Load an image from a given path and describe what you see',
    'parameters': {'type': 'object',
                   'required': ['path'],
                   'properties': {
                       'path': {'type':'str', 'description':'the path of the image'},
}}}}
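Following the same testing habit used throughout this series, you can try the function directly before giving it to the Agent:
## test
process_image(path=image_file)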
The second Agent should have a web-searching Tool. In the previous articles of this tutorial series, I showed how to leverage the DuckDuckGo package for searching the web. So, this time, we can use a new Tool: Wikipedia (pip install wikipedia==1.4.0). You can use the original library directly or import the LangChain wrapper.
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
def search_wikipedia(query:str) -> str:
    return WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()).run(query)

tool_search_wikipedia = {'type':'function', 'function':{
    'name': 'search_wikipedia',
    'description': 'Search on Wikipedia by passing some keywords',
    'parameters': {'type': 'object',
                   'required': ['query'],
                   'properties': {
                       'query': {'type':'str', 'description':'The input must be short keywords, not a long text'},
}}}}
## test
search_wikipedia(query="draghi")
First, you need to write a prompt to describe the task of each Agent (the more detailed, the better), and that will be the first message in the chat history with the LLM.
prompt = '''
You are a photographer that analyzes and describes images in detail.
'''
messages_1 = [{"role":"system", "content":prompt}]
One important decision to make when building a MAS is whether the Agents should share the chat history or not. The management of chat history depends on the design and objectives of the system:
- Shared chat history – Agents have access to a common conversation log, allowing them to see what other Agents have said or done in previous interactions. This can enhance the collaboration and the understanding of the overall context.
- Separate chat history – Agents only have access to their own interactions, focusing only on their own communication. This design is typically used when independent decision-making is important.
I recommend keeping the chats separate unless it is necessary to do otherwise. LLMs have a limited context window, so it’s better to keep the history as light as possible.
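For reference, the shared design would look like the sketch below (the shared_history list is hypothetical; this tutorial keeps two separate lists, messages_1 above and messages_2 below):
## shared design (hypothetical): every Agent reads and extends the same list
shared_history = [{"role":"system", "content":"context visible to all Agents"}]
res = ollama.chat(model=llm, messages=shared_history)["message"]["content"]  #--> Agent 1 turn
shared_history.append( {"role":"assistant", "content":res} )                 #--> Agent 2 will see it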
prompt = '''
You are a detective. You read the image description provided by the photographer, and you search Wikipedia to understand when and where the picture was taken.
'''
messages_2 = [{"role":"system", "content":prompt}]
For convenience, I shall use the function defined in the previous articles to process the model’s response.
def use_tool(agent_res:dict, dic_tools:dict) -> dict:
    res, t_name, t_inputs = '', '', ''  ## defaults, so res is always defined
    ## use tool
    if "tool_calls" in agent_res["message"].keys():
        for tool in agent_res["message"]["tool_calls"]:
            t_name, t_inputs = tool["function"]["name"], tool["function"]["arguments"]
            if f := dic_tools.get(t_name):
                ### calling tool
                print('🔧 >', f"\x1b[1;31m{t_name} -> Inputs: {t_inputs}\x1b[0m")
                ### tool output
                t_output = f(**tool["function"]["arguments"])
                print(t_output)
                ### final res
                res = t_output
            else:
                print('🤬 >', f"\x1b[1;31m{t_name} -> NotFound\x1b[0m")
    ## don't use tool
    if agent_res['message']['content'] != '':
        res = agent_res["message"]["content"]
        t_name, t_inputs = '', ''
    return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs}
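To verify the helper, you can feed it a hand-crafted response that mimics the dictionary returned by ollama.chat when the model calls a Tool (mock_res is fabricated for illustration):
## test with a mocked model response
mock_res = {"message": {"role":"assistant", "content":"",
                        "tool_calls":[ {"function": {"name":"search_wikipedia",
                                                     "arguments":{"query":"draghi"}}} ]}}
use_tool(mock_res, {"search_wikipedia":search_wikipedia})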
As we already did in previous tutorials, the interaction with the Agents can be started with a while loop. The user is requested to provide an image that the first Agent will process.
dic_tools = {'process_image':process_image,
             'search_wikipedia':search_wikipedia}

while True:
    ## user input
    try:
        q = input('📷 > give me the image to analyze:')
    except EOFError:
        break
    if q == "quit":
        break
    if q.strip() == "":
        continue
    messages_1.append( {"role":"user", "content":q} )
    plt.imshow(pltimg.imread(q))
    plt.show()
    ## Agent 1
    agent_res = ollama.chat(model=llm,
                            tools=[tool_process_image],
                            messages=messages_1)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    print("👽📷 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_1.append( {"role":"assistant", "content":res} )

The first Agent used the Vision LLM Tool and recognized text within the image. Now, the description will be passed to the second Agent, which shall extract some keywords to search Wikipedia.
    ## Agent 2
    messages_2.append( {"role":"system", "content":"-Picture: "+res} )
    agent_res = ollama.chat(model=llm,
                            tools=[tool_search_wikipedia],
                            messages=messages_2)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
The second Agent used the Tool and extracted information from the web, based on the description provided by the first Agent. Now, it can process everything and give a final answer.
if tool_used == "search_wikipedia":
messages_2.append( {"role":"system", "content":"-Wikipedia: "+res} )
agent_res = ollama.chat(model=llm, tools=[], messages=messages_2)
dic_res = use_tool(agent_res, dic_tools)
res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
else:
messages_2.append( {"role":"assistant", "content":res} )
print("👽📖 >", f"x1b[1;30m{res}x1b[0m")
The result is spot on! Let’s move on to the next example.
Hierarchical Multi-Agent System
Imagine having a squad of Agents that operates with a hierarchical flow, just like a human team, with distinct roles to ensure smooth collaboration and efficient problem-solving. At the top, a manager oversees the overall strategy, talking to the customer (the user), making high-level decisions, and guiding the team toward the goal. Meanwhile, other team members handle operative tasks. Just like humans, Agents can work together and delegate tasks appropriately.
I shall build a tech team of 3 Agents with the objective of querying a SQL database per user’s request. They must work in a hierarchical flow:
- The Lead Agent talks to the user and understands the request. Then, it decides which team member is the most appropriate for the task.
- The Junior Agent has the job of exploring the db and building SQL queries.
- The Senior Agent shall review the SQL code, correct it if necessary, and execute it.
LLMs know how to code by being exposed to a large corpus of both code and natural language text, where they learn patterns, syntax, and semantics of programming languages. The model learns the relationships between different parts of the code by predicting the next token in a sequence. In short, LLMs can generate SQL code but can’t execute it; Agents can.
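As a quick demonstration of that gap, you can ask the model for a query (the prompt is illustrative): the result is just a string, and nothing touches a database until some other component executes it.
## the LLM only produces text: nothing gets executed here
sql_text = ollama.generate(model=llm,
                           prompt="Write a SQL query that counts the rows of a table named 'titanic'")["response"]
print(type(sql_text))  #--> <class 'str'>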
First of all, I am going to create a database and connect to it, then I shall prepare a series of Tools to execute SQL code.
## Read dataset
import pandas as pd
dtf = pd.read_csv('http://bit.ly/kaggletrain')
dtf.head(3)
## Create db
import sqlite3
dtf.to_sql(index=False, name="titanic",
           con=sqlite3.connect("database.db"),
           if_exists="replace")
## Connect db
from langchain_community.utilities.sql_database import SQLDatabase
db = SQLDatabase.from_uri("sqlite:///database.db")
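As a quick sanity check, the SQLDatabase utility exposes the dialect and the usable tables:
## test
print(db.dialect)                   #--> sqlite
print(db.get_usable_table_names())  #--> ['titanic']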
Let’s start with the Junior Agent. LLMs don’t need Tools to generate SQL code, but the Agent doesn’t know the table names and structure. Therefore, we need to provide Tools to investigate the database.
from langchain_community.tools.sql_database.tool import ListSQLDatabaseTool

def get_tables() -> str:
    return ListSQLDatabaseTool(db=db).invoke("")

tool_get_tables = {'type':'function', 'function':{
    'name': 'get_tables',
    'description': 'Returns the names of the tables in the database.',
    'parameters': {'type': 'object',
                   'required': [],
                   'properties': {}
}}}
## test
get_tables()
That will show the tables available in the db, while the following Tool prints the columns of a given table.
from langchain_community.tools.sql_database.tool import InfoSQLDatabaseTool

def get_schema(tables: str) -> str:
    tool = InfoSQLDatabaseTool(db=db)
    return tool.invoke(tables)

tool_get_schema = {'type':'function', 'function':{
    'name': 'get_schema',
    'description': 'Returns the names of the columns in the table.',
    'parameters': {'type': 'object',
                   'required': ['tables'],
                   'properties': {'tables': {'type':'str', 'description':'table name. Example Input: table1, table2, table3'}}
}}}
## test
get_schema(tables='titanic')
Since this Agent must use more than one Tool, and Tool calls might fail, I’ll write a solid prompt, following the structure of the previous article.
prompt_junior = '''
[GOAL] You are a data engineer who builds efficient SQL queries to get data from the database.
[RETURN] You must return a final SQL query based on user's instructions.
[WARNINGS] Use your tools only once.
[CONTEXT] In order to generate the perfect SQL query, you need to know the name of the table and the schema.
First ALWAYS use the tool 'get_tables' to find the name of the table.
Then, you MUST use the tool 'get_schema' to get the columns in the table.
Finally, based on the information you got, generate an SQL query to answer the user's question.
'''
Moving on to the Senior Agent. Code review doesn’t require any particular trick; you can just use the LLM.
def sql_check(sql: str) -> str:
    p = f'''Double check if the SQL query is correct: {sql}. You MUST return just the SQL code without comments'''
    res = ollama.generate(model=llm, prompt=p)["response"]
    return res.replace('sql','').replace('```','').replace('\n',' ').strip()  ## strip markdown fences and newlines

tool_sql_check = {'type':'function', 'function':{
    'name': 'sql_check',
    'description': 'Before executing a query, always review the SQL query and correct the code if necessary',
    'parameters': {'type': 'object',
                   'required': ['sql'],
                   'properties': {'sql': {'type':'str', 'description':'SQL code'}}
}}}
## test
sql_check(sql='SELECT * FROM titanic TOP 3')  #--> intentionally wrong: TOP is not valid SQLite, the model should fix it to LIMIT
Executing code on the database is a different story: LLMs can’t do that alone.
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

def sql_exec(sql: str) -> str:
    return QuerySQLDataBaseTool(db=db).invoke(sql)

tool_sql_exec = {'type':'function', 'function':{
    'name': 'sql_exec',
    'description': 'Execute a SQL query',
    'parameters': {'type': 'object',
                   'required': ['sql'],
                   'properties': {'sql': {'type':'str', 'description':'SQL code'}}
}}}
## test
sql_exec(sql='SELECT * FROM titanic LIMIT 3')
And of course, a good prompt.
prompt_senior = '''
[GOAL] You are a senior data engineer who reviews and executes the SQL queries written by others.
[RETURN] You must return data from the database.
[WARNINGS] Use your tools only once.
[CONTEXT] ALWAYS check the SQL code before executing it on the database.
First ALWAYS use the tool 'sql_check' to review the query. The output of this tool is the correct SQL query.
You MUST use ONLY the correct SQL query when you use the tool 'sql_exec'.
'''
Finally, we shall create the Lead Agent. It has the most important job: invoking the other Agents and telling them what to do. There are many ways to achieve that, but I find creating a simple Tool the most reliable approach.
def invoke_agent(agent:str, instructions:str) -> str:
    return agent+" - "+instructions if agent in ['junior','senior'] else f"Agent '{agent}' Not Found"

tool_invoke_agent = {'type':'function', 'function':{
    'name': 'invoke_agent',
    'description': 'Invoke another Agent to work for you.',
    'parameters': {'type': 'object',
                   'required': ['agent', 'instructions'],
                   'properties': {
                       'agent': {'type':'str', 'description':'the Agent name, one of "junior" or "senior".'},
                       'instructions': {'type':'str', 'description':'detailed instructions for the Agent.'}
}}}}
## test
invoke_agent(agent="intern", instructions="build a query")  #--> "Agent 'intern' Not Found"
Describe in the prompt what kind of behavior you’re expecting. Try to be as detailed as possible, for hierarchical Multi-Agent Systems can get very confusing.
prompt_lead = '''
[GOAL] You are a tech lead.
You have a team with one junior data engineer called 'junior', and one senior data engineer called 'senior'.
[RETURN] You must return data from the database based on user's requests.
[WARNINGS] You are the only one that talks to the user and gets the requests from the user.
The 'junior' data engineer only builds queries.
The 'senior' data engineer checks the queries and executes them.
[CONTEXT] First ALWAYS ask the users what they want.
Then, you MUST use the tool 'invoke_agent' to pass the instructions to the 'junior' for building the query.
Finally, you MUST use the tool 'invoke_agent' to pass the instructions to the 'senior' for retrieving the data from the database.
'''
I shall keep the chat histories separate so that each Agent knows only its own part of the whole process.
dic_tools = {'get_tables':get_tables,
             'get_schema':get_schema,
             'sql_exec':sql_exec,
             'sql_check':sql_check,
             'invoke_agent':invoke_agent}
messages_junior = [{"role":"system", "content":prompt_junior}]
messages_senior = [{"role":"system", "content":prompt_senior}]
messages_lead = [{"role":"system", "content":prompt_lead}]
Everything is ready to start the workflow. After the user begins the chat, the first to respond is the Lead Agent, the only one that interacts directly with the human.
while True:
    ## user input
    q = input('🙂 >')
    if q == "quit":
        break
    messages_lead.append( {"role":"user", "content":q} )
    ## Lead Agent
    agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    agent_invoked = res.split("-")[0].strip() if len(res.split("-")) > 1 else ''
    instructions = res.split("-")[1].strip() if len(res.split("-")) > 1 else ''
    ###-->CODE TO INVOKE OTHER AGENTS HERE<--###
    ## Lead Agent final response
    print("👩‍💼 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_lead.append( {"role":"assistant", "content":res} )
The Lead Agent decided to invoke the Junior Agent, giving it instructions based on the interaction with the user. Now the Junior Agent can start working on the query.
    ## Invoke Junior Agent
    if agent_invoked == "junior":
        print("😎 >", f"\x1b[1;32mReceived instructions: {instructions}\x1b[0m")
        messages_junior.append( {"role":"user", "content":instructions} )
        ### use the tools
        available_tools = {"get_tables":tool_get_tables, "get_schema":tool_get_schema}
        context = ''
        while available_tools:
            agent_res = ollama.chat(model=llm, messages=messages_junior,
                                    tools=[v for v in available_tools.values()])
            dic_res = use_tool(agent_res, dic_tools)
            res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
            if tool_used:
                available_tools.pop(tool_used)
            context = context + f"\nTool used: {tool_used}. Output: {res}" #--> add tool usage context
            messages_junior.append( {"role":"user", "content":context} )
        ### response
        agent_res = ollama.chat(model=llm, messages=messages_junior)
        dic_res = use_tool(agent_res, dic_tools)
        res = dic_res["res"]
        print("😎 >", f"\x1b[1;32m{res}\x1b[0m")
        messages_junior.append( {"role":"assistant", "content":res} )
The Junior Agent activated all its Tools to explore the database and collected the necessary information to generate some SQL code. Now, it must report back to the Lead.
        ## update Lead Agent
        context = "Junior already wrote this query: "+res+"\nNow invoke the Senior to review and execute the code."
        print("👩‍💼 >", f"\x1b[1;30m{context}\x1b[0m")
        messages_lead.append( {"role":"user", "content":context} )
        agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
        dic_res = use_tool(agent_res, dic_tools)
        res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
        agent_invoked = res.split("-")[0].strip() if len(res.split("-")) > 1 else ''
        instructions = res.split("-")[1].strip() if len(res.split("-")) > 1 else ''
The Lead Agent received the output from the Junior and asked the Senior Agent to review and execute the SQL query.
        ## Invoke Senior Agent
        if agent_invoked == "senior":
            print("🧓 >", f"\x1b[1;34mReceived instructions: {instructions}\x1b[0m")
            messages_senior.append( {"role":"user", "content":instructions} )
            ### use the tools
            available_tools = {"sql_check":tool_sql_check, "sql_exec":tool_sql_exec}
            context = ''
            while available_tools:
                agent_res = ollama.chat(model=llm, messages=messages_senior,
                                        tools=[v for v in available_tools.values()])
                dic_res = use_tool(agent_res, dic_tools)
                res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
                if tool_used:
                    available_tools.pop(tool_used)
                context = context + f"\nTool used: {tool_used}. Output: {res}" #--> add tool usage context
                messages_senior.append( {"role":"user", "content":context} )
            ### response
            print("🧓 >", f"\x1b[1;34m{res}\x1b[0m")
            messages_senior.append( {"role":"assistant", "content":res} )
The Senior Agent executed the query on the db and got an answer. Finally, it can report back to the Lead, which will give the final answer to the user.
            ### update Lead Agent
            context = "Senior agent returned this output: "+res
            print("👩‍💼 >", f"\x1b[1;30m{context}\x1b[0m")
            messages_lead.append( {"role":"user", "content":context} )
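The snippet stops with the Senior’s output appended to the Lead’s history. One simple way to close the loop (a sketch, adapt as needed) is to let the Lead phrase the final answer for the user:
            ### Lead Agent final answer
            agent_res = ollama.chat(model=llm, messages=messages_lead)
            res = agent_res["message"]["content"]
            print("👩‍💼 >", f"\x1b[1;30m{res}\x1b[0m")
            messages_lead.append( {"role":"assistant", "content":res} )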
Conclusion
This article has covered the basic steps of creating Multi-Agent Systems from scratch using only Ollama. With these building blocks in place, you are already equipped to start developing your own MAS for different use cases.
Stay tuned for Part 4, where we will dive deeper into more advanced examples.
Full code for this article: GitHub
I hope you enjoyed it! Feel free to contact me for questions and feedback or just to share your interesting projects.
👉 Let’s Connect 👈
All images, unless otherwise noted, are by the author