To load Kaggle data into Google Colab, you first need API credentials: the `kaggle.json` file can be downloaded from your Kaggle profile. Upload this JSON file to the Files section in Google Colab. After uploading it, the following snippet will be used to get the data.
import os

# Set the Kaggle API configuration directory to where 'kaggle.json' is
os.environ['KAGGLE_CONFIG_DIR'] = '/content'
# Change the file permissions of 'kaggle.json' so it is readable only by the owner
!chmod 600 /content/kaggle.json
# Downloading the dataset from Kaggle using the dataset identifier
!kaggle datasets download -d paramaggarwal/fashion-product-images-small
# Unzipping the downloaded dataset archive into the folder named 'fashion_data'
!unzip -q fashion-product-images-small.zip -d ./fashion_data
The CSV file containing data about each item does not include prices. Therefore, a new column is created with a random price between $20 and $100 for each item.
# Read the CSV file
csv_file_path = 'fashion_data/styles.csv'
df = pd.read_csv(csv_file_path, nrows=200)
# Generate random prices between 20 and 100
df['Price'] = np.random.randint(20, 101, size=len(df))
#Display the first 5 rows
df.head()
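Because the prices are generated randomly, each rerun of the notebook will produce different values. A minimal sketch of making them reproducible by seeding NumPy's generator (the seed value and the toy dataframe below are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

# Seed the generator so the random prices are reproducible across reruns
rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice
demo_df = pd.DataFrame({'id': [101, 102, 103]})
# Same call shape as in the snippet above: integers in [20, 100]
demo_df['Price'] = rng.integers(20, 101, size=len(demo_df))
print(demo_df['Price'].between(20, 100).all())
```

With a fixed seed, the generated prices are identical on every run, which makes the later agent conversations easier to reproduce.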
The first step in the image embedding process is to define a function that takes an image path as input and returns a 1408-dimensional vector representing the image’s embedding. This function calls a multimodal embedding model from Vertex AI, which takes the image and the desired embedding vector length as inputs.
def get_image_embedding(
    image_path: str,
    dimension: int | None = 1408,
) -> list[float]:
    # Load the image
    image = VMImage.load_from_file(image_path)
    # Get the embedding
    embedding = mm_embedding_model.get_embeddings(
        image=image,
        dimension=dimension,
    )
    return embedding.image_embedding
After defining the function, the next step is to call the Multimodal Embedding model, define a list to store all the embeddings, and retrieve the IDs of all available products. Then, we loop over all the image IDs, generate an embedding for each image, and store each embedding in the list.
# Call the Multimodal Embedding model
mm_embedding_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
# Define a list that will hold all the embeddings
store_embedding_list = []
# Get the ids of all the photos in the dataframe
photo_id = list(df['id'])
# Loop over all the image ids and create each embedding through get_image_embedding
for name in photo_id:
    image_path = 'fashion_data/images/' + str(name) + '.jpg'
    image_emb = get_image_embedding(image_path=image_path)
    store_embedding_list.append(image_emb)
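The loop above assumes every id has a matching `.jpg` file and that every embedding call succeeds. A sketch of a more defensive version, using a stand-in `embed` function (a placeholder for `get_image_embedding`, so the sketch runs without Vertex AI) and hypothetical ids:

```python
import os

def embed(path: str) -> list[float]:
    # Placeholder for get_image_embedding; returns a dummy 1408-dim vector
    return [0.0] * 1408

photo_id = [101, 102, 103]  # hypothetical ids for illustration
store_embedding_list = []
skipped = []
for name in photo_id:
    image_path = f'fashion_data/images/{name}.jpg'
    if not os.path.exists(image_path):
        skipped.append(name)  # skip ids with no image file on disk
        continue
    try:
        store_embedding_list.append(embed(image_path))
    except Exception:
        skipped.append(name)  # record the failure and keep going
```

Keeping a `skipped` list also makes it easy to align the embeddings with their ids when adding them to the vector store later.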
The retrieved embeddings will be stored in a Chroma collection along with their corresponding IDs. This vector store will later be used to retrieve the products most similar to an image uploaded by the user.
# Define a name for the collection
DB_NAME = "Fashion_Store_Embedding"
# Initiate a client
chroma_client = chromadb.Client()
# Create the collection
collection = chroma_client.create_collection(name=DB_NAME)
#Add Data To the Collection
collection.add(embeddings=store_embedding_list, ids=[str(i) for i in photo_id])
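Under the hood, a similarity query against this collection ranks stored embeddings by their distance to the query vector. A minimal NumPy sketch of cosine-similarity ranking, with toy 3-dimensional vectors standing in for real 1408-dimensional image embeddings:

```python
import numpy as np

def rank_by_cosine(query: np.ndarray, stored: np.ndarray) -> np.ndarray:
    # Normalize rows, then rank stored vectors by cosine similarity to the query
    q = query / np.linalg.norm(query)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    sims = s @ q
    return np.argsort(-sims)  # indices from most to least similar

stored = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
print(rank_by_cosine(query, stored))  # [0 2 1]
```

The indices come back most-similar first, which mirrors the ordering Chroma returns in its `ids` field.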
The first tool the agent will use, called add_row, saves the name of the client, the product the client wants to buy, and the product's price as a new row in a previously defined data frame called records_df.
# Define the dataframe that the agent will use to store interactions with clients
records_df = pd.DataFrame(columns=['Name of the Client', 'Product', 'Price'])

# Define the function
def add_row(client_name: str, product: str, price: float):
    """Takes the client name, product name and the price of the product and saves the info as a row in a dataframe"""
    global records_df
    row_append = pd.DataFrame({'Name of the Client': [client_name], 'Product': [product], 'Price': [price]})
    records_df = pd.concat([records_df, row_append], ignore_index=True)
The second tool, called similar_products, takes as input an image path along with the number of similar photos to fetch, and returns a dictionary mapping product names to their prices, sorted in decreasing order of similarity to the input photo. The first key in the dictionary is therefore the most similar product, while the last key is the least similar.
def similar_products(image_query_path: str, number_products: int) -> dict:
    """Takes an image path and the number of similar results needed and returns a dictionary that contains the names of the similar products and their prices in US dollars"""
    # Get the embedding vector for the input photo
    image_emb = get_image_embedding(image_path=image_query_path)
    # Get the most similar images to the input one
    results = collection.query(query_embeddings=[image_emb], n_results=number_products)
    # Get the ids of the retrieved images
    image_ids = results['ids'][0]
    image_ids = list(map(int, image_ids))
    # Get the rows of the retrieved images
    filtered_df = df[df['id'].isin(image_ids)]
    filtered_df = filtered_df[['productDisplayName', 'Price', 'id']].reset_index(drop=True)
    # Rename the columns
    filtered_df.columns = ['Product Name', 'Price in US Dollars', 'id']
    # Sort the dataframe in order of similarity
    filtered_df['id'] = pd.Categorical(filtered_df['id'], categories=image_ids, ordered=True)
    filtered_df = filtered_df.sort_values('id')[['Product Name', 'Price in US Dollars', 'id']].reset_index(drop=True)
    # Transform the dataframe to a dictionary
    dict_price = dict(zip(filtered_df['Product Name'], filtered_df['Price in US Dollars']))
    return dict_price
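The pd.Categorical step is what preserves the similarity ranking: the dataframe rows come back from the `isin` filter in the dataframe's own order, not the vector store's. A small sketch with toy ids showing how the ordered categorical makes sort_values follow the retrieval order:

```python
import pandas as pd

# ids in the similarity order returned by the vector store (toy values)
similarity_order = [30, 10, 20]
df_toy = pd.DataFrame({'id': [10, 20, 30], 'name': ['a', 'b', 'c']})
# Making 'id' an ordered categorical lets sort_values follow similarity_order
df_toy['id'] = pd.Categorical(df_toy['id'], categories=similarity_order, ordered=True)
df_toy = df_toy.sort_values('id').reset_index(drop=True)
print(list(df_toy['name']))  # ['c', 'a', 'b']
```

Without this step, the dictionary keys would be in dataframe order and the agent's "most similar first" assumption would silently break.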
The instruction is a prompt that will be sent to the model; it explains all the necessary logic for the agent and when to use each tool.
# The prompt used by the agent
instruction = """You are a helpful chatbot that works in a retail store. Your job is to help the customer identify the name of the most similar
product to the image provided within the budget they have. The customer will attach the image and specify the amount he/she is willing to pay, and then you'll use the similar_products tool to get, as a dictionary, the names and prices of
the top 5 similar products to the attached image, sorted in decreasing order of similarity, so the first key will be the most similar item while the last key is the farthest item.
You will look first at the most similar product, and if the price of this product is higher than the client's budget you'll go to the second one, and so on until
you reach the fifth product. If the prices of all top 5 similar products are higher than the budget, tell the client that no products are available at the price provided. Note that sometimes
the customer will provide the budget in a different currency, so make sure to convert it to US dollars and show the conversion in the answer.
If the client confirms that he/she wants to buy the proposed item, save the client name, the product name and the price using add_row"""
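The prompt leaves currency conversion to the model, which can hallucinate exchange rates. One option would be to hand the agent a deterministic conversion tool instead; a hedged sketch below, where the rates are hard-coded placeholders (not live values; a real tool would query a rates API):

```python
# Hard-coded placeholder rates to USD, for illustration only
RATES_TO_USD = {'USD': 1.0, 'GBP': 1.2, 'EUR': 1.1}

def to_usd(amount: float, currency: str) -> float:
    """Convert an amount in the given currency to US dollars"""
    rate = RATES_TO_USD.get(currency.upper())
    if rate is None:
        raise ValueError(f"Unsupported currency: {currency}")
    return round(amount * rate, 2)

print(to_usd(100, 'GBP'))  # 120.0 with the placeholder rate
```

Such a tool could be appended to the agent's tool list so the conversion shown to the customer is always consistent with the comparison logic.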
After defining the tools and writing the instruction for the model to follow, we combine the whole setup by creating a new chat session.
# Save all the tools the agent will use in a list
all_tools = [similar_products, add_row]

# Create a new chat session using the Gemini 2.0 Flash model
#with a custom system instruction and a set of predefined tools
chat = client.chats.create(
model="gemini-2.0-flash",
config=types.GenerateContentConfig(
system_instruction=instruction,
tools=all_tools,
),
)
Now, everything is set! For experimentation, we will randomly select an image and send it to the agent along with a specified budget in GBP (or any other currency of your choice), then observe the agent’s response.
# Choose a random photo from the list
name = random.choice(photo_id)
image_path = 'fashion_data/images/' + str(name) + '.jpg'
# Display the image
image = PImage.open(image_path)
display(image) # Renders the image in the notebook
# Pass the image as well as the budget to the agent
message_prompt = [image_path, "My name is Alex and my budget is 100 GBP"]
#Get the response
resp = chat.send_message(message_prompt)
resp.text
For example, if the uploaded image looks like the following:
The response of the agent will be:
I have the 5 most similar products to the image you provided.
Your budget is 100 GBP which is approximately 120 USD.
The most similar product is "Puma Men's Stripe Polo Black T-shirt"
and it costs $99. Would you like to buy it?
If the user replies with
resp = chat.send_message(["Yes I want to buy this item"])
then the agent will reply:
Great, I have saved your order. You have bought "Puma Men's Stripe Polo Black T-shirt" for $99.
If we look at the data frame defined at the beginning, we will notice that a new row has been added.
Many enhancements can be made to this agent. Some suggestions for further development include:
- Connect the orders to a live database.
- Enable voice messaging.
- Allow web search so the agent can look for other products if none are available within the budget.
- Try photos with higher resolution and experiment with a larger number of photos.