Image by Author
Llama 3.2-Vision is a powerful multimodal model capable of processing both text and image data. Available in 11B and 90B parameter sizes, it is designed for tasks such as object recognition, image captioning, and scene interpretation.
In this tutorial, we will explore the easiest way to use Llama 3.2-Vision locally on a GPU without requiring an internet connection. We will use the Msty desktop application to download, manage, and interact with the model both via its user interface and API.
Accessing Llama 3.2-Vision with the Msty Desktop App
Step 1: Download and Install the Msty Application
- Visit the official website to download the latest version of the Msty desktop app.
- Install the application following the installation wizard.
Step 2: Download the Llama 3.2-Vision Model
- Open the Msty app and navigate to the Local AI Models menu: click the gear icon in the bottom-left corner, select Local AI, then click Manage Local AI Models.
- Download the Llama 3.2-Vision model from this menu.
- Verify that the model is compatible with your machine (GPU and system requirements are displayed in the app).
In our case, the app reports full compatibility, so the model should run without issues.
![Using Llama 3.2-Vision Locally: A Step-by-Step Guide](https://www.kdnuggets.com/wp-content/uploads/awan_llama_32vision_locally_stepbystep_guide_2.png)
Step 3: Select the Llama 3.2-Vision Model
- Once the download is complete, go to the Chat menu.
- By default, the Llama 3.2-Vision model will be selected. If not, you can manually select it from the dropdown menu.
![Using Llama 3.2-Vision Locally: A Step-by-Step Guide](https://www.kdnuggets.com/wp-content/uploads/awan_llama_32vision_locally_stepbystep_guide_3.png)
Step 4: Load an Image
- Use the paperclip icon in the chat panel to upload an image.
- Once the image is uploaded, you can interact with the model by asking questions like “What is in this picture?” or requesting a detailed description.
![Using Llama 3.2-Vision Locally: A Step-by-Step Guide](https://www.kdnuggets.com/wp-content/uploads/awan_llama_32vision_locally_stepbystep_guide_6.png)

![Using Llama 3.2-Vision Locally: A Step-by-Step Guide](https://www.kdnuggets.com/wp-content/uploads/awan_llama_32vision_locally_stepbystep_guide_4.png)
Accessing the Vision Model via the Msty API
Msty also provides an API to interact with the Llama 3.2-Vision model programmatically. Here’s how:
Step 1: Enable the API Endpoint
- Go to the Settings menu in the Msty app.
- Enable the Local AI Endpoint Service in the Local AI section. This will display the local API URL (e.g., http://localhost:10000).
![Using Llama 3.2-Vision Locally: A Step-by-Step Guide](https://www.kdnuggets.com/wp-content/uploads/awan_llama_32vision_locally_stepbystep_guide_5.png)
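Before writing any image-processing code, it is worth confirming the endpoint is reachable. Here is a minimal sketch, assuming Msty's local service is Ollama-compatible (the payload format used later suggests it is), in which case GET /api/tags lists the downloaded models:

```python
import requests

# Health check: assumes an Ollama-compatible local service,
# where GET /api/tags returns the locally available models.
url = "http://localhost:10000/api/tags"

try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    for model in response.json().get("models", []):
        print(model.get("name"))  # e.g. "llama3.2-vision"
except requests.exceptions.ConnectionError:
    print("Endpoint not reachable -- is the Local AI Endpoint Service enabled?")
```

If the model name appears in the output, the service is running and you can proceed to the script below.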
Step 2: Use the API with Python
You can use the requests library to interact with the API. The script below encodes an image as a Base64 string and sends it to the /api/generate endpoint along with a prompt.
```python
import requests
import base64

# Encode an image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "burn_out_image.jpg"

# Get the base64-encoded image
base64_image = encode_image(image_path)

# API endpoint
url = "http://localhost:10000/api/generate"

# Payload
payload = {
    "model": "llama3.2-vision",
    "prompt": "What is in this picture?",
    "stream": False,
    "images": [base64_image],
}

# Make the POST request
response = requests.post(url, json=payload)

# Parse and display the "response" field
response_json = response.json()
print(response_json.get("response", "No response found"))
```
The response is concise and accurate. You can lengthen it by raising the maximum number of generated tokens, as sketched after the sample output below.
> This image shows a list of symptoms that can be indicative of burnout. Some of the symptoms include insomnia, fatigue, irritability and anxiety.
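The payload above has no explicit length field. Assuming the endpoint mirrors the Ollama generate API, you can pass an options object with num_predict to raise the token limit; both names are assumptions based on that API rather than documented Msty behavior:

```python
import base64
import requests

# Encode the image inline (same approach as the helper above)
with open("burn_out_image.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

url = "http://localhost:10000/api/generate"

# "options" and "num_predict" are assumed from the Ollama API;
# num_predict caps how many tokens the model may generate, so a
# larger value allows a longer description.
payload = {
    "model": "llama3.2-vision",
    "prompt": "Describe this picture in detail.",
    "stream": False,
    "images": [base64_image],
    "options": {"num_predict": 512},
}

response = requests.post(url, json=payload)
print(response.json().get("response", "No response found"))
```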
Final Thoughts
The key benefits of using Llama 3.2-Vision locally include:
- Ease of use: The Msty desktop app simplifies the process of downloading, managing, and running complex AI models.
- Offline functionality: No internet is required to run the model, ensuring privacy and faster processing.
- Integration: The API allows seamless integration into custom applications or workflows.
When Llama 3.2-Vision was first launched, using such a large and complex multimodal model locally was challenging. Thanks to tools like the Msty app, it has become significantly easier to download and use these models with just a few clicks. Moreover, the ability to integrate the model into applications or edge devices unlocks its potential for real-world use cases.
As AI technology continues to advance, even the most sophisticated models are becoming more accessible, further fueling the AI revolution.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.