Visual Planning and Reasoning in LLMs: Vector Graphics and Structured Representations | by Erik Taylor | Mar, 2025

Prompt 1: Please help me with this task. Given an input prompt, generate the corresponding image with SVG code. Here is the prompt:

The image features a cat with a long, fluffy coat, painted in a watercolor style. The cat is looking directly at the viewer, with its eyes wide open. The cat’s fur is depicted in various shades of orange, and its whiskers are also visible. The painting captures the cat’s unique features and expression, making it an eye-catching and beautiful piece of art.

* Use a simplified image representation.

* Provide the code within a single <svg> tag.

Here’s an example format: <svg width='…' height='…'>…</svg>

Prompt 2: Edit the following: First, make the cat blue. Next, add a text label that says, “This is a blue cat”.

Result: Pass

Discussion: Each LLM successfully produced and then iteratively modified SVG code. Because SVG is plain text, image attributes such as color, shape, and text elements can be controlled precisely, a capability that diffusion models lack.
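To make the point about precise attribute control concrete, here is a minimal sketch of the same two-step edit (recolor, then add a label) applied programmatically with Python's standard library. The `ORIGINAL_SVG` string is a hypothetical stand-in, not the output any model produced in this test, and the helper name `recolor_and_label` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the default namespace unprefixed on output

# Hypothetical stand-in for a model-generated "simplified" cat (not real model output).
ORIGINAL_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="220">
  <circle cx="100" cy="100" r="60" fill="orange"/>
  <polygon points="50,60 65,20 90,50" fill="orange"/>
  <polygon points="150,60 135,20 110,50" fill="orange"/>
  <circle cx="80" cy="90" r="8" fill="black"/>
  <circle cx="120" cy="90" r="8" fill="black"/>
</svg>"""


def recolor_and_label(svg_text, old_fill, new_fill, label):
    """Swap one fill color for another and append a centered text label."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("fill") == old_fill:
            element.set("fill", new_fill)
    text = ET.SubElement(
        root,
        f"{{{SVG_NS}}}text",
        {"x": "100", "y": "210", "text-anchor": "middle", "font-size": "14"},
    )
    text.text = label
    return ET.tostring(root, encoding="unicode")


edited = recolor_and_label(ORIGINAL_SVG, "orange", "blue", "This is a blue cat")
print(edited)
```

Each change touches exactly one attribute or adds one element, which is the kind of targeted edit that is hard to express when the image exists only as pixels.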

Task 3: Multi-modal Planning with Vision-Language Models (VLMs)

Task description: VLMs extend LLM capabilities by integrating visual inputs. Recent research (e.g., VGBench by Zou et al.) evaluated structured graphic generation by VLMs using formats like Graphviz, TikZ, and SVG, finding strong zero-shot and visual-language performance. In practice, however, performance on complex image tasks varies, which motivates the evaluation with GPT-4o here.

Difficulty level for LLMs: Hard

Example: Image-to-text-to-SVG conversion using GPT-4o vision capabilities.

Input image (created using Imagen 3). A portrait of an orange cat. 8-bit color scheme. Simplistic design. Light background.

Next, I input this image into GPT-4o and convert it to text using the model's vision capabilities.

Text generation prompt: Your goal will be to copy the image provided into text. Provide a detailed and artistic description of the image.

Image generation prompt: Provided the above input prompt, generate the corresponding image with SVG code.

* Use a simplified image representation.

* Provide the code within a single <svg> tag.

Here’s an example format: <svg width='…' height='…'>…</svg>

Text output: The image is a pixel art depiction of a ginger cat with large, expressive green eyes. The cat’s fur is rendered in warm shades of orange, with distinct tabby markings running along its forehead and ears. The soft texture of its fur is represented through careful shading and pixel placement, creating depth and dimension.

The cat’s eyes are round and bright, featuring a captivating green hue with subtle highlights that give them a sense of life and curiosity. Its pink nose and small, slightly parted mouth contribute to a gentle and inquisitive expression. White whiskers extend gracefully from either side of its face, contrasting against the warm tones of its fur.

The background is a simple, soft beige-to-light-pink gradient, allowing the cat to stand out as the central focus of the composition. The overall pixel art style maintains a balance between detail and simplicity, embracing a retro aesthetic while still conveying a lifelike and charming presence.

The image evokes a sense of warmth and friendliness, making it a visually pleasing piece that captures the essence of a ginger cat in a beautifully stylized format.

SVG image result:

Figure: Image-to-text-to-SVG conversion using GPT-4o vision capabilities.

Result: Pass

Discussion: The image-to-text-to-SVG pipeline showed that GPT-4o can generate structured SVG visuals from textual descriptions. Although simplified, the result preserved the eye color, fur color, and background of the input image, indicating that the key visual attributes survived the sequential conversion.
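When chaining model calls like this, it helps to check mechanically that the final response honors the "single <svg> tag" instruction before rendering it. The following is a minimal sketch using only the standard library; the function name `extract_single_svg` and the `sample` response are assumptions for illustration, and nested <svg> elements are deliberately not supported.

```python
import re
import xml.etree.ElementTree as ET


def extract_single_svg(model_output):
    """Return the one <svg>...</svg> block in the text, or raise ValueError."""
    blocks = re.findall(r"<svg\b.*?</svg>", model_output, flags=re.DOTALL)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one <svg> block, found {len(blocks)}")
    try:
        ET.fromstring(blocks[0])  # well-formedness check only, not SVG validity
    except ET.ParseError as err:
        raise ValueError(f"SVG is not well-formed XML: {err}") from err
    return blocks[0]


# Hypothetical model response with surrounding chatter around the SVG.
sample = (
    "Here is the image:\n"
    '<svg width="64" height="64">'
    '<rect width="64" height="64" fill="#f5e6d3"/>'
    '<circle cx="32" cy="32" r="20" fill="orange"/>'
    "</svg>\nLet me know if you want changes."
)

svg = extract_single_svg(sample)
print(svg)
```

A check like this catches truncated or duplicated output early, so a failure can be attributed to the generation step rather than to the renderer.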

Task 4: LLM Code Generation for Physics Simulation

Task description: LLMs efficiently generate code for physics simulations, visualizing complex scientific phenomena interactively.

Difficulty level for LLMs: Easy

Example: 3D diffusion simulation using o3-mini.

Prompt: Write a Python function simulating 3D diffusion from a point source. Starting from the origin, simulate 10 particles undergoing a random walk process. Select a color palette that demonstrates each particle’s path and allow partial transparency.

Result: Pass

Discussion: This example shows that an LLM can translate a descriptive prompt into executable code, which points to the potential of LLMs for supporting conceptual understanding and visual representation in physics.
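For readers who want to see what such a program might look like, here is a minimal sketch of a 3D random walk from a point source. It is not the code o3-mini produced. The Gaussian step model, the function names, and the default parameters are all assumptions. The simulation uses only the standard library, and the optional `plot_paths` helper additionally assumes matplotlib is installed.

```python
import colorsys
import random


def simulate_diffusion(n_particles=10, n_steps=500, step_size=1.0, seed=0):
    """Return one path per particle; each path is a list of (x, y, z) tuples."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_particles):
        x = y = z = 0.0  # every particle starts at the origin (the point source)
        path = [(x, y, z)]
        for _ in range(n_steps):
            # Independent Gaussian displacement along each axis.
            x += rng.gauss(0.0, step_size)
            y += rng.gauss(0.0, step_size)
            z += rng.gauss(0.0, step_size)
            path.append((x, y, z))
        paths.append(path)
    return paths


def particle_colors(n_particles, alpha=0.6):
    """Evenly spaced hues as RGBA tuples, so each path is distinguishable."""
    return [
        colorsys.hsv_to_rgb(i / n_particles, 0.8, 0.9) + (alpha,)
        for i in range(n_particles)
    ]


def plot_paths(paths, colors):
    """Optional 3D line plot; requires matplotlib."""
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for path, color in zip(paths, colors):
        xs, ys, zs = zip(*path)
        ax.plot(xs, ys, zs, color=color)  # the RGBA alpha gives partial transparency
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    plt.show()


paths = simulate_diffusion()
colors = particle_colors(len(paths))
# plot_paths(paths, colors)  # uncomment if matplotlib is available
```

With this step model the expected mean squared displacement after n steps is 3·n·step_size², which gives a quick sanity check on any generated simulation.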

Task 5: LLM Code Generation for Dynamic Simulation

Task description: Beyond generating code for static diagrams or visualizations of single time-steps, LLMs can also produce code for dynamic simulations. This allows for the creation of interactive visualizations that evolve over time, providing a more intuitive and engaging way to explore dynamic systems. These simulations can incorporate various physical laws and forces, such as gravity and friction, enabling the modeling of complex interactions.

Difficulty level for LLMs: Hard

Example: Dynamic physics simulation with Pygame using o3-mini.

Prompt: Write a Python program that simulates a ball bouncing inside a spinning hexagon, incorporating gravity and friction.

Result: Pass

Discussion: This example demonstrated that an LLM can generate a working interactive simulation. The program visualized a dynamic scenario involving gravity, friction, and a spinning hexagon, which makes the underlying physics easier to grasp intuitively.
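The hard part of this task is the collision with a moving wall, so here is a minimal sketch of just the physics update, without the Pygame rendering loop. It is not the program o3-mini generated. It assumes the hexagon rotates rigidly about the origin, that "friction" means damping of the tangential velocity at each wall contact, and that y points down as in screen coordinates. All names and parameter values are illustrative.

```python
import math


def hexagon_vertices(radius, angle):
    """Vertices of a regular hexagon centered at the origin, rotated by angle."""
    return [
        (radius * math.cos(angle + k * math.pi / 3),
         radius * math.sin(angle + k * math.pi / 3))
        for k in range(6)
    ]


def step(state, dt, hex_radius=200.0, ball_radius=10.0, omega=1.0,
         gravity=500.0, restitution=0.9, wall_friction=0.1):
    """Advance (px, py, vx, vy, angle) by one time step."""
    px, py, vx, vy, angle = state
    vy += gravity * dt          # gravity acts along +y (downward on screen)
    px += vx * dt
    py += vy * dt
    angle += omega * dt         # spin the hexagon

    verts = hexagon_vertices(hex_radius, angle)
    for k in range(6):
        ax, ay = verts[k]
        bx, by = verts[(k + 1) % 6]
        # Inward unit normal: from the edge midpoint toward the center.
        mx, my = (ax + bx) / 2, (ay + by) / 2
        m_len = math.hypot(mx, my)
        nx, ny = -mx / m_len, -my / m_len
        dist = (px - ax) * nx + (py - ay) * ny  # signed distance to the wall line
        if dist >= ball_radius:
            continue
        # Wall velocity at the contact point for rigid rotation about the origin.
        cx, cy = px - dist * nx, py - dist * ny
        wx, wy = -omega * cy, omega * cx
        rvx, rvy = vx - wx, vy - wy             # ball velocity relative to the wall
        vn = rvx * nx + rvy * ny
        if vn < 0:  # moving into the wall: reflect normal part, damp tangential part
            tx, ty = rvx - vn * nx, rvy - vn * ny
            rvx = (1 - wall_friction) * tx - restitution * vn * nx
            rvy = (1 - wall_friction) * ty - restitution * vn * ny
            vx, vy = rvx + wx, rvy + wy
        # Push the ball back so it no longer overlaps the wall.
        px += (ball_radius - dist) * nx
        py += (ball_radius - dist) * ny
    return px, py, vx, vy, angle


state = (0.0, 0.0, 120.0, 0.0, 0.0)
trajectory = []
for _ in range(600):            # ten seconds at 60 steps per second
    state = step(state, dt=1 / 60)
    trajectory.append(state[:2])
```

Reflecting the velocity relative to the wall, rather than the absolute velocity, is what lets the spinning hexagon transfer momentum to the ball. In a Pygame version, `step` would be called once per frame and the vertices and ball position drawn after translating to the window center.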

Task 6: LLM Generation of Diagrams using TikZ

Task description: Evaluating LLM capabilities for generating complex diagrams using TikZ.

Difficulty level for LLMs: Hard

Example: Physics diagram of a spinning top with GPT-4o, o3-mini-high, o1, and GPT-4.5.
