Stable Diffusion can generate an image from a text prompt. Many models share a similar architecture and pipeline, yet their outputs can differ considerably. There are also many ways to adjust their behavior, for example so that a given prompt produces output in a particular style by default. LoRA is one such technique, and it does not require you to recreate a large model. In this post, you will see how you can create a LoRA on your own.
After finishing this post, you will learn
- How to prepare and train a LoRA model
- How to use the trained LoRA in Python
Let’s get started.
Overview
This post is in three parts; they are
- Preparation for Training a LoRA
- Training a LoRA with Diffusers Library
- Using Your Trained LoRA
Preparation for Training a LoRA
We covered the idea of using LoRA in the Web UI in a previous post. If you want to create your own LoRA, a plugin in the Web UI allows you to do that, or you can create one with your own program. Since the training is computationally intensive, be sure you have a machine with a GPU before you continue.
We will use the training script from the example directory of the diffusers library. Before you start, you have to set up the environment by installing the required Python libraries, using the following commands:
pip install git+https://github.com/huggingface/diffusers
pip install accelerate wandb
pip install -r https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/requirements.txt
accelerate config default
# accelerate configuration saved at /home/adrian/.cache/huggingface/accelerate/default_config.yaml
The first command installs the diffusers library from GitHub, which is the development version. This is required because you will use the training script from GitHub, so you need the matching version of the library. The second command installs the accelerate and wandb libraries, and the third installs the remaining dependencies of the training script.
The last command above confirms that the accelerate library is installed and detects what GPU you have on your computer. You have downloaded and installed many libraries. You can run the Python statements below to confirm that all of them are installed correctly and that you get no import errors:
import wandb
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoPipelineForText2Image
from huggingface_hub import model_info
You will use the LoRA training script from the examples of diffusers. Let’s download the script first:
wget -q https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py
Training a LoRA with Diffusers Library
For fine-tuning, you will be using the Pokémon BLIP captions with English and Chinese dataset on the base model runwayml/stable-diffusion-v1-5 (the official Stable Diffusion v1.5 model). You can adjust hyperparameters to suit your specific use case, but you can start with the following Linux shell commands.
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="./finetune_lora/pokemon"
export HUB_MODEL_ID="pokemon-lora"
export DATASET_NAME="svjack/pokemon-blip-captions-en-zh"

mkdir -p $OUTPUT_DIR

accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --checkpointing_steps=500 \
  --caption_column="en_text" \
  --validation_prompt="A pokemon with blue eyes." \
  --seed=1337
Running this command will take hours to complete, even with a high-end GPU. But let’s look closer at what this does.
The accelerate command helps you launch the training across multiple GPUs. It does no harm if you have just one. Many modern GPUs support the "Brain Float 16" floating point format introduced by the Google Brain project. If it is supported, the option --mixed_precision="bf16" will save memory and run faster.
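Not every GPU supports bf16. If you are unsure, you can check before launching the training; the following is a minimal sketch, assuming a recent PyTorch installation:

import torch

# Report whether the current CUDA device can run bfloat16 mixed precision.
# If this prints False, drop --mixed_precision="bf16" or switch to "fp16".
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA GPU detected")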
The training script downloads the dataset from the Hugging Face Hub and uses it to train a LoRA model. The batch size, training steps, learning rate, and so on are the hyperparameters for the training. The trained model is checkpointed once every 500 steps to the output directory.
Training a LoRA requires a dataset with images (pixels) and corresponding captions (text). The caption text describes the image, and the trained LoRA will learn that these captions should mean those images. If you check out the dataset on the Hugging Face Hub, you will see that the caption column is named en_text, and that is what is set for --caption_column above.
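If you want to double-check the column names before launching the training, you can load the dataset and inspect it. The following is a quick sketch, assuming the datasets library is installed (it is part of the training script's requirements):

from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and inspect its columns
dataset = load_dataset("svjack/pokemon-blip-captions-en-zh", split="train")
print(dataset.column_names)   # should include "image" and "en_text"
print(dataset[0]["en_text"])  # English caption of the first sample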
If you are providing your own dataset instead (e.g., you manually created captions for the images you gathered), you should create a CSV file metadata.csv with the first column named file_name and the second column holding your text captions, like the following:
file_name,caption
image_0.png,a drawing of a green pokemon with red eyes
image_1.png,a green and yellow toy with a red nose
image_2.png,a red and white ball with an angry look on its face
…
Keep this CSV together with all your images (matching the file_name column) in the same directory, and use the directory name as your dataset name.
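If you already have your captions in Python, a short sketch like the one below can write metadata.csv for you. The directory name my_dataset and the file names are hypothetical; replace them with your own:

import csv

# Hypothetical captions, keyed by the image file names in ./my_dataset/
captions = {
    "image_0.png": "a drawing of a green pokemon with red eyes",
    "image_1.png": "a green and yellow toy with a red nose",
}

# Write metadata.csv next to the images; each row's file_name must match
# an image file in the same directory
with open("my_dataset/metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "caption"])
    for file_name, caption in captions.items():
        writer.writerow([file_name, caption])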
Many subdirectories and files will be created under the directory assigned to OUTPUT_DIR in the script above. Each checkpoint contains the full Stable Diffusion model weights and the extracted LoRA safetensors. Once you finish the training, you can delete all of them except the final LoRA file, pytorch_lora_weights.safetensors.
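As a quick sanity check, you can list what the training run wrote under the output directory and confirm that the final LoRA file is only a few megabytes. This is a small sketch, assuming the output directory used above:

from pathlib import Path

# List everything under the training output directory
output_dir = Path("finetune_lora/pokemon")
for path in sorted(output_dir.rglob("*")):
    print(path)

# The only file you need to keep is the final LoRA weight file
lora_file = output_dir / "pytorch_lora_weights.safetensors"
print("LoRA size (MB):", lora_file.stat().st_size / 1e6)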
Using Your Trained LoRA
Running a Stable Diffusion pipeline with a LoRA requires only a small modification to your Python code. An example would be the following:
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import model_info
import torch

# LoRA weights ~3 MB
model_path = "pcuenq/pokemon-lora"

info = model_info(model_path)
model_base = info.cardData["base_model"]
pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

image = pipe("Green pokemon with menacing face", num_inference_steps=25).images[0]
image.save("green_pokemon.png")
The code above downloads a LoRA from the Hugging Face Hub repository pcuenq/pokemon-lora and attaches it to the pipeline with the line pipe.unet.load_attn_procs(model_path). The rest works as usual. The generated image may look like the following:
This is the more verbose way of using a LoRA, since you have to know that this particular LoRA should be loaded into the attention processors of the pipeline's unet part. Such details should be found in the model card in the repository.
An easier way of using the LoRA is via the auto pipeline, which infers the model architecture from the model file:
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_lora_weights("finetune_lora/pokemon", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("A pokemon with blue eyes").images[0]
The parameters to load_lora_weights() are the directory name and the file name of your trained LoRA file. This works for other LoRA files too, such as those you downloaded from Civitai.
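Once the LoRA is attached, you can also control how strongly it influences the output. Recent versions of diffusers accept a scale through cross_attention_kwargs when calling the pipeline; the following is a brief sketch, continuing from the pipeline above (the output file name is hypothetical):

# Apply the LoRA at 70% strength; 0.0 ignores it and 1.0 applies it fully
image = pipeline(
    "A pokemon with blue eyes",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.7},
).images[0]
image.save("blue_eyes_pokemon.png")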
Further Reading
This section provides more resources on the topic if you want to go deeper.
Summary
In this post, you saw how to create your own LoRA model, given a set of images and their description text. This is a time-consuming process, but the result is a small weight file that can modify the behavior of the diffusion model. You learned how to run the LoRA training using the diffusers library. You also saw how to use a LoRA weight in your Stable Diffusion pipeline code.