Stable Diffusion can generate an image from a text prompt. Many models share a similar architecture and pipeline, yet their outputs can differ considerably. There are also many ways to adjust their behavior, for example so that a given prompt produces output in a particular style by default. LoRA is one such technique, and it does not require you to recreate a large model. In this post, you will see how you can create a LoRA on your own.
After finishing this post, you will learn
- How to prepare and train a LoRA model
- How to use the trained LoRA in Python
Let’s get started.
Overview
This post is in three parts; they are
- Preparation for Training a LoRA
- Training a LoRA with Diffusers Library
- Using Your Trained LoRA
Preparation for Training a LoRA
We covered the idea of using LoRA in the Web UI in a previous post. If you want to create your own LoRA, a plugin in the Web UI allows you to do that, or you can create one with your own program. Since the training is computationally intensive, be sure you have a machine with a GPU before you continue.
We will use the training script from the example directory of the diffusers library. Before you start, you have to set up the environment by installing the required Python libraries, using the following commands:
pip install git+https://github.com/huggingface/diffusers
pip install accelerate wandb
pip install -r https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/requirements.txt
accelerate config default
# accelerate configuration saved at /home/adrian/.cache/huggingface/accelerate/default_config.yaml
The first command installs the diffusers library from GitHub, which is the development version. This is required because you will use the training script from GitHub, so you need the matching version of the library. The second command installs the accelerate and wandb libraries, and the third installs the remaining dependencies of the training script.
The last command above confirms that the accelerate library is installed and detects what GPU you have on your computer. You have downloaded and installed many libraries. You can run the Python statements below to confirm that all of them are installed correctly and that you get no import errors:
import wandb
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoPipelineForText2Image
from huggingface_hub import model_info
You will use the LoRA training script from the examples of diffusers. Let’s download the script first:
wget -q https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py
Training a LoRA with Diffusers Library
For fine-tuning, you will be using the Pokémon BLIP captions with English and Chinese dataset on the base model runwayml/stable-diffusion-v1-5 (the official Stable Diffusion v1.5 model). You can adjust hyperparameters to suit your specific use case, but you can start with the following Linux shell commands.
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="./finetune_lora/pokemon"
export HUB_MODEL_ID="pokemon-lora"
export DATASET_NAME="svjack/pokemon-blip-captions-en-zh"

mkdir -p $OUTPUT_DIR

accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --checkpointing_steps=500 \
  --caption_column="en_text" \
  --validation_prompt="A pokemon with blue eyes." \
  --seed=1337
Running this command will take hours to complete, even with a high-end GPU. But let’s look closer at what this does.
The accelerate command helps you launch the training across multiple GPUs. It does no harm if you have just one. Many modern GPUs support the "Brain Float 16" floating point format introduced by the Google Brain project. If it is supported, the option --mixed_precision="bf16" will save memory and run faster.
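Not every GPU supports bf16. If you are unsure, you can check before launching the training; the following is a minimal sketch, assuming a recent PyTorch installation:

import torch

# Report whether the current CUDA device can run bfloat16 mixed precision.
# If this prints False, drop --mixed_precision="bf16" or switch to "fp16".
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA GPU detected")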
The training script downloads the dataset from the Hugging Face Hub and uses it to train a LoRA model. The batch size, training steps, learning rate, and so on are the hyperparameters for the training. The trained model is checkpointed once every 500 steps to the output directory.
Training a LoRA requires a dataset with images (pixels) and corresponding captions (text). The caption text describes the image, and the trained LoRA will learn that these captions should mean those images. If you check out the dataset on the Hugging Face Hub, you will see that the caption column is named en_text, and that is what is set for --caption_column above.
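If you want to double-check the column names before launching the training, you can load the dataset and inspect it. The following is a quick sketch, assuming the datasets library is installed (it is part of the training script's requirements):

from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and inspect its columns
dataset = load_dataset("svjack/pokemon-blip-captions-en-zh", split="train")
print(dataset.column_names)   # should include "image" and "en_text"
print(dataset[0]["en_text"])  # English caption of the first sample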
If you are providing your own dataset instead (e.g., you manually created captions for the images you gathered), you should create a CSV file metadata.csv with the first column named file_name and the second column holding your text captions, like the following:
file_name,caption
image_0.png,a drawing of a green pokemon with red eyes
image_1.png,a green and yellow toy with a red nose
image_2.png,a red and white ball with an angry look on its face
…
Keep this CSV together with all your images (matching the file_name column) in the same directory, and use the directory name as your dataset name.
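If you already have your captions in Python, a short sketch like the one below can write metadata.csv for you. The directory name my_dataset and the file names are hypothetical; replace them with your own:

import csv

# Hypothetical captions, keyed by the image file names in ./my_dataset/
captions = {
    "image_0.png": "a drawing of a green pokemon with red eyes",
    "image_1.png": "a green and yellow toy with a red nose",
}

# Write metadata.csv next to the images; each row's file_name must match
# an image file in the same directory
with open("my_dataset/metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "caption"])
    for file_name, caption in captions.items():
        writer.writerow([file_name, caption])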
Many subdirectories and files will be created under the directory assigned to OUTPUT_DIR in the script above. Each checkpoint contains the full Stable Diffusion model weights and the extracted LoRA safetensors. Once you finish the training, you can delete all of them except the final LoRA file, pytorch_lora_weights.safetensors.
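As a quick sanity check, you can list what the training run wrote under the output directory and confirm that the final LoRA file is only a few megabytes. This is a small sketch, assuming the output directory used above:

from pathlib import Path

# List everything under the training output directory
output_dir = Path("finetune_lora/pokemon")
for path in sorted(output_dir.rglob("*")):
    print(path)

# The only file you need to keep is the final LoRA weight file
lora_file = output_dir / "pytorch_lora_weights.safetensors"
print("LoRA size (MB):", lora_file.stat().st_size / 1e6)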
Using Your Trained LoRA
Running a Stable Diffusion pipeline with a LoRA requires only a small modification to your Python code. An example would be the following:
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import model_info
import torch

# LoRA weights ~3 MB
model_path = "pcuenq/pokemon-lora"

info = model_info(model_path)
model_base = info.cardData["base_model"]
pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

image = pipe("Green pokemon with menacing face", num_inference_steps=25).images[0]
image.save("green_pokemon.png")
The code above downloads a LoRA from the Hugging Face Hub repository pcuenq/pokemon-lora and attaches it to the pipeline with the line pipe.unet.load_attn_procs(model_path). The rest works as usual. The generated image may look like the following:
This is the more verbose way of using a LoRA, since you have to know that this particular LoRA should be loaded into the attention processors of the pipeline's unet part. Such details should be found in the model card in the repository.
An easier way of using the LoRA is via the auto pipeline, which infers the model architecture from the model file:
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_lora_weights("finetune_lora/pokemon", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("A pokemon with blue eyes").images[0]
The parameters to load_lora_weights() are the directory name and the file name of your trained LoRA file. This works for other LoRA files too, such as those you downloaded from Civitai.
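Once the LoRA is attached, you can also control how strongly it influences the output. Recent versions of diffusers accept a scale through cross_attention_kwargs when calling the pipeline; the following is a brief sketch, continuing from the pipeline above (the output file name is hypothetical):

# Apply the LoRA at 70% strength; 0.0 ignores it and 1.0 applies it fully
image = pipeline(
    "A pokemon with blue eyes",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.7},
).images[0]
image.save("blue_eyes_pokemon.png")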
Further Reading
This section provides more resources on the topic if you want to go deeper.
Summary
In this post, you saw how to create your own LoRA model, given a set of images and their description text. This is a time-consuming process, but the result is a small weight file that can modify the behavior of the diffusion model. You learned how to run the LoRA training using the diffusers library. You also saw how to use a LoRA weight in your Stable Diffusion pipeline code.