PILOT: Coherent and Multi-modality Image Inpainting via Latent Space Optimization

Official implementation of PILOT.

Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Susstrunk, Mathieu Salzmann


Method Overview

[Method overview figures]

Getting Started

We recommend creating a Python virtual environment, for example with conda. Next, install the PyTorch build that matches your CUDA version, and install the required packages listed in requirements.txt.

git clone https://github.com/Lingzhi-Pan/PILOT.git
cd PILOT
conda create -n pilot python=3.9
conda activate pilot
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
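
To sanity-check the installation, you can optionally verify that PyTorch sees your CUDA device (nothing here is PILOT-specific):

import torch

# Confirm the installed PyTorch build and that CUDA is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())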

Download the stable-diffusion-v1-5 model from https://huggingface.co/runwayml/stable-diffusion-v1-5 and save it to a local path.
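
If you prefer to fetch the weights from a script, here is a minimal sketch using huggingface_hub (the local directory below is illustrative):

from huggingface_hub import snapshot_download

# Download the full stable-diffusion-v1-5 repository to a local directory.
local_path = snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    local_dir="./checkpoints/stable-diffusion-v1-5",  # illustrative path
)
print(local_path)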

Run Examples

We provide three types of conditions to guide the inpainting process: text, spatial controls, and reference images. Each type of control corresponds to a configuration file in the configs/ directory.

Text-guided

Modify the model_path parameter in the config file to point to the directory where you saved the SD model, then run:

python run_example.py --config_file configs/t2i_step50.yaml
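
If you would rather set model_path from a script than edit the YAML by hand, here is a minimal sketch with PyYAML (this assumes model_path is a top-level key; the path below is a placeholder):

import yaml

# Point the text-guided config at the locally saved SD weights.
with open("configs/t2i_step50.yaml") as f:
    cfg = yaml.safe_load(f)
cfg["model_path"] = "./checkpoints/stable-diffusion-v1-5"  # placeholder path
with open("configs/t2i_step50.yaml", "w") as f:
    yaml.safe_dump(cfg, f)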


Text + Spatial Controls

To introduce spatial controls, we offer options for both ControlNet and T2I-Adapter, but we recommend ControlNet. First, download a ControlNet checkpoint published by Lvmin Zhang, such as the ControlNet conditioned on scribble images: https://huggingface.co/lllyasviel/sd-controlnet-scribble. Then run:

python run_example.py --config_file configs/controlnet_step30.yaml

You can also download other ControlNet models published by Lvmin Zhang to enable inpainting with other conditions, such as canny maps, segmentation maps, and normal maps.
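
Control images for these conditions can be produced with standard tools; for example, here is a minimal sketch deriving a canny map with OpenCV (paths and thresholds are illustrative):

import cv2

# Derive a canny edge map from an input image to use as a spatial control.
img = cv2.imread("examples/input.png")    # illustrative input path
edges = cv2.Canny(img, 100, 200)          # common default thresholds
cv2.imwrite("examples/canny.png", edges)  # illustrative output path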

Text + Reference Image

Download the IP-Adapter checkpoint from https://huggingface.co/h94/IP-Adapter, then run:

python run_example.py --config_file configs/ipa_step50.yaml
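
As with the base model, the IP-Adapter weights can be fetched from a script; here is a sketch with huggingface_hub (check the model page for the exact filename, which may change):

from huggingface_hub import hf_hub_download

# Fetch an SD 1.5 IP-Adapter checkpoint; verify the filename on the model page.
ckpt = hf_hub_download(
    repo_id="h94/IP-Adapter",
    filename="models/ip-adapter_sd15.bin",
)
print(ckpt)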


Text + Spatial Controls + Reference Image

You can also use ControlNet and IP-Adapter together to achieve multi-condition control:

python run_example.py --config_file configs/ipa_controlnet_step30.yaml


Personalized Image Inpainting

You can also integrate LoRA into the base model or replace the base model with another personalized text-to-image (T2I) model to achieve personalized image inpainting. For example, replacing the base model with a T2I model fine-tuned via DreamBooth on several photos of a cute dog can generate that dog inside the masked region while effectively preserving its identity.
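
PILOT itself is driven through run_example.py and the YAML configs, but as a general illustration of how LoRA weights attach to a diffusers-style base model (an assumption about the underlying stack; both paths are placeholders):

import torch
from diffusers import StableDiffusionPipeline

# Load a personalized (e.g., DreamBooth fine-tuned) base model, then merge
# LoRA weights into it.
pipe = StableDiffusionPipeline.from_pretrained(
    "./checkpoints/dreambooth-dog", torch_dtype=torch.float16
)  # placeholder model directory
pipe.load_lora_weights("./checkpoints/lora-dog")  # placeholder LoRA directory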

See our Paper for more information!

BibTeX

If you find this work helpful, please consider citing:

@article{pan2024coherent,
  title={Coherent and Multi-modality Image Inpainting via Latent Space Optimization},
  author={Pan, Lingzhi and Zhang, Tong and Chen, Bingyuan and Zhou, Qi and Ke, Wei and S{\"u}sstrunk, Sabine and Salzmann, Mathieu},
  journal={arXiv preprint arXiv:2407.08019},
  year={2024}
}
