- Text-to-Image, Image-to-Image, Inpainting, and Outpainting pipelines. Our pipelines support the exact same parameters as the Stable Diffusion Web UI, so you can easily replicate Web UI creations with the SDK.
- Upscaling pipelines that can run inference for any ESRGAN or Real-ESRGAN upscaler in a few lines of code.
- An integration with CivitAI to download models directly from the website.
Join our Discord!!
We have a Colab demo where you can run many of the operations of Auto 1111 SDK. Check it out here!!
We recommend installing Auto 1111 SDK from PyPI in a virtual environment. We do not yet support conda environments.
```
pip3 install auto1111sdk
```
To install the latest version of Auto 1111 SDK (which now includes ControlNet), run:

```
pip3 install git+https://github.com/saketh12/Auto1111SDK.git
```
Generating images with Auto 1111 SDK is super easy. Text-to-Image, Image-to-Image, Inpainting, Outpainting, and Stable Diffusion Upscale all run through a single pipeline object, which saves a significant amount of RAM compared to solutions that require a separate pipeline object for each operation.
```python
from auto1111sdk import StableDiffusionPipeline

pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>")

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

output[0].save("image.png")
```
Right now, ControlNet only works with fp32. We are adding support for fp16 very soon.
```python
from auto1111sdk import StableDiffusionPipeline, ControlNetModel

model = ControlNetModel(model="<THE CONTROLNET MODEL FILE NAME (WITHOUT EXTENSION)>",
                        image="<PATH TO IMAGE>")
pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>", controlnet=model)

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

output[0].save("image.png")
```
Find the instructions here. Contributed by Marco Guardigli, [email protected]
We have more detailed examples and documentation of how you can use Auto 1111 SDK here. For a detailed comparison between Auto 1111 SDK and Hugging Face Diffusers, you can read this. For a detailed guide on how to use SDXL, we recommend reading this.
- Original txt2img and img2img modes
- Real-ESRGAN and ESRGAN upscaling (compatible with any .pth file)
- Outpainting
- Inpainting
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to
  - a man in a `((tuxedo))` - will pay more attention to tuxedo
  - a man in a `(tuxedo:1.21)` - alternative syntax
  - select text and press `Ctrl+Up` or `Ctrl+Down` (or `Command+Up` or `Command+Down` if you're on macOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Composable Diffusion: a way to use multiple prompts at once
  - separate prompts using uppercase `AND`
  - also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
- Works with a variety of samplers
- Download Stable Diffusion models and RealESRGAN checkpoints directly from CivitAI
- Set custom VAE: works for any model including SDXL
- Support for SDXL with Stable Diffusion XL Pipelines
- Pass in custom arguments to the models
- No 77 prompt token limit (unlike Huggingface Diffusers, which has this limit)
- Adding support for Hires Fix and Refiner parameters for inference
- Adding support for LoRAs
- Adding support for face restoration
- Adding support for the Dreambooth training script
- Adding support for custom extensions like ControlNet
We will be adding support for these features very soon. We also accept any contributions to work on these issues!
Auto1111 SDK is continuously evolving, and we appreciate community involvement. We welcome all forms of contributions - bug reports, feature requests, and code contributions.
Report bugs and request features by opening an issue on GitHub. Contribute to the project by forking/cloning the repository and submitting a pull request with your changes.
Licenses for borrowed code can be found in the `Settings -> Licenses` screen, and also in the `html/licenses.html` file.
- Automatic 1111 Stable Diffusion Web UI - https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Stable Diffusion - https://github.com/Stability-AI/stablediffusion, https://github.com/CompVis/taming-transformers
- k-diffusion - https://github.com/crowsonkb/k-diffusion.git
- ESRGAN - https://github.com/xinntao/ESRGAN
- MiDaS - https://github.com/isl-org/MiDaS
- Ideas for optimizations - https://github.com/basujindal/stable-diffusion
- Cross Attention layer optimization - Doggettx - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
- Cross Attention layer optimization - InvokeAI, lstein - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
- Sub-quadratic Cross Attention layer optimization - Alex Birch (Birch-san/diffusers#1), Amin Rezaei (https://github.com/AminRezaei0x443/memory-efficient-attention)
- Textual Inversion - Rinon Gal - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
- Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
- Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
- CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- xformers - https://github.com/facebookresearch/xformers
- Sampling in float32 precision from a float16 UNet - marunine for the idea, Birch-san for the example Diffusers implementation (https://github.com/Birch-san/diffusers-play/tree/92feee6)