[Paper] [Demo in 🤗Hugging Face Space] [(🔥New) Code and Pre-trained Models]
by Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu from Helixon Research and UT Austin
- (🔥New) 2023/11/22 One-step InstaFlow is compatible with pre-trained ControlNets. See here. (We thank individual contributor Dr. Hanshu Yan)
- (🔥New) 2023/11/22 We release the pre-trained models and inference code here.
- 2023/09/26 We provide a demo of InstaFlow-0.9B in 🤗Hugging Face Space. Try it here.
Diffusion models have demonstrated remarkable promise in text-to-image generation. However, their efficiency is still largely hindered by the computational cost of the iterative numerical solvers required at inference time to simulate the diffusion/flow processes.
InstaFlow is an ultra-fast, one-step image generator that achieves image quality close to Stable Diffusion while significantly reducing the demand for computational resources. This efficiency is made possible by the recent Rectified Flow technique, which trains probability flows with straight trajectories that inherently require only a single step for fast inference.
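To see why straight trajectories enable single-step sampling, consider Euler integration of the flow ODE dx/dt = v(x, t): if a trajectory is a straight line, the velocity is constant along it, so one Euler step is already exact. The sketch below is purely illustrative; `v` stands for any learned velocity field.

```python
# Euler integration of a probability flow ODE dx/dt = v(x, t).
# For a perfectly straight flow, v is constant along each trajectory,
# so a single Euler step already lands on the exact endpoint.
def euler_sample(v, x0, num_steps):
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = x + v(x, i * dt) * dt
    return x

# With a straightened flow, euler_sample(v, x0, 1) matches
# euler_sample(v, x0, 100) up to numerical error.
```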
InstaFlow has several advantages:
- **Ultra-Fast Inference**: InstaFlow models are one-step generators that directly map noise to images, avoiding the multi-step sampling of diffusion models. On our machine with an A100 GPU, inference takes around 0.1 second, saving ~90% of the inference time of the original Stable Diffusion (see the inference sketch after this list).
- **High-Quality**: InstaFlow generates images with intricate details, like Stable Diffusion, and achieves FID on MS COCO 2014 similar to state-of-the-art text-to-image GANs such as StyleGAN-T.
- **Simple and Efficient Training**: Training InstaFlow involves only supervised learning. Leveraging pre-trained Stable Diffusion, it takes only 199 A100 GPU days to obtain InstaFlow-0.9B.
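As a concrete illustration of the one-step usage noted above, here is a minimal Diffusers-style inference sketch. The model id and the zeroed guidance scale are assumptions; the released inference code is the authoritative reference.

```python
# A minimal one-step inference sketch with a Diffusers-style pipeline.
# The model id below is hypothetical; see the released code for exact usage.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "XCLiu/instaflow_0_9B_from_sd_1_5",  # hypothetical model id
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = pipe(
    prompt="A photograph of a snowy mountain near a beautiful lake under sunshine.",
    num_inference_steps=1,   # a single network evaluation maps noise to image
    guidance_scale=0.0,      # assumption: guidance is folded into the distilled model
).images[0]
image.save("instaflow_sample.png")
```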
*(Video: interpolation.mp4)*
One-step InstaFlow is fully compatible with pre-trained ControlNets. We thank individual contributor Dr. Hanshu Yan for providing and testing the Rectified Flow ControlNet pipeline!
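A hedged sketch of how such a pairing could be wired up with standard Diffusers components (the InstaFlow checkpoint id and the edge-map file are assumptions, and the repo ships its own Rectified Flow ControlNet pipeline):

```python
# Sketch: one-step generation conditioned on a pre-trained ControlNet.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "XCLiu/instaflow_0_9B_from_sd_1_5",  # hypothetical InstaFlow weights
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_image = load_image("canny_edges.png")  # pre-computed edge map (placeholder)
image = pipe(
    "A photograph of a snowy mountain near a beautiful lake under sunshine.",
    image=canny_image,
    num_inference_steps=1,  # still a single step, now with ControlNet conditioning
).images[0]
```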
Below are one-step generation results from InstaFlow-0.9B with ControlNet:
For an intuitive comparison, we used the same A100 server and took screenshots of the Gradio interface during random generation with different models. InstaFlow-0.9B uses one step, while SD 1.5 uses 25-step DPMSolver. Downloading the image from the server takes around 0.3 second. The text prompt is "A photograph of a snowy mountain near a beautiful lake under sunshine."
| InstaFlow-0.9B | Stable Diffusion 1.5 |
|---|---|
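For a rough wall-clock comparison on your own hardware, a sketch along the following lines should work (the SD 1.5 baseline uses standard Diffusers APIs; timings will vary by GPU):

```python
# A rough wall-clock comparison sketch (assumes fp16 on a CUDA GPU).
import time
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

prompt = "A photograph of a snowy mountain near a beautiful lake under sunshine."

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

pipe(prompt, num_inference_steps=25)   # warm-up run
torch.cuda.synchronize()
t0 = time.time()
pipe(prompt, num_inference_steps=25)
torch.cuda.synchronize()
print(f"SD 1.5, 25-step DPMSolver: {time.time() - t0:.2f}s")
# Swapping in the one-step InstaFlow pipeline with num_inference_steps=1
# should bring this down to roughly 0.1s on an A100.
```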
*(Video: method_github.mov)*
Our pipeline consists of three steps:
- Generate (text, noise, image) triplets from pre-trained Stable Diffusion
- Apply text-conditioned reflow to yield 2-Rectified Flow, which is a straightened generative probability flow (a schematic objective is sketched after this list)
- Distill from 2-Rectified Flow to get one-step InstaFlow. Note that distillation and reflow are orthogonal techniques.
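In code, the reflow step reduces to a velocity-matching loss on straight interpolation paths. The sketch below is schematic, not the repo's training script; the signature of `v_theta` is an assumption:

```python
# A schematic sketch of the text-conditioned reflow objective.
import torch

def reflow_loss(v_theta, x0, x1, text_emb):
    """Rectified-flow matching loss on straight interpolation paths.

    x0: Gaussian noise batch; x1: images the previous flow generates from x0.
    """
    t = torch.rand(x0.shape[0], device=x0.device)   # t ~ Uniform[0, 1]
    t_ = t.view(-1, 1, 1, 1)
    xt = t_ * x1 + (1.0 - t_) * x0                  # point on the straight line
    target = x1 - x0                                # constant velocity of that line
    pred = v_theta(xt, t, text_emb)                 # predicted velocity field
    return torch.mean((pred - target) ** 2)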
As captured in the video and the image, straight flows have the following advantages:
- Straight flows require fewer steps to simulate.
- Straight flows give a better coupling between the noise distribution and the image distribution, thus allowing successful distillation.
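The distillation step can then be sketched as matching the student's one-shot output to a single Euler step of the teacher. This is an assumption-level sketch; `d` stands for a similarity loss such as an LPIPS-style perceptual metric:

```python
# Schematic distillation step (sketch; not the repo's exact training code).
import torch

def distill_loss(student, teacher_v, x0, text_emb, d):
    # Teacher: one Euler step of the nearly straight 2-Rectified Flow,
    # i.e. x1 = x0 + v(x0, t=0) over the unit time interval.
    with torch.no_grad():
        t0 = torch.zeros(x0.shape[0], device=x0.device)
        x1_teacher = x0 + teacher_v(x0, t0, text_emb)
    x1_student = student(x0, text_emb)   # direct noise-to-image map
    return d(x1_student, x1_teacher)     # e.g. an LPIPS-style similarity loss
```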
We provide several related links and readings here:
- [The official Rectified Flow GitHub repo](https://github.com/gnobitab/RectifiedFlow)
- [An introduction to Rectified Flow](https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html)
- [An introduction to Rectified Flow in Chinese (Zhihu)](https://zhuanlan.zhihu.com/p/603740431)
- [FlowGrad: Controlling the Output of Generative ODEs With Gradients](https://github.com/gnobitab/FlowGrad)
- [Fast Point Cloud Generation with Straight Flows](https://github.com/klightz/PSF)
```bibtex
@article{liu2023insta,
  title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
  author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
  journal={arXiv preprint arXiv:2309.06380},
  year={2023}
}
```
Our training scripts are modified from one of the fine-tuning examples in Diffusers. Other parts of our work also rely heavily on the 🤗 Diffusers library.