AnyDoor

Abstract

This work presents AnyDoor, a diffusion-based image generator that can teleport target objects into new scenes at user-specified locations and blend them in harmoniously. Instead of tuning parameters for each object, our model is trained only once and generalizes effortlessly to diverse object-scene combinations at inference time. This challenging zero-shot setting requires an adequate characterization of each given object. To this end, we complement the commonly used identity feature with detail features, which are carefully designed to maintain texture details while still allowing versatile local variations (e.g., lighting, orientation, and posture), so that the object blends naturally with different surroundings. We further propose to borrow knowledge from video datasets, where a single object can be observed in various forms along the time axis, leading to stronger model generalizability and robustness. Extensive experiments demonstrate the superiority of our approach over existing alternatives, as well as its great potential in real-world applications such as virtual try-on and object moving.
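The video idea in the abstract can be illustrated by how a training pair might be drawn from a clip: take two frames in which the same object is visible and that are far enough apart for its pose or lighting to have changed, then use one as the reference object and the other as the target scene. This is a minimal sketch, assuming per-frame object masks are already available; `Frame`, `sample_training_pair`, and the `min_gap` threshold are hypothetical names chosen for illustration, not the authors' data pipeline.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One video frame, reduced here to its index and the tracked object's mask area."""
    index: int
    mask_area: int  # pixels covered by the object; 0 means the object is not visible


def sample_training_pair(
    frames: List[Frame], min_gap: int, rng: random.Random
) -> Optional[Tuple[Frame, Frame]]:
    """Pick a (reference, target) pair showing the same object at least
    `min_gap` frames apart, so its appearance is likely to differ."""
    visible = [f for f in frames if f.mask_area > 0]
    candidates = [
        (a, b) for a in visible for b in visible if b.index - a.index >= min_gap
    ]
    if not candidates:
        return None  # clip too short, or object visible too briefly
    ref, tgt = rng.choice(candidates)
    # Randomly swap roles so both temporal directions are seen in training.
    if rng.random() < 0.5:
        ref, tgt = tgt, ref
    return ref, tgt
```

Requiring a minimum temporal gap is one simple way to favor pairs that actually show the object in a different form; the threshold value is a free assumption here.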

Object Moving

AnyDoor can be applied to tasks such as object moving.

Object Swapping

AnyDoor can also be extended to object swapping.

Multi-subject Composition

Because AnyDoor gives precise control over where an object is placed in a given scene, it extends easily to multi-subject composition.
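One simple way to picture this extension is to apply the single-object placement step once per (object, location) pair, feeding each result back in as the next scene. This is a sketch under stated assumptions: `place_object(scene, obj, box)` is a hypothetical callable standing in for one AnyDoor generation, the `(x0, y0, x1, y1)` box format is assumed, and sequential application is only one possible strategy, not necessarily the one the authors used.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Obj = TypeVar("Obj")
Box = Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in scene pixel coordinates


def compose_multi(
    scene: Image,
    placements: Sequence[Tuple[Obj, Box]],
    place_object: Callable[[Image, Obj, Box], Image],
) -> Image:
    """Insert several objects by running a single-object placement step
    once per (object, box), in the given order."""
    for obj, box in placements:
        # Each step conditions on the scene produced so far, so later objects
        # are blended into a scene that already contains the earlier ones.
        scene = place_object(scene, obj, box)
    return scene
```

Because each call sees the previous output, placement order can matter when boxes overlap; a caller could sort placements, e.g. back to front, before composing.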

Virtual Try-on

AnyDoor can also serve as a simple but strong baseline for virtual try-on. It preserves the colors, patterns, and textures of different garments without requiring complicated human parsing.

BibTeX

@article{chen2023anydoor,
      title={AnyDoor: Zero-shot Object-level Image Customization},
      author={Chen, Xi and Huang, Lianghua and Liu, Yu and Shen, Yujun and Zhao, Deli and Zhao, Hengshuang},
      journal={arXiv preprint},
      year={2023}
    }