Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, and Dong Chen.
In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.
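As a rough illustration of the coarse stage described above, the sketch below combines a reference-view reconstruction loss with a score-distillation-style diffusion loss at novel views. It is a simplification under assumed interfaces (nerf.render and diffusion_prior.score_distillation are hypothetical helpers for illustration, not the actual API of this repo):

```python
import torch.nn.functional as F

def coarse_stage_loss(nerf, diffusion_prior, ref_image, ref_pose, novel_pose, text_emb):
    """Loss for one coarse-stage step (simplified sketch, not the repo's code).

    Assumed interfaces:
      nerf.render(pose) -> rendered RGB tensor
      diffusion_prior.score_distillation(image, text_emb) -> scalar loss
    """
    # Reference-view constraint: the rendered frontal view should match the input image.
    ref_render = nerf.render(ref_pose)
    loss_ref = F.mse_loss(ref_render, ref_image)

    # Diffusion prior at a novel view: a pretrained 2D diffusion model scores the
    # rendering, supervising geometry and texture in unseen regions.
    novel_render = nerf.render(novel_pose)
    loss_prior = diffusion_prior.score_distillation(novel_render, text_emb)

    return loss_ref + loss_prior
```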
Install with pip:
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/huggingface_hub.git
pip install git+https://github.com/facebookresearch/pytorch3d.git
Other dependencies:
pip install -r requirements.txt
pip install ./raymarching
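A quick sanity check (not part of the repo) that the CUDA-enabled PyTorch build installed correctly:

```python
import torch
print(torch.__version__)          # expected to end with +cu113, e.g. 1.10.0+cu113
print(torch.cuda.is_available())  # should print True on a machine with a CUDA GPU
```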
Training requirements
- DPT. We use the off-the-shelf single-view depth estimator DPT to predict the depth of the reference image. Download the pretrained dpt_hybrid model and put it in dpt_weights:

  git clone https://github.com/isl-org/DPT.git
  mkdir dpt_weights

- BLIP2. We use BLIP2 to generate a caption for the reference image (a rough captioning sketch is shown after this list). You can also modify the conditioning text with --text "{TEXT}".
- Stable Diffusion. We use the diffusion prior from a pretrained 2D Stable Diffusion 2.0 model. To start with, you may need a Hugging Face token to access the model, or log in with the huggingface-cli login command.
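The caption is produced automatically during training; for reference, a minimal sketch of BLIP2 captioning with Hugging Face transformers looks roughly like this (the model id Salesforce/blip2-opt-2.7b and the image path are assumptions for illustration, not necessarily what this repo uses internally):

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Assumed model id and image path, for illustration only.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("demo/ref.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```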
We use a progressive training strategy to generate full 360° 3D geometry. Run the commands below, replacing the workspace name NAME and the path of the reference image IMGPATH.
We first optimize the scene under frontal camera views.
python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 135 225 --iters 2000
Then we spread the camera view samples to the full 360° range. If you want to add the prompt condition "back view" for rear views, use the flag --need_back.
python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 0 360 --albedo_iters 3000 --iters 5000 --final
Coming soon...
Hallucinating 3D geometry and generating novel views from a single image of a general genre is a challenging task. While our method demonstrates a strong capability to create 3D content from most images containing a single, centered object, it may still have difficulty reconstructing solid geometry in complex cases. If you encounter any bugs, please feel free to contact us.
- Release coarse stage training code
- Release the test benchmark for all results in the paper
- Release all training code
- Release more applications
If you find this code helpful for your research, please cite:
@article{tang2023make,
title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},
author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},
journal={arXiv preprint arXiv:2303.14184},
year={2023}
}
This code borrows heavily from Stable-Dreamfusion.