MVDream

Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang

3D Generation

This repository only includes the diffusion model and 2D image generation code of MVDream paper.
For 3D Generation, please check MVDream-threestudio.

Installation

You can use the same environment as in Stable-Diffusion for this repo. Or you can set up the environment by installing the given requirements

pip install -r requirements.txt

To use MVDream as a python module, you can install it by pip install -e . or:

pip install git https://github.com/bytedance/MVDream

Model Card

Our models are provided on the Huggingface Model Page with the OpenRAIL license.

Model	Base Model	Resolution
sd-v2.1-base-4view	Stable Diffusion 2.1 Base	4x256x256
sd-v1.5-4view	Stable Diffusion 1.5	4x256x256

By default, we use the SD-2.1-base model in our experiments.

Note that you don't have to manually download the checkpoints for the following scripts.

Text-to-Image

You can simply generate multi-view images by running the following command:

python scripts/t2i.py --text "an astronaut riding a horse"

We also provide a gradio script to try out with GUI:

python scripts/gradio_app.py

Usage

Load the Model

We provide two ways to load the models of MVDream:

Automatic: load the model config with model name and weights from huggingface.

from mvdream.model_zoo import build_model
model = build_model("sd-v2.1-base-4view")

Manual: load the model with a config file and a checkpoint file.

from omegaconf import OmegaConf
from mvdream.ldm.util import instantiate_from_config
config = OmegaConf.load("mvdream/configs/sd-v2-base.yaml")
model = instantiate_from_config(config.model)
model.load_state_dict(torch.load("path/to/sd-v2.1-base-4view.th", map_location='cpu'))

Inference

Here is a simple example for model inference:

import torch
from mvdream.camera_utils import get_camera
model.eval()
model.cuda()
with torch.no_grad():
    noise = torch.randn(4,4,32,32, device="cuda") # batch of 4x for 4 views, latent size 32=256/8
    t = torch.tensor([999]*4, dtype=torch.long, device="cuda") # same timestep for 4 views
    cond = {
        "context": model.get_learned_conditioning([""]*4).cuda(), # text embeddings
        "camera": get_camera(4).cuda(),
        "num_frames": 4,
    }
    eps = model.apply_model(noise, t, cond=cond)

Acknowledgement

This repository is heavily based on Stable Diffusion. We would like to thank the authors of these work for publicly releasing their code.

Citation

@article{shi2023MVDream,
  author = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
  title = {MVDream: Multi-view Diffusion for 3D Generation},
  journal = {arXiv:2308.16512},
  year = {2023},
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
mvdream		mvdream
scripts		scripts
.gitignore		.gitignore
LICENSE-CODE		LICENSE-CODE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MVDream

3D Generation

Installation

Model Card

Text-to-Image

Usage

Load the Model

Inference

Acknowledgement

Citation

About

Releases

Packages

Contributors 2

Languages

License

bytedance/MVDream

Folders and files

Latest commit

History

Repository files navigation

MVDream

3D Generation

Installation

Model Card

Text-to-Image

Usage

Load the Model

Inference

Acknowledgement

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages