
Stable, multi-view Point·E

In this repo we introduce multi-view conditioning for point-cloud diffusion and test it in two pipelines: multiple synthetic views generated from text, and multiple views from photos in the wild. We develop an evaluation dataset based on ShapeNet and ModelNet and propose a new metric to assess, both visually and analytically, the overlap between two point clouds. This repo is based on the official implementation of Point-E.

Point-E is a diffusion model: a generative model that approximates a data distribution through noising (forward process) and denoising (backward process). The backward process is also called "sampling": you start from a noisy point in the distribution and convert it back to signal using some conditioning information. In Point-E, we start from a random point cloud of 1024 points and denoise it with images (a photo of the object) as the conditioning signal.

Compared to other techniques in the literature, such as Neural Radiance Fields, Point-E can sample a point cloud on a single GPU in 1-2 minutes. Sample quality is the price to pay, which makes this technique ideal for tasks where point clouds are the best-suited representation.
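For intuition, the backward process can be sketched as a standard DDPM-style loop (a schematic example only, not this repo's sampler; the model signature, number of steps, and noise schedule are placeholders):

import torch

def sample_point_cloud(model, image_embedding, num_points=1024, num_steps=64):
    """Schematic reverse (denoising) process: start from Gaussian noise and
    iteratively denoise it, conditioning every step on an image embedding."""
    betas = torch.linspace(1e-4, 0.02, num_steps)       # placeholder noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, num_points, 3)                    # random initial point cloud
    for t in reversed(range(num_steps)):                  # backward ("sampling") pass
        eps = model(x, t, image_embedding)                # predicted noise (hypothetical signature)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise           # one stochastic denoising step
    return x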

Table of contents

  1. Contributions
  2. Setup
  3. Experiments
  4. Evaluation
  5. Credits

Contributions

We extend conditioning for point-cloud diffusion to multiple views. This tackles the problems of duplicated faces, blurring in occluded parts, and lack of 3D consistency in the generated objects.

Multi-view with patch concatenation

Each conditioning image is encoded with the pre-trained OpenAI CLIP encoder; all the resulting embeddings are concatenated and fed as tokens into the denoising transformer.
See: mv_point_e/models/transformer.py

Figure: pipeline for Point-E on top of Stable Diffusion 2 (original image: Nichol et al. 2022).
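A minimal sketch of this idea (shapes, names, and the projection layer are assumptions for illustration; the actual code is in mv_point_e/models/transformer.py):

import torch
import torch.nn as nn

class MultiViewConditioner(nn.Module):
    """Encode V conditioning views with a frozen CLIP image encoder and
    concatenate the per-view embeddings into one sequence of tokens."""

    def __init__(self, clip_encoder: nn.Module, clip_dim: int, model_dim: int):
        super().__init__()
        self.clip_encoder = clip_encoder            # pre-trained, kept frozen
        self.proj = nn.Linear(clip_dim, model_dim)  # map CLIP features to transformer width

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, V, 3, H, W)
        b, v = views.shape[:2]
        flat = views.flatten(0, 1)                  # (batch * V, 3, H, W)
        with torch.no_grad():
            emb = self.clip_encoder(flat)           # (batch * V, clip_dim)
        tokens = self.proj(emb).view(b, v, -1)      # (batch, V, model_dim)
        return tokens                               # prepended to the point tokens in the denoiser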

Multi-view with stochastic conditioning

Inspired by Watson et al. 2022, a random conditioning image (drawn from a given multi-view set) is fed to the denoising transformer at each diffusion denoising step.
See: sc_point_e/models/transformer.py

Figure: pipeline for Point-E on top of Stable Diffusion 2 (original image: Watson et al. 2022).
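A sketch of the sampling loop with stochastic conditioning (the single-step denoising call is a hypothetical helper; the actual code is in sc_point_e/models/transformer.py):

import random
import torch

def sample_with_stochastic_conditioning(model, views, num_points=1024, num_steps=64):
    """Schematic sampler: every denoising step is conditioned on one view
    drawn uniformly at random from the multi-view set."""
    x = torch.randn(1, num_points, 3)
    for t in reversed(range(num_steps)):
        view = random.choice(views)            # new random conditioning view each step
        x = model.denoise_step(x, t, view)     # hypothetical single-step denoising API
    return x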

Multiple synthetic views with 3D-Diffusion

We use 3D-Diffusion from Watson et al. 2022 to generate multiple 3D-consistent views from a single, text-generated image (produced with Stable Diffusion 2). The current model is pre-trained on SRNCars; a ShapeNet version will be released soon (contribute here).

Figure: pipeline for Point-E on top of Stable Diffusion 2.

Setup

There are two variants for multi-view:

  • Patch concatenation: mv_point_e
  • Stochastic conditioning: sc_point_e

You can either:

  1. Rename the folder of the variant you choose to point_e and run pip install -e .
  2. Without installing a global package, import from the specific variant in your code, e.g. for sc_point_e:
from sc_point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from sc_point_e.diffusion.sampler import PointCloudSampler
from sc_point_e.models.download import load_checkpoint
from sc_point_e.models.configs import MODEL_CONFIGS, model_from_config

from sc_point_e.evals.feature_extractor import PointNetClassifier, get_torch_devices
from sc_point_e.evals.fid_is import compute_statistics
from sc_point_e.evals.fid_is import compute_inception_score
from sc_point_e.util.plotting import plot_point_cloud
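
With these imports, models can be built and sampled following the upstream Point-E example (a sketch mirroring the official point_e usage; the checkpoint names, the image path, and the single-image model_kwargs are assumptions, and the multi-view variants may expect a set of views instead, see the notebooks):

import torch
from PIL import Image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Image-conditional base model (40M) and upsampler, as in upstream Point-E.
base_model = model_from_config(MODEL_CONFIGS['base40M'], device)
base_model.load_state_dict(load_checkpoint('base40M', device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['base40M'])

upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.load_state_dict(load_checkpoint('upsample', device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],             # base cloud + upsampled points
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 3.0],
)

img = Image.open('example_view.png')            # conditioning view (illustrative path)
samples = None
for x in sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(images=[img])):
    samples = x                                 # keep the last, fully denoised batch

pc = sampler.output_to_point_clouds(samples)[0]
plot_point_cloud(pc, grid_size=3)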

Experiments

Preprocessing

  • [1] Generating the textureless objects dataset (views, ground shapes).
  • [2] Generating the complete textured objects dataset (views, ground shapes).

3D Reconstruction

  • [1] Text-to-3D with Stable Diffusion 2 Inpainting (single view)
  • [2] Text-to-3D with multiple rendered views from the SRNCars Dataset (multi-view)
  • [3] Text-to-3D with multiple synthetic views from Stable Diffusion + 3D-Diffusion (Watson et al. 2022)
  • [4] Text-to-3D from multiple photos "in the wild"

Evaluation, metrics

  • [1] Dataset pre-processing and scores computation
  • [2] A digression on the chosen metrics with experiments
  • [3] Evaluating text-to-3D from multi view (patch concat.)
  • [4] Comparing the chosen multi-view, text-to-3D methodologies
  • [5, 6] Evaluating results on occluded object parts
  • [7] Scores visualization and plotting

Evaluation

This dataset has been developed to assess the quality of the reconstructions from our multi-view models with respect to single-view Point-E. Through experimentation, we generated several datasets from the available sources: ModelNet40, ShapeNetV2, ShapeNetV0. Specifically, the datasets generated from ModelNet40 and ShapeNetV0 are textureless: we generate synthetic colouring with random RGB/grayscale values and sine functions.
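
The exact colouring recipe is defined in the preprocessing notebooks; as a rough illustration of the sine-based variant, a per-point colour could be derived from the point coordinates like this (the frequency and the mapping are assumptions):

import torch

def sine_texture(points: torch.Tensor, freq: float = 3.0) -> torch.Tensor:
    """Illustrative synthetic colouring: map (x, y, z) coordinates to RGB values
    in [0, 1] with a sine function. The repo's actual recipe lives in the
    preprocessing notebooks."""
    return 0.5 * (torch.sin(freq * points) + 1.0)   # (N, 3) -> (N, 3)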

Getting Started & Installing

The complete set of data can be found at this link.

Name                                   Samples   Source
ModelNet40, textureless                40        Google Drive
ShapeNetv2, textureless                55        Google Drive
Mixed, textureless                     190       Google Drive
ShapeNet with textures                 650       Google Drive
OpenAI seed imgs/clouds                /         Google Drive
OpenAI, COCO CLIP R-Precision evals    /         Google Drive

Here you can find the point clouds generated from the textureless ShapeNetv2 and ModelNet40 datasets, together with the ground-truth data, the scores, and the plots of the pairwise divergence distributions. More details are provided in the description.

Description

Each sample in the dataset consists of a set of V RGB views at 256x256 resolution and a cloud of K points sampled with PyTorch3D.

    view:   (N, V, 256, 256, 3)
    cloud:  (N, K, 3)

Further details on rendering:

  • The light of the scene is fixed
  • No reflections
  • Two versions of the dataset:
    • With fixed camera elevation and distance from the object, we took 6 pictures rotating around it
    • With fixed camera distance from the object, we took 6 pictures rotating around it while varying the camera elevation stochastically
  • We repeat this procedure for 25 different objects per class in ShapeNet
  • Each view is 256x256

You can see the pipeline for the generation of the ShapeNet dataset with textures here.
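
The camera poses described above (fixed distance, fixed or stochastic elevation, rotating around the object) can be produced with PyTorch3D roughly as follows (the distance and elevation values here are illustrative, not the ones used in the pipeline):

import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

num_views = 6
azimuths = torch.linspace(0, 360, num_views + 1)[:-1]   # rotate around the object

# Version 1: fixed elevation and distance (values are illustrative).
R, T = look_at_view_transform(dist=2.0, elev=20.0, azim=azimuths)
fixed_cameras = FoVPerspectiveCameras(R=R, T=T)

# Version 2: same distance, stochastic elevation for each view.
rand_elev = torch.empty(num_views).uniform_(0.0, 40.0)
R, T = look_at_view_transform(dist=2.0, elev=rand_elev, azim=azimuths)
stochastic_cameras = FoVPerspectiveCameras(R=R, T=T)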

For the set of views in the textureless datasets produced from ShapeNetv2 and ModelNet40:

  • The light of the scene is fixed
  • No reflections
  • We fixed the camera elevation and distance from the object and took 4 pictures rotating around it
  • We repeat this procedure for one object per class in ShapeNetv2 and ModelNet40
  • Each view is 512x512

You can check the full pipeline, with all the steps, for generating the textureless ShapeNetv2 and ModelNet40 datasets here.

The directory structure is as follows:

<directories>
    > shapenet_withTextures
        >> eval_clouds.pickle
        >> eval_views_fixed_elevation.pickle
        >> eval_views_stochastic_elevation.pickle       
    > modelnet40_texrand_texsin
        >> modelnet_csinrandn
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
        >> modelnet_texsin
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
    > shapenetv2_texrand_texsin
        >> shapenetv2_csinrandn
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
        >> shapenetv2_texsin
            >>> CLASS_MAP.pt
            >>> images_obj.pt
            >>> labels.pt
            >>> points.pt
    > shapenetv2_modelnet40_texrand_texsin
        >> shapenet_modelnet_singleobject
            >>> modelnet_csinrandn
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> modelnet_texsin
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> shapenet_csinrandn
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
            >>> shapenet_texsin
                >>>> CLASS_MAP.pt
                >>>> images_obj.pt
                >>>> labels.pt
                >>>> points.pt
    > dataset_shapenet_modelnet_texsin_withgeneratedcloud
        >> modelnet_texsin
            >>> CLASS_MAP.pt
            >>> eval_clouds_modelnet_300M.pickle
            >>> images_obj.pt
            >>> labels.pt
            >>> modelnet_gencloud_300M
            >>> points.pt
        >> shapenet_texsin
            >>> CLASS_MAP.pt
            >>> eval_clouds_shapenet_300M.pickle
            >>> images_obj.pt
            >>> labels.pt
            >>> shapenet_gencloud_300M
            >>> points.pt


File specifications

shapenet_withTextures

- list of the sampled clouds: eval_clouds.pickle # (n_img, ch, n_points), ch: 6, n_points: 4096

- list of generated views with fixed elevation: eval_views_fixed_elevation.pickle # (n_img, n_view, 256, 256, 3)

- list of generated views with stochastic elevation: eval_views_stochastic_elevation.pickle # (n_img, n_view, 256, 256, 3)

shapenetv2_modelnet40_texrand_texsin

- dictionary with {index: 'typeOfObject'}: CLASS_MAP.pt 

- multiple views for each object: images_obj.pt # (n_img, n_view, 512, 512, 3)

- label for each object: labels.pt # (n_img,)

- ground truth point cloud: points.pt # (n_img,)

- tensors with the point clouds generated by Point-E 300M:
  ch: 6 (the first 3 channels are coordinates, the others are the RGB colour of each point)
  n_points: 4096 (generated points)
      modelnet_gencloud_300M # (n_img, ch, n_points)
      shapenet_gencloud_300M # (n_img, ch, n_points)

dataset_shapenet_modelnet_texsin_withgeneratedcloud

- dictionaries:
      eval_clouds_modelnet_300M.pickle
      eval_clouds_shapenet_300M.pickle

  Each entry is accessed as dictionary['nameOfTheObject'][index], where:

      index 0:  divergence_ground_single
      index 1:  divergence_ground_single_distribution_plot
      index 2:  divergence_ground_multi
      index 3:  divergence_ground_multi_distribution_plot
      index 4:  divergence_single_multi
      index 5:  divergence_single_multi_distribution_plot
      index 6:  ground_truth_pis
      index 7:  single_view_pis
      index 8:  multi_view_pis
      index 9:  ground_truth_point_cloud
      index 10: single_view_point_cloud
      index 11: multi_view_point_cloud

Dependencies

  • Import the .pt files with torch:

import os
import torch

# base_path is defined as in the snippet below
images_obj_views = torch.load(os.path.join(base_path, 'images_obj.pt'))

  • Import the pickle files with the metrics (or the shapenet_withTextures files) with pickle
  • More info in notebook1 or notebook2.

import pickle

dataset = 'shapenet'
base_path = os.path.join(dataset + '_texsin')
with open(os.path.join(base_path, 'eval_clouds_' + dataset + '_300M.pickle'), 'rb') as handle:
    data = pickle.load(handle)
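
Each entry of the loaded dictionary can then be indexed as in the file specifications above (the object name used here is illustrative; see CLASS_MAP.pt for the actual keys):

entry = data['airplane']               # 'airplane' is an illustrative key
divergence_ground_multi = entry[2]     # divergence between ground truth and multi-view cloud
ground_truth_cloud = entry[9]          # ground-truth point cloud
multi_view_cloud = entry[11]           # cloud generated from multiple views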

Possible improvements

  • Extending the dataset to ShapeNet PSR
  • Increasing the view resolution to 512x512 or 1024x1024

Authors

Version History

  • 0.1
    • Initial Release

License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgments

Credits

Cite this work

@misc{CalanzoneTedoldi2022,
    title  = {Generating point clouds from multiple views with Point-E},
    author = {Diego Calanzone and Riccardo Tedoldi and Zeno Sambugaro},
    year   = {2023},
    url    = {http://github.com/halixness/point-e}
}
