⚡ Add Summary for GANCraft
praeclarumjj3 authored Jan 20, 2022
2 parents 09cf754 + 19a2cce commit 417eaa5
Showing 6 changed files with 51 additions and 1 deletion.
5 changes: 4 additions & 1 deletion README.md
@@ -4,7 +4,10 @@ Summaries for papers discussed by VLG.

# Summaries
2021
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [[Paper](https://arxiv.org/pdf/2011.12100)][[Review](./summaries/GIRAFFE.md)] **CVPR 2021**
- GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds [[Paper](https://arxiv.org/pdf/2104.07659)][[Review](./summaries/GANcraft.md)]
- Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu, **ICCV 2021**
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [[Paper](https://arxiv.org/pdf/2011.12100)][[Review](./summaries/GIRAFFE.md)]
- Michael Niemeyer, Andreas Geiger, **CVPR 2021**
- Creative Sketch Generation [[Paper](https://arxiv.org/abs/2011.10039)][[Review](https://github.com/Sandstorm831/papers_we_read/blob/master/summaries/DoodlerGAN summary.md)]
- Songwei Ge, Devi Parikh, Vedanuj Goswami & C. Lawrence Zitnick, **ICLR 2021**
- Binary TTC: A Temporal Geofence for Autonomous Navigation [[Paper](https://arxiv.org/abs/2101.04777)][[Review](./summaries/binary_TTC.md)]
Binary file added images/GANcraft_fid_scores.PNG
Binary file added images/GANcraft_model.PNG
Binary file added images/GANcraft_overview.PNG
Binary file added images/GANcraft_results.gif
47 changes: 47 additions & 0 deletions summaries/GANcraft.md
@@ -0,0 +1,47 @@
# GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu

## Summary

<img src='../images/GANcraft_model.PNG'>

This paper presents **GANcraft**, a volumetric-rendering-based approach (rendering the 3D world as 2D images) that models a 3D block-world scene with semantic labels as a continuous volumetric function and renders view-consistent, photorealistic images. In the absence of paired training data, an image-to-image translation model generates pseudo ground truth images for the segmentation labels of the corresponding 3D world.
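At the core of a volumetric-rendering approach like this one is alpha compositing of per-sample outputs along each camera ray. Below is a minimal NeRF-style sketch of that accumulation step; the function name, argument shapes, and numbers are illustrative and not taken from the paper:

```python
import numpy as np

def composite_along_ray(sigmas, features, deltas):
    """Alpha-composite per-sample features along one camera ray.

    sigmas:   (N,) densities at N samples along the ray
    features: (N, C) per-sample feature (or color) vectors
    deltas:   (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)            # opacity of each segment
    # Transmittance: how much light survives up to each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                           # contribution of each sample
    return (weights[:, None] * features).sum(axis=0), weights

# A (nearly) fully opaque first sample occludes everything behind it.
feat, w = composite_along_ray(
    sigmas=np.array([1e6, 1.0]),
    features=np.array([[1.0, 0.0], [0.0, 1.0]]),
    deltas=np.array([1.0, 1.0]),
)
```

GANcraft accumulates learned feature vectors rather than colors this way, producing the per-pixel feature map that its CNN renderer later converts to RGB.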

## Contributions
<img src='../images/GANcraft_results.gif'>

- The novel task of world-to-world translation: from a 3D block world, which can be intuitively constructed in Minecraft, to a realistic 3D world, the 3D extension of 2D image-to-image translation.

- A framework for training neural renderers in the absence of ground truth data, rendering a realistic-looking world by using pseudo ground truth labels.

- A neural rendering architecture trained with adversarial losses and conditioned on a style image, extending 3D neural rendering methods.

## Model

<img src='../images/GANcraft_overview.PNG'>

- GANs can successfully map images from one domain to another without paired data, but the images generated when mapping a 3D world are not view-consistent and suffer from flickering. Neural rendering techniques solve the view-consistency problem but cannot handle the domain gap between the Minecraft block world and the real world.
- The model takes as input a 3D block world, for which a voxel-bounded feature field is learned using an MLP that takes as input a location code, a semantic label, and a shared style code. Instead of the view-dependent color used in other neural rendering techniques, the network outputs image features **C(r, z)**. The vertices of each voxel are assigned feature vectors shared by adjacent voxels, ensuring that there are no inconsistencies in the output.
- The resulting feature image is passed to a CNN renderer, which converts the per-pixel feature map into an RGB image. The model is trained with adversarial and perceptual losses on the generated image, and a reconstruction loss w.r.t. the corresponding pseudo ground truth.
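The two-stage design described above, a per-point feature-field MLP followed by a 2D CNN renderer, can be sketched roughly as follows. The class names, layer sizes, and feature dimensions here are illustrative stand-ins, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class FeatureFieldMLP(nn.Module):
    """Toy stand-in for the voxel feature field: maps a 3D location, a
    semantic-label embedding, and a shared style code to a per-point
    feature vector plus a density (all sizes are illustrative)."""
    def __init__(self, sem_dim=8, style_dim=16, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + sem_dim + style_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim + 1),  # feature + density
        )

    def forward(self, xyz, sem, style):
        out = self.net(torch.cat([xyz, sem, style], dim=-1))
        return out[..., :-1], torch.relu(out[..., -1])  # feature, sigma

class CNNRenderer(nn.Module):
    """Toy converter from a per-pixel feature map to an RGB image."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, feat_map):  # feat_map: (B, feat_dim, H, W)
        return self.net(feat_map)

field = FeatureFieldMLP()
renderer = CNNRenderer()
feats, sigma = field(torch.randn(4, 3), torch.randn(4, 8), torch.randn(4, 16))
rgb = renderer(torch.randn(1, 32, 8, 8))
```

In the actual model, the per-point features would be alpha-composited along rays into the feature map before the CNN stage; that compositing is omitted here for brevity.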

## Results

<img src='../images/GANcraft_fid_scores.PNG'>

The model is evaluated using FID and KID scores, on which GANcraft achieves values close to **SPADE**, a photorealistic image generator, and it outperforms the other baselines on a temporal-consistency metric based on human preference scores.
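For reference, FID fits a Gaussian to the Inception features of real and of generated images and measures the Fréchet distance between the two distributions. A minimal sketch of that distance (the `fid` helper is illustrative; a real evaluation would first extract Inception activations):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1), N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):       # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Identical distributions give a distance of zero.
mu, cov = np.zeros(4), np.eye(4)
d0 = fid(mu, cov, mu, cov)
```

Lower FID (and KID) means the generated-image statistics are closer to the real-image statistics, which is why matching SPADE's scores is a strong result.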

## Our Two Cents
- The model achieves state-of-the-art results on the world-to-world translation task in the absence of ground truth photorealistic images for the segmentation labels of the 3D world.
- The output images still have a blocky appearance because of the domain shift between the training images of the SPADE model and the images projected from the 3D block world.

## Resources
Project Page: https://nvlabs.github.io/GANcraft/
