⚡ Add Summary for GANCraft

git-kush · Jan 20, 2022 · 417eaa5
2 parents 09cf754 + 19a2cce

Showing 6 changed files with 51 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -4,7 +4,10 @@ Summaries for papers discussed by VLG.
 
 # Summaries
 2021
-- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [[Paper](https://arxiv.org/pdf/2011.12100)][[Review](./summaries/GIRAFFE.md)] **CVPR 2021**
+- GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds [[Paper](https://arxiv.org/pdf/2104.07659)][[Review](./summaries/GANcraft.md)] 
+    - Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu, **ICCV 2021**
+- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [[Paper](https://arxiv.org/pdf/2011.12100)][[Review](./summaries/GIRAFFE.md)] 
+    - Michael Niemeyer, Andreas Geiger, **CVPR 2021**
 - Creative Sketch Generation [[Paper](https://arxiv.org/abs/2011.10039)][[Review](https://github.com/Sandstorm831/papers_we_read/blob/master/summaries/DoodlerGAN summary.md)]
     - Songwei Ge, Devi Parikh, Vedanuj Goswami & C. Lawrence Zitnick, **ICLR 2021**
 - Binary TTC: A Temporal Geofence for Autonomous Navigation[[Paper](https://arxiv.org/abs/2101.04777)][[Review](./summaries/binary_TTC.md)]

diff --git a/images/GANcraft_fid_scores.PNG b/images/GANcraft_fid_scores.PNG
diff --git a/images/GANcraft_model.PNG b/images/GANcraft_model.PNG
diff --git a/images/GANcraft_overview.PNG b/images/GANcraft_overview.PNG
diff --git a/images/GANcraft_results.gif b/images/GANcraft_results.gif
diff --git a/summaries/GANcraft.md b/summaries/GANcraft.md
@@ -0,0 +1,47 @@
+# GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
+Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu
+
+## Summary
+
+<img src='../images/GANcraft_model.PNG'>
+
+This paper presents **GANcraft**, an approach based on volumetric rendering (rendering a 3D world as 2D images) that models a semantically labeled 3D block world as a continuous volumetric function and renders view-consistent, photorealistic images. In the absence of paired training data, an image-to-image translation model generates pseudo ground truth images for the corresponding photorealistic 3D world.
+
+## Contributions
+<img src='../images/GANcraft_results.gif'>
+
+- The novel task of world-to-world translation: converting a
+3D block world, which can be intuitively constructed in
+Minecraft, into a realistic 3D world. This is the 3D extension of 2D
+image-to-image translation.
+
+- A framework for training neural renderers in the absence of ground
+truth data, using pseudo ground truth images to render a
+realistic-looking world.
+
+- A neural rendering architecture trained with adversarial losses
+and conditioned on a style image, extending existing 3D neural rendering
+methods.
+
+## Model
+
+<img src='../images/GANcraft_overview.PNG'>
+
+- GANs can successfully map images from one domain to another without paired data, but the images they generate for a 3D world are not view-consistent and flicker from frame to frame.
+Neural rendering techniques solve the view-consistency problem but cannot bridge the domain gap between the Minecraft block world and the real world.
+- The model takes as input a 3D block world, for which a voxel-bounded feature field is learned using an MLP that takes the location code, semantic label, and a shared style code as input. Instead of the view-dependent color used in neural rendering techniques, the network outputs image features **C(r,z)**. Feature vectors are assigned to the vertices of each voxel and shared by adjacent voxels, ensuring that there are no inconsistencies in the output.
+- The resulting feature image is passed to a CNN renderer, which converts the per-pixel feature map into an RGB image.
+The model is trained with adversarial and perceptual losses on the generated image and a reconstruction loss with respect to the corresponding pseudo ground truth images.
+
+## Results
+
+<img src='../images/GANcraft_fid_scores.PNG'>
+
+The model is evaluated using FID and KID scores, where GANcraft achieves values close to those of **SPADE**, a photorealistic image generator, and it outperforms the other baselines on temporal consistency as measured by human preference scores.
+
+## Our Two Cents
+- The model achieves state-of-the-art results on the world-to-world translation task without ground truth photorealistic images for the segmentation labels of the 3D world.
+- The output images still look blocky because of the domain shift between the training images of the SPADE model and the images projected from the 3D block world.
+
+## Resources
+Project Page: https://nvlabs.github.io/GANcraft/
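
The Model section of the summary above says that feature vectors sit on voxel vertices and are shared by adjacent voxels. Here is a minimal sketch of why that sharing keeps the feature field consistent across voxel boundaries. It assumes trilinear interpolation inside each voxel (the summary does not name the interpolation scheme) and uses a hypothetical toy grid of two voxels. `trilinear_feature` and `make_toy_grid` are illustrative names, not taken from the GANcraft code.

```python
import math
from itertools import product


def make_toy_grid():
    """Hypothetical grid: two unit voxels side by side along x.

    Vertices are keyed by integer (i, j, k); each holds a 2-D feature vector.
    The vertices on the plane i == 1 belong to both voxels.
    """
    grid = {}
    for i, j, k in product(range(3), range(2), range(2)):
        grid[(i, j, k)] = [((i * 7 + j * 3 + k * 5) % 4) * 1.0, float(i * j - k)]
    return grid


def trilinear_feature(vertex_features, point):
    """Interpolate a feature vector at `point` from the 8 corners of its voxel."""
    base = [math.floor(c) for c in point]        # lower corner of the voxel
    frac = [c - b for c, b in zip(point, base)]  # position inside the voxel
    dim = len(next(iter(vertex_features.values())))
    out = [0.0] * dim
    for offset in product((0, 1), repeat=3):
        weight = 1.0
        for f, o in zip(frac, offset):
            weight *= f if o else (1.0 - f)
        if weight == 0.0:
            continue  # this corner contributes nothing at this point
        corner = tuple(b + o for b, o in zip(base, offset))
        for d, value in enumerate(vertex_features[corner]):
            out[d] += weight * value
    return out


grid = make_toy_grid()
eps = 1e-6
# Sample just left and just right of the face shared by the two voxels.
left = trilinear_feature(grid, (1.0 - eps, 0.3, 0.7))
right = trilinear_feature(grid, (1.0 + eps, 0.3, 0.7))
print(left, right)
```

Both samples are dominated by the same four shared vertices on the plane `x = 1`, so the interpolated features agree up to the size of `eps`. If each voxel stored its own private corner features instead, the two values could differ by an arbitrary amount at the boundary.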