Train a NeRF model with PyTorchVideo and PyTorch3D

This project demonstrates how to use the video decoder from PyTorchVideo to load frames from a video of an object in the Objectron dataset, and how to use those frames to train a NeRF [1] model with PyTorch3D. Instead of decoding and storing all the video frames as images on disk, PyTorchVideo offers an easy way to load and access frames on the fly. For the model we will be using the NeRF implementation from PyTorch3D.

Set up

Installation

Install PyTorch3D

# Create new conda environment
conda create -n 3ddemo
conda activate 3ddemo

# Install PyTorch3D
conda install -c pytorch pytorch=1.7.1 torchvision cudatoolkit=10.1
conda install -c conda-forge -c fvcore -c iopath fvcore iopath
conda install pytorch3d -c pytorch3d-nightly

Install PyTorchVideo if you haven't installed it already (assuming you have cloned the repo locally):

cd pytorchvideo
python -m pip install -e .

Install some extra libraries needed for NeRF:

pip install visdom Pillow matplotlib tqdm plotly
pip install hydra-core --upgrade
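
To confirm the installation, you can run a quick import check (an optional sanity check, not part of the original instructions; run it inside the 3ddemo environment):

python -c "import torch, pytorch3d, pytorchvideo; print('OK, torch', torch.__version__)"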

Set up NeRF Model

We will be using the PyTorch3D NeRF implementation. The PyTorch3D conda package is already installed, so we only need to clone the NeRF project code:

cd pytorchvideo/tutorials/video_nerf
git clone https://github.com/facebookresearch/pytorch3d.git
cp -r pytorch3d/projects/nerf .

# Remove the rest of the PyTorch3D repo
rm -r pytorch3d

Dataset

Download the Objectron repo

The repo contains helper functions for reading the metadata files. Clone it to the path pytorchvideo/tutorials/video_nerf/Objectron.

git clone https://github.com/google-research-datasets/Objectron.git

# Also install protobuf for parsing the metadata
pip install protobuf
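
The per-frame camera parameters live in a geometry metadata file that is downloaded alongside the video in the next step, stored as a stream of length-prefixed protobuf messages. Below is a rough, hedged sketch of how such a file can be read; the module path, the ARFrame message name, and the file path are assumptions based on the Objectron schema and examples, so check the Objectron repo for the exact API:

import struct

# Assumption: the Objectron repo ships generated protobuf modules under
# objectron/schema; ARFrame is the per-frame metadata message.
from objectron.schema import a_r_capture_metadata_pb2 as ar_metadata_protocol

# Assumed file path for illustration only.
geometry_file = "nerf/data/objectron/geometry.pbdata"

frames = []
with open(geometry_file, "rb") as f:
    buf = f.read()

i = 0
while i < len(buf):
    # Each message is prefixed with its byte length as a little-endian uint32.
    msg_len = struct.unpack("<I", buf[i:i + 4])[0]
    i += 4
    frame = ar_metadata_protocol.ARFrame()
    frame.ParseFromString(buf[i:i + msg_len])
    i += msg_len
    frames.append(frame)  # frame.camera carries the per-frame camera parameters
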
Download an example video

For this demo we will be using a short video of a chair from the Objectron dataset. Each video is accompanied by metadata with the camera parameters for each frame. You can download an example video for a chair and the associated metadata by running the following script:

python download_objectron_data.py

The data files will be downloaded to the path: pytorchvideo/tutorials/video_nerf/nerf/data/objectron. Within the script, you can change the video index to download a different chair video. A random train/val/test split is created and saved the first time the video is loaded by the NeRF training script.
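
As a rough illustration of that split step (a minimal sketch only, not the tutorial's exact dataset code; the 80/10/10 fractions and the output file name are assumptions):

import json
import random

# Randomly split the frame indices of the video into train/val/test.
num_frames = 300  # number of decoded frames in the video
indices = list(range(num_frames))
random.shuffle(indices)

n_train = int(0.8 * num_frames)
n_val = int(0.1 * num_frames)
split = {
    "train": indices[:n_train],
    "val": indices[n_train:n_train + n_val],
    "test": indices[n_train + n_val:],
}

# Save the split so the same frames are reused on subsequent runs.
with open("nerf/data/objectron/split.json", "w") as f:
    json.dump(split, f)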

Most of the videos are recorded in landscape mode with image size (H, W) = [1440, 1920].

Set up new configs

For this dataset we need a new config file and data loader to use it with the PyTorch3D NeRF implementation. Move the relevant dataset and config files into the nerf folder, replacing the original files:

# Make sure you are at the path: pytorchvideo/tutorials/video_nerf
# Rename the current dataset file
mv nerf/nerf/dataset.py nerf/nerf/nerf_dataset.py

# Move the new objectron specific files into the nerf folder
mv dataset.py nerf/nerf/dataset.py
mv dataset_utils.py nerf/nerf/dataset_utils.py
mv objectron.yaml nerf/configs

In the new dataset.py file we use the PyTorchVideo EncodedVideo class to load the video .MOV file, decode it into frames, and access frames by index.
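
As a rough sketch of that pattern (a minimal illustration, not the tutorial's exact dataset code; the file path and frame rate below are assumptions):

from pytorchvideo.data.encoded_video import EncodedVideo

# Open the video container; frames are decoded lazily rather than up front.
video = EncodedVideo.from_path("nerf/data/objectron/video.MOV")  # assumed path

# To read a single frame, decode a short window around its timestamp.
fps = 30.0          # assumed capture frame rate
frame_idx = 42      # index of the frame we want
start_sec = frame_idx / fps
clip = video.get_clip(start_sec=start_sec, end_sec=start_sec + 1.0 / fps)

# clip["video"] is a (C, T, H, W) float tensor of the decoded frames.
frame = clip["video"][:, 0]  # (C, H, W) tensor for the requested frame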

Train model

Run the model training:

cd nerf
python ./train_nerf.py --config-name objectron

Visualize predictions

Predictions and metrics will be logged to Visdom. Before training starts, launch the Visdom server:

python -m visdom.server

Navigate to http://localhost:8097 to view the logs and visualizations.

After training, you can generate predictions on the test set:

python test_nerf.py --config-name objectron test.mode='export_video' data.image_size="[96,128]"

For a higher resolution video, you can increase the image size, e.g. to [192, 256] (note that this will slow down inference).
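
For example, the same export command with the larger image size:

python test_nerf.py --config-name objectron test.mode='export_video' data.image_size="[192,256]"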

You will need to specify the scene_center for the video in the objectron.yaml file. This is already set for the demo video specified in download_objectron_data.py. For a different video, you can calculate the scene center inside eval_video_utils.py; after line 99, add the following code to compute the center:

# traj is the circular camera trajectory on the camera mean plane.
# We want the camera to always point towards the center of this trajectory.
x_center = traj[..., 0].mean().item()
z_center = traj[..., 2].mean().item()
y_center = traj[0, ..., 1].item()  # height of the camera mean plane
scene_center = [x_center, y_center, z_center]
print(scene_center)  # copy these values into scene_center in objectron.yaml

You can also point the camera up or down relative to the camera mean plane, e.g. y_center -= 0.5.

Here is an example of a video reconstruction generated using a trained NeRF model. NOTE: the quality of the reconstruction depends heavily on the range and accuracy of the camera poses in the annotations; try training a model on a few different chairs in the dataset to see which one gives the best results.

References

[1] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.