forked from snap-research/articulated-animation
Your Name committed on Apr 29, 2021 · commit 6b20297 (1 parent: 4da8841)
Showing 34 changed files with 9,094 additions and 0 deletions.
@@ -0,0 +1,6 @@
Copyright Snap Inc. 2021. This sample code is made available by Snap Inc. for informational purposes only. No license,
whether implied or otherwise, is granted in or to such code (including any rights to copy, modify, publish, distribute
and/or commercialize such code), unless you have entered into a separate agreement for such rights. Such code is
provided as-is, without warranty of any kind, express or implied, including any warranties of merchantability, title,
fitness for a particular purpose, non-infringement, or that such code is free of defects, errors or viruses. In no event
will Snap Inc. be liable for any damages or losses of any kind arising from the sample code or your use thereof.
@@ -0,0 +1,98 @@
# Motion Representations for Articulated Animation

This repository contains the source code for the CVPR 2021 paper [Motion Representations for Articulated Animation](https://arxiv.org/abs/2104.11280) by [Aliaksandr Siarohin](https://aliaksandrsiarohin.github.io/aliaksandr-siarohin-website/), [Oliver Woodford](https://ojwoodford.github.io/), [Jian Ren](https://alanspike.github.io/), [Menglei Chai](https://mlchai.com/) and [Sergey Tulyakov](http://www.stulyakov.com/).

For more qualitative examples, visit our [project page](https://snap-research.github.io/articulated-animation/).

## Example animation

Here is an example of several images produced by our method. The driving video is shown in the first column. For each remaining column, the top image is animated using the motions extracted from the driving video.

![Screenshot](sup-mat/teaser.gif)

### Installation

We support ```python3```. To install the dependencies, run:
```bash
pip install -r requirements.txt
```
### YAML configs

There are several configuration files, one for each `dataset`, in the `config` folder, named ```config/dataset_name.yaml```. See ```config/dataset.yaml``` for a description of each parameter.

See the description of the parameters in ```config/vox256.yaml```. We adjusted the configuration to run on 1 V100 GPU; training on a 256x256 dataset takes approximately 2 days.
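If you prefer to inspect or adjust a config programmatically rather than by hand, the following is a minimal sketch that loads one with PyYAML and prints its top-level sections. The section names shown in the comments (e.g. ```dataset_params```) are assumptions about the usual layout of these files; treat ```config/vox256.yaml``` as the authoritative reference.

```python
# Minimal sketch: load a training config and list its top-level sections.
# Assumes PyYAML is available (the training code reads these YAML configs,
# so it should already be in your environment).
import yaml

with open('config/vox256.yaml') as f:
    config = yaml.safe_load(f)

for section, params in config.items():
    print(section, '->', list(params) if isinstance(params, dict) else params)

# Hypothetical tweak for a quick smoke test; check the real key names in the config first:
# config['train_params']['num_epochs'] = 5
```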
### Pre-trained checkpoints
Checkpoints can be found in the ```checkpoints``` folder. Checkpoints are large, so we use [git lfs](https://git-lfs.github.com/) to store them. Either use ```git lfs pull``` or download the checkpoints manually from GitHub.

### Animation Demo
To run a demo, download a checkpoint and run the following command:
```bash
python demo.py --config config/dataset_name.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint
```
The result will be stored in ```result.mp4```. To use Animation via Disentanglement, add ```--mode avd```; for standard animation, add ```--mode standard``` instead.
### Colab Demo
We prepared a demo runnable in Google Colab; see ```demo.ipynb```.

### Training

To train a model, run:
```bash
CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --device_ids 0
```
The code will create a folder in the log directory (each run creates a new time-stamped folder), and checkpoints will be saved to this folder.
To check the loss values during training, see ```log.txt```.
You can also check training data reconstructions in the ```train-vis``` subfolder.
Then, to train **Animation via Disentanglement (AVD)**, use:

```bash
CUDA_VISIBLE_DEVICES=0 python run.py --checkpoint log/{folder}/cpk.pth --config config/dataset_name.yaml --device_ids 0 --mode train_avd
```
where ```{folder}``` is the name of the folder created in the previous step. (Note: escape spaces with a backslash '\'.)
This will reuse the folder where the checkpoint was previously stored.
It will create a new checkpoint containing all the previous models plus the trained avd_network.
You can monitor performance in the log file and visualizations in the ```train-vis``` folder.
### Evaluation on video reconstruction

To evaluate reconstruction performance, run:
```bash
CUDA_VISIBLE_DEVICES=0 python run.py --config config/dataset_name.yaml --mode reconstruction --checkpoint log/{folder}/cpk.pth
```
where ```{folder}``` is the name of the folder created in the previous step. (Note: escape spaces with a backslash '\'.)
A ```reconstruction``` subfolder will be created in the checkpoint folder.
The generated videos will be stored in this folder; they will also be saved in a ```png``` subfolder in lossless '.png' format for evaluation.
Instructions for computing the metrics from the paper can be found [here](https://github.com/AliaksandrSiarohin/pose-evaluation).
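The metrics reported in the paper should be computed with the [pose-evaluation](https://github.com/AliaksandrSiarohin/pose-evaluation) repository linked above. As a quick sanity check only, the sketch below computes the mean per-pixel L1 error between generated and ground-truth frames stored as '.png'; the folder paths are placeholders to adapt to your layout.

```python
# Quick sanity check (not the paper's metrics): mean per-pixel L1 error between
# generated frames and ground-truth frames saved as lossless '.png' files.
# 'generated_dir' and 'gt_dir' are hypothetical paths containing same-named, same-sized frames.
import os
import numpy as np
import imageio

def mean_l1(generated_dir, gt_dir):
    errors = []
    for name in sorted(os.listdir(generated_dir)):
        if not name.endswith('.png'):
            continue
        gen = imageio.imread(os.path.join(generated_dir, name)).astype(np.float32) / 255.0
        gt = imageio.imread(os.path.join(gt_dir, name)).astype(np.float32) / 255.0
        errors.append(np.abs(gen - gt).mean())
    return float(np.mean(errors))

# print(mean_l1('log/{folder}/reconstruction/png/some_video', 'data/dataset_name/test/some_video'))
```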
### TED dataset
To obtain the TED dataset, run the following commands:
```bash
git clone https://github.com/AliaksandrSiarohin/video-preprocessing
cd video-preprocessing
python load_videos.py --metadata ../data/ted384-metadata.csv --format .mp4 --out_folder ../data/TED384-v2 --workers 8 --image_shape 384,384
```
### Training on your own dataset
1) Resize all the videos to the same size, e.g. 256x256. The videos can be '.gif' files, '.mp4' files, or folders of images.
We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance (see the sketch after this list for one way to do the conversion).

2) Create a folder ```data/dataset_name``` with two subfolders, ```train``` and ```test```; put the training videos in ```train``` and the testing videos in ```test```.

3) Create a config file ```config/dataset_name.yaml```. See the description of the parameters in ```config/vox256.yaml```. Specify the dataset root in ```dataset_params``` by setting ```root_dir: data/dataset_name```. Adjust other parameters as desired, such as the number of epochs. Set ```id_sampling: False``` if you do not want to use id sampling.
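For step 1, here is a minimal, illustrative sketch of converting a single video into a folder of resized '.png' frames using ```imageio``` and ```scikit-image```. The frame-naming scheme and output paths are assumptions, not a requirement of the codebase; adapt them to your dataset layout.

```python
# Illustrative only: convert one video into a folder of 256x256 '.png' frames.
# Reading '.mp4' files with imageio requires the imageio-ffmpeg plugin.
import os
import imageio
from skimage.transform import resize
from skimage import img_as_ubyte

def video_to_png_folder(video_path, out_folder, frame_shape=(256, 256)):
    os.makedirs(out_folder, exist_ok=True)
    reader = imageio.get_reader(video_path)
    for idx, frame in enumerate(reader):
        frame = resize(frame, frame_shape)                     # float image in [0, 1]
        out_name = os.path.join(out_folder, '%07d.png' % idx)  # e.g. 0000000.png (assumed naming)
        imageio.imsave(out_name, img_as_ubyte(frame))
    reader.close()

# video_to_png_folder('my_video.mp4', 'data/dataset_name/train/my_video')
```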
#### Additional notes

Citation:
```
@inproceedings{siarohin2021motion,
  author    = {Siarohin, Aliaksandr and Woodford, Oliver and Ren, Jian and Chai, Menglei and Tulyakov, Sergey},
  title     = {Motion Representations for Articulated Animation},
  booktitle = {CVPR},
  year      = {2021}
}
```
@@ -0,0 +1,104 @@
""" | ||
Copyright Snap Inc. 2021. This sample code is made available by Snap Inc. for informational purposes only. | ||
No license, whether implied or otherwise, is granted in or to such code (including any rights to copy, modify, | ||
publish, distribute and/or commercialize such code), unless you have entered into a separate agreement for such rights. | ||
Such code is provided as-is, without warranty of any kind, express or implied, including any warranties of merchantability, | ||
title, fitness for a particular purpose, non-infringement, or that such code is free of defects, errors or viruses. | ||
In no event will Snap Inc. be liable for any damages or losses of any kind arising from the sample code or your use thereof. | ||
""" | ||
|
||
import os | ||
from tqdm import tqdm | ||
|
||
import torch | ||
from torch.utils.data import DataLoader | ||
|
||
from frames_dataset import PairedDataset | ||
from logger import Logger, Visualizer | ||
import imageio | ||
from scipy.spatial import ConvexHull | ||
import numpy as np | ||
|
||
from sync_batchnorm import DataParallelWithCallback | ||
|
||
|
||
def get_animation_region_params(source_region_params, driving_region_params, driving_region_params_initial, | ||
mode='standard', avd_network=None, adapt_movement_scale=True): | ||
assert mode in ['standard', 'relative', 'avd'] | ||
new_region_params = {k: v for k, v in driving_region_params.items()} | ||
if mode == 'standard': | ||
return new_region_params | ||
elif mode == 'relative': | ||
source_area = ConvexHull(source_region_params['shift'][0].data.cpu().numpy()).volume | ||
driving_area = ConvexHull(driving_region_params_initial['shift'][0].data.cpu().numpy()).volume | ||
movement_scale = np.sqrt(source_area) / np.sqrt(driving_area) | ||
|
||
shift_diff = (driving_region_params['shift'] - driving_region_params_initial['shift']) | ||
shift_diff *= movement_scale | ||
new_region_params['shift'] = shift_diff source_region_params['shift'] | ||
|
||
affine_diff = torch.matmul(driving_region_params['affine'], | ||
torch.inverse(driving_region_params_initial['affine'])) | ||
new_region_params['affine'] = torch.matmul(affine_diff, source_region_params['affine']) | ||
return new_region_params | ||
elif mode == 'avd': | ||
new_region_params = avd_network(source_region_params, driving_region_params) | ||
return new_region_params | ||
|
||
|
||
def animate(config, generator, region_predictor, avd_network, checkpoint, log_dir, dataset): | ||
animate_params = config['animate_params'] | ||
log_dir = os.path.join(log_dir, 'animation') | ||
|
||
dataset = PairedDataset(initial_dataset=dataset, number_of_pairs=animate_params['num_pairs']) | ||
dataloader = DataLoader(dataset, batch_size=1, shuffle=False, num_workers=1) | ||
|
||
if checkpoint is not None: | ||
Logger.load_cpk(checkpoint, generator=generator, region_predictor=region_predictor, | ||
avd_network=avd_network) | ||
else: | ||
raise AttributeError("Checkpoint should be specified for mode='animate'.") | ||
|
||
if not os.path.exists(log_dir): | ||
os.makedirs(log_dir) | ||
|
||
if torch.cuda.is_available(): | ||
generator = DataParallelWithCallback(generator) | ||
region_predictor = DataParallelWithCallback(region_predictor) | ||
avd_network = DataParallelWithCallback(avd_network) | ||
|
||
generator.eval() | ||
region_predictor.eval() | ||
avd_network.eval() | ||
|
||
for it, x in tqdm(enumerate(dataloader)): | ||
with torch.no_grad(): | ||
visualizations = [] | ||
|
||
driving_video = x['driving_video'] | ||
source_frame = x['source_video'][:, :, 0, :, :] | ||
|
||
source_region_params = region_predictor(source_frame) | ||
driving_region_params_initial = region_predictor(driving_video[:, :, 0]) | ||
|
||
for frame_idx in range(driving_video.shape[2]): | ||
driving_frame = driving_video[:, :, frame_idx] | ||
driving_region_params = region_predictor(driving_frame) | ||
new_region_params = get_animation_region_params(source_region_params, driving_region_params, | ||
driving_region_params_initial, | ||
mode=animate_params['mode'], | ||
avd_network=avd_network) | ||
out = generator(source_frame, source_region_params=source_region_params, | ||
driving_region_params=new_region_params) | ||
|
||
out['driving_region_params'] = driving_region_params | ||
out['source_region_params'] = source_region_params | ||
out['new_region_params'] = new_region_params | ||
|
||
visualization = Visualizer(**config['visualizer_params']).visualize(source=source_frame, | ||
driving=driving_frame, out=out) | ||
visualizations.append(visualization) | ||
|
||
result_name = "-".join([x['driving_name'][0], x['source_name'][0]]) | ||
image_name = result_name animate_params['format'] | ||
imageio.mimsave(os.path.join(log_dir, image_name), visualizations) |