Code for the paper Learning from Massive Human Videos for Universal Humanoid Pose Control. Please refer to our project page for more demonstrations and up-to-date related resources.
To establish the environment, run this code in the shell:
conda create -n UH-1 python=3.8.11
conda activate UH-1
pip install git+https://github.com/openai/CLIP.git
pip install mujoco opencv-python
Download our text-to-keypoint model checkpoints from here.
git lfs install
git clone https://huggingface.co/USC-GVL/UH-1
For text-to-keypoint generation,
- Change the root_path in inference.py to the path of the checkpoints you just downloaded.
- Change the prompt_list in inference.py to the language prompts you want the model to generate (see the sketch below).
- Run the following command, and the generated humanoid motion will be stored in the output folder.
python inference.py
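As a minimal sketch, the two edits above might look like the following; the checkpoint path and prompts are placeholders for illustration, not values shipped with the repository.
# Inside inference.py -- illustrative values only, replace with your own
root_path = "/path/to/UH-1"                    # folder cloned from the Hugging Face repo
prompt_list = [
    "A person waves with both hands.",         # hypothetical prompt
    "A person walks forward and then stops.",  # hypothetical prompt
]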
The generated keypoints have shape [number of frames, 34-dim keypoint], where the 34-dim keypoint = 27-dim DoF joint pose values + 3-dim root position + 4-dim root orientation.
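As a minimal sketch of splitting these components apart, assuming the generated motion is saved as a NumPy array of shape [number of frames, 34] (the file name below is a placeholder):
import numpy as np

motion = np.load("output/example_motion.npy")  # placeholder file name, shape [num_frames, 34]
dof_pos = motion[:, :27]      # 27-dim DoF joint pose values
root_pos = motion[:, 27:30]   # 3-dim root position
root_quat = motion[:, 30:34]  # 4-dim root orientation (quaternion)
print(dof_pos.shape, root_pos.shape, root_quat.shape)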
To visualize these keypoints by directly setting the DoF poses,
- Change the file_list in visualize.py to the generated humanoid motion file names (see the sketch below).
- Run the following command, and the rendered video will be stored in the output folder.
mjpython visualize.py
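For example, the edit to visualize.py might look like this; the file names are placeholders for whatever inference.py wrote to the output folder.
# Inside visualize.py -- placeholder motion file names
file_list = [
    "wave_with_both_hands",
    "walk_forward_and_then_stop",
]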
If you want to perform closed-loop control conditioned on the generated humanoid keypoints, use the goal-conditioned humanoid control policy provided below.
To avoid dependency conflicts with Isaac Gym, we create a separate conda environment.
conda create -n UH-1-rl python=3.8
conda activate UH-1-rl
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
pip install oauthlib==3.2.2 protobuf==5.28.1
# Download the Isaac Gym binaries from https://developer.nvidia.com/isaac-gym
cd isaacgym/python && pip install -e .
# then make sure you are at the root folder of this project
cd rsl_rl && pip install -e .
cd ../legged_gym && pip install -e .
pip install "torch==1.13.1" "numpy==1.23.0" pydelatin==0.2.8 wandb==0.17.5 tqdm opencv-python==4.10.0.84 ipdb pyfqmr==0.2.1 flask dill==0.3.8 gdown==5.2.0 pytorch_kinematics==0.7.4 easydict==1.13
Here is a sample of our training data. Due to GitHub's file size limit, the data file can be downloaded here.
Please put the data file at motion_lib/motion_pkl/motion_data_cmu_sample.pkl
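To sanity-check the download, the pickle can be loaded and inspected as below; the exact contents depend on the released file, so only the path comes from this README.
import pickle

with open("motion_lib/motion_pkl/motion_data_cmu_sample.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    print(list(data.keys())[:10])  # peek at the top-level keys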
To play the policy with the checkpoint we've provided, try
# make sure you are at the root folder of this project
cd legged_gym/legged_gym/scripts
python play.py 000-00 --task h1_2_mimic --device cuda:0
To train the goal-conditioned RL policy from scratch, try
# make sure you are at the root folder of this project
cd legged_gym/legged_gym/scripts
python train.py xxx-xx-run_name --task h1_2_mimic --device cuda:0
For the data collection pipeline, including Video Clip Extraction, 3D Human Pose Estimation, Video Captioning, and Motion Retargeting, please refer to this README.
If you find our work helpful, please cite us:
@article{mao2024learning,
title={Learning from Massive Human Videos for Universal Humanoid Pose Control},
author={Mao, Jiageng and Zhao, Siheng and Song, Siqi and Shi, Tianheng and Ye, Junjie and Zhang, Mingtong and Geng, Haoran and Malik, Jitendra and Guizilini, Vitor and Wang, Yue},
journal={arXiv preprint arXiv:2412.14172},
year={2024}
}