This project aims to provide clean implementations of imitation learning algorithms. Currently we have implementations of Behavioral Cloning, DAgger (with synthetic examples), Adversarial Inverse Reinforcement Learning, and Generative Adversarial Imitation Learning.
pip install imitation
git clone http://github.com/HumanCompatibleAI/imitation
cd imitation
pip install -e .
Follow instructions to install mujoco_py v1.5 here.
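As a quick sanity check after installation, the following minimal sketch reports whether the two packages mentioned above can be located by the current Python interpreter. It assumes the import names are `imitation` and `mujoco_py`; the helper `is_importable` is a hypothetical name introduced here for illustration and is not part of the library.

```python
import importlib.util


def is_importable(name):
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper: find_spec() only looks the module up, it does not
    import it.
    """
    return importlib.util.find_spec(name) is not None


# Assumed import names for the packages installed above.
for package in ("imitation", "mujoco_py"):
    status = "found" if is_importable(package) else "missing"
    print(f"{package}: {status}")
```

A "missing" result usually means the package was installed into a different environment than the one running the check.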
We provide several CLI scripts as a front-end to the algorithms implemented in `imitation`. These use Sacred for configuration and replicability.
# Train a PPO agent on cartpole and collect expert demonstrations. TensorBoard logs are saved in `quickstart/rl/`.
python -m imitation.scripts.expert_demos with fast cartpole log_dir=quickstart/rl/
# Train GAIL from demonstrations. TensorBoard logs are saved in `output/` (the default log directory).
python -m imitation.scripts.train_adversarial with fast gail cartpole rollout_path=quickstart/rl/rollouts/final.pkl
# Train AIRL from demonstrations. TensorBoard logs are saved in `output/` (the default log directory).
python -m imitation.scripts.train_adversarial with fast airl cartpole rollout_path=quickstart/rl/rollouts/final.pkl
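The GAIL and AIRL commands read the demonstrations written by the first command through `rollout_path`. Before training, it can help to confirm that the file exists and loads. The sketch below assumes the file is an ordinary Python pickle; the structure of its contents is not specified here, so the code only reports the top-level type and length. `summarize_pickle` is a hypothetical helper, not part of `imitation`. Loading will likely require `imitation` to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and report the top-level object's type and length.

    Hypothetical helper; makes no assumption about what the object contains.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    try:
        summary["length"] = len(obj)
    except TypeError:
        # The object has no length (for example, a single non-container value).
        summary["length"] = None
    return summary


# Path taken from the quickstart commands above.
rollout_path = Path("quickstart/rl/rollouts/final.pkl")
if rollout_path.exists():
    print(summarize_pickle(rollout_path))
else:
    print(f"{rollout_path} not found; run the expert_demos command first.")
```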
Tips:
- Remove the "fast" option from the commands above to allow training to run to completion.
- `python -m imitation.scripts.expert_demos print_config` will list Sacred script options. These configuration options are documented in each script's docstrings.
For more information on how to configure Sacred CLI options, see the Sacred docs.
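The quickstart commands share one shape: the module name, the keyword `with`, then named configs (such as `fast` or `cartpole`) and `key=value` overrides. As a rough illustration, the sketch below assembles the same three commands as argument lists. `sacred_command` is a hypothetical helper written for this example, not a function provided by `imitation` or Sacred, and the script only prints the commands rather than running them.

```python
import shlex
import sys


def sacred_command(module, *named_configs, **overrides):
    """Build an argv list: python -m MODULE [with NAMED_CONFIG... KEY=VALUE...].

    Hypothetical helper for illustration only.
    """
    argv = [sys.executable, "-m", module]
    updates = list(named_configs) + [f"{key}={value}" for key, value in overrides.items()]
    if updates:
        argv += ["with"] + updates
    return argv


ROLLOUTS = "quickstart/rl/rollouts/final.pkl"
commands = [
    sacred_command("imitation.scripts.expert_demos", "fast", "cartpole",
                   log_dir="quickstart/rl/"),
    sacred_command("imitation.scripts.train_adversarial", "fast", "gail", "cartpole",
                   rollout_path=ROLLOUTS),
    sacred_command("imitation.scripts.train_adversarial", "fast", "airl", "cartpole",
                   rollout_path=ROLLOUTS),
]

for argv in commands:
    # Print only. To execute, pass argv to subprocess.run(argv, check=True).
    print("python", shlex.join(argv[1:]))
```

Dropping `"fast"` from the named configs mirrors the tip above about letting training run to completion.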
See examples/quickstart.py for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.
We also implement a density-based reward baseline. You can find an example notebook here.
@misc{wang2020imitation,
author = {Wang, Steven and Toyer, Sam and Gleave, Adam and Emmons, Scott},
title = {The {\tt imitation} Library for Imitation Learning and Inverse Reinforcement Learning},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/HumanCompatibleAI/imitation}},
}
See CONTRIBUTING.md.