This is a modular implementation, meaning you can plug and play almost any environment (from the corresponding file within the same base folder) with any algorithm.
Implemented Algorithms:
Algorithm Family | Prediction | Control |
---|---|---|
Monte Carlo (MC) | MC policy evaluation | MC non-exploring-starts control, off-policy MC control |
Temporal Difference 0 (TD0) | TD0 policy evaluation | SARSA, Expected SARSA, Q Learning, Double Q Learning |
Implemented Environments:
Discrete State Space | Discretized State Space |
---|---|
Toy Text - FrozenLake, Taxi, Blackjack | Classic Control - MountainCar, CartPole, Acrobot |
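For concreteness, here is a minimal tabular Q-learning sketch on FrozenLake, combining an algorithm and an environment from the tables above. This is generic illustrative code (it assumes the classic pre-0.26 `gym` step/reset API), not the repository's implementation:

```python
import numpy as np
import gym

# Minimal tabular Q-learning sketch (illustrative only; not the repository's API).
# Assumes the classic gym step/reset API (gym < 0.26) and the FrozenLake-v0 id.
env = gym.make('FrozenLake-v0')
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # arbitrary hyper-parameter values

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy behavior policy
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # TD0 control update (Q-learning): bootstrap off the greedy next action
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```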
In most cases, you can select the desired library implementation via `lib_type`: `LIBRARY_TF`, `LIBRARY_TORCH`, `LIBRARY_KERAS`.
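Selecting a backend might look roughly like the sketch below. The `LIBRARY_*` constant names come from this README; the module paths, the `Agent` class, and its argument names are assumptions made purely for illustration:

```python
# Illustrative sketch only: the import paths and the Agent signature are assumptions;
# only the lib_type constant names (LIBRARY_TF / LIBRARY_TORCH / LIBRARY_KERAS)
# appear in this README.
from deep_reinforcement_learning.algorithms.deep_q_learning import Agent, LIBRARY_TORCH
from deep_reinforcement_learning.envs import CartPole

agent = Agent(custom_env=CartPole(), lib_type=LIBRARY_TORCH)  # or LIBRARY_TF / LIBRARY_KERAS
agent.train()
```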
Implemented Control Algorithms:
- Deep Q Learning (DQL)
- Policy Gradient (PG)
  - set `ep_batch_num = 1` for the Monte-Carlo PG (REINFORCE) algorithm (see the sketch after this list)
  - set `ep_batch_num > 1` for batch PG
- Actor-Critic (AC)
- Deep Deterministic Policy Gradient (DDPG)
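To clarify the `ep_batch_num` distinction, here is a generic policy-gradient sketch (illustrative code, not the repository's implementation) that accumulates `ep_batch_num` episodes before each update; with `ep_batch_num = 1` it reduces to Monte-Carlo PG (REINFORCE):

```python
import torch
import torch.nn as nn
import gym

# Generic REINFORCE / batch policy-gradient sketch (illustrative; not the repository's code).
# Assumes the classic gym step/reset API (gym < 0.26).
env = gym.make('CartPole-v1')
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma, ep_batch_num = 0.99, 1  # ep_batch_num = 1 --> Monte-Carlo PG (REINFORCE)

log_probs, returns = [], []
for episode in range(1000):
    s, done, rewards, ep_log_probs = env.reset(), False, [], []
    while not done:
        dist = torch.distributions.Categorical(logits=policy(torch.as_tensor(s, dtype=torch.float32)))
        a = dist.sample()
        ep_log_probs.append(dist.log_prob(a))
        s, r, done, _ = env.step(a.item())
        rewards.append(r)
    # discounted Monte-Carlo returns for this episode
    G, ep_returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        ep_returns.insert(0, G)
    log_probs += ep_log_probs
    returns += ep_returns
    if (episode + 1) % ep_batch_num == 0:  # update once per batch of episodes
        loss = -(torch.stack(log_probs) * torch.as_tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        log_probs, returns = [], []
```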
Implemented Environments (environments with Continuous State Space):
Input Type | Discrete Action Space | Continuous Action Space |
---|---|---|
Observation Vector | CartPole, LunarLander | Pendulum, MountainCarContinuous, LunarLanderContinuous, BipedalWalker |
Stacked Frames | Breakout, SpaceInvaders | |
Note that some algorithms have restrictions.
- Innate restrictions:
Discrete Action Space | Continuous Action Space |
---|---|
Deep Q Learning | Deep Deterministic Policy Gradient |
- Some current restrictions exist simply because there's more code to be written, namely an implementation for every:
- library implementation (tensorflow, torch, keras).
- input (state) type (observation vector, stacked frames).
- action space type (discrete, continuous).
Enables playing from the command-line.
Running this file performs the algorithm on a single environment through the command-line (using the `argparse` module to parse command-line options).
The major benefit of this is that it enables concatenating multiple independent runs via `&&`, so you can run multiple tests in one go.
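A hedged sketch of what such a command-line entry point typically looks like; the flag names and defaults below are illustrative assumptions, not the file's actual interface:

```python
import argparse

# Illustrative argparse sketch; the actual flag names and defaults in the repository may differ.
def main():
    parser = argparse.ArgumentParser(description='Run a single algorithm on a single environment.')
    parser.add_argument('-env', type=int, default=0, help='environment index')
    parser.add_argument('-n', '--episodes', type=int, default=500, help='number of training episodes')
    parser.add_argument('-lib', choices=['tf', 'torch', 'keras'], default='torch', help='library backend')
    args = parser.parse_args()
    print(f'Running env #{args.env} for {args.episodes} episodes with {args.lib}')
    # ... construct the agent and call its train/play methods here ...

if __name__ == '__main__':
    main()
```

With such an interface, runs can be chained, e.g. `python <script>.py -env 0 && python <script>.py -env 1`.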
Enables performing grid search.
Running this file performs a comparative grid search for a single environment and plots the results. This is mostly useful for hyper-parameter tuning. Note that currently only 16 plot colors are defined, so more than 16 hyper-parameter combinations will raise an error; add more colors if you need them.
Currently, the grid search is set up for DQL, but it can be applied to any algorithm with only minor changes (the relevant imports are at the top of the file, commented out).
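A minimal sketch of the grid-search pattern described above (generic code, not the repository's implementation), iterating over hyper-parameter combinations and assigning each one a color from a fixed palette:

```python
import itertools
import matplotlib.pyplot as plt
import numpy as np

# Generic grid-search plotting sketch (illustrative; not the repository's code).
fc1_sizes = [64, 128, 256]   # candidate sizes for the first FC layer
fc2_sizes = [64, 128]        # candidate sizes for the second FC layer
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'tab:orange',
          'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray',
          'tab:olive', 'tab:cyan', 'navy', 'lime']  # 16 colors, as in the README

combinations = list(itertools.product(fc1_sizes, fc2_sizes))
assert len(combinations) <= len(colors), 'add more colors for more than 16 combinations'

for (fc1, fc2), color in zip(combinations, colors):
    # Placeholder scores; in a real run these would come from training an agent
    # whose hidden layers have fc1 and fc2 nodes, and recording its running-average scores.
    scores = np.random.rand(100).cumsum()
    plt.plot(scores, color=color, label=f'fc1={fc1}, fc2={fc2}')

plt.xlabel('episode')
plt.ylabel('running average score')
plt.legend()
plt.show()
```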
Algorithm Performance Examples. Training & test results come in the form of graphs and statistics (for some of the environments) of both the running average of episode scores and the accumulated scores.
Mountain Car
Cart Pole
Acrobot
`environment_test()` (in test_tabular_rl.py) result graphs
Performance of DQL Grid Search on first & second FC layers' sizes (number of nodes / neurons):
The test files contain examples of how to use:
- Tabular RL Algorithms
- Deep RL Algorithms: