Implementations of various RL and Deep RL algorithms in TensorFlow, PyTorch and Keras.

RL and Deep-RL implementations

The implementation is modular, meaning you can plug almost any environment (defined in the corresponding file, within the same base folder) into any algorithm.
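Here is a hedged sketch of what that plug-and-play usage might look like. The import paths, class names, and method signatures below are illustrative assumptions, not the repository's exact API.

```python
# Illustrative only -- module paths, class names, and signatures are assumptions,
# not the repository's exact API.
from envs import CartPole                      # hypothetical: an environment defined in the base folder
from algorithms.deep_q_learning import Agent   # hypothetical: any compatible algorithm

env = CartPole()               # pick an environment
agent = Agent(env)             # plug it into the chosen algorithm
agent.train(n_episodes=500)    # hypothetical training entry point
```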

Table of contents:

  • Tabular RL Algorithms
  • Deep RL Algorithms
  • Algorithms restrictions
  • Results
  • How to use

Tabular RL Algorithms

Implemented Algorithms:

|  | Prediction | Control |
| --- | --- | --- |
| Monte Carlo (MC) | MC policy evaluation | MC non-exploring-starts control, off-policy MC control |
| Temporal Difference 0 (TD0) | TD0 policy evaluation | SARSA, Expected SARSA, Q Learning, Double Q Learning |
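For orientation, here is a minimal, self-contained sketch of the tabular Q-learning update (one of the TD0 control methods listed above). The table size and hyperparameters are illustrative, not the repository's values.

```python
import numpy as np

# Hedged sketch of the tabular Q-learning (TD0 control) update, not the repo's exact code.
n_states, n_actions = 16, 4                 # e.g. a FrozenLake-sized table
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # step size, discount factor, exploration rate

def choose_action(s):
    # epsilon-greedy behaviour policy
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_learning_update(s, a, r, s_next, done):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```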

Implemented Environments:

| Discrete State Space | Discretized State Space |
| --- | --- |
| Toy Text - FrozenLake, Taxi, Blackjack | Classic Control - MountainCar, CartPole, Acrobot |

Deep RL Algorithms

In most cases, you can select the desired library implementation via lib_type: LIBRARY_TF, LIBRARY_TORCH, LIBRARY_KERAS.
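A minimal sketch of selecting the backend, assuming an Agent constructor that accepts lib_type; the import paths and constructor signatures are assumptions, not the repository's exact API.

```python
# The LIBRARY_* constants are named in this README; the import paths and the
# Agent/CartPole signatures are assumptions for illustration.
from utils import LIBRARY_TF, LIBRARY_TORCH, LIBRARY_KERAS   # hypothetical import path
from algorithms.deep_q_learning import Agent                 # hypothetical import path
from envs import CartPole                                    # hypothetical import path

env = CartPole()
agent = Agent(env, lib_type=LIBRARY_TORCH)   # swap to LIBRARY_TF or LIBRARY_KERAS as desired
```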

Implemented Control Algorithms:

Implemented Environments:

(environments with Continuous State Space)

|  | Discrete Action Space | Continuous Action Space |
| --- | --- | --- |
| Observation Vector Input Type | CartPole, LunarLander | Pendulum, MountainCarContinuous, LunarLanderContinuous, BipedalWalker |
| Stacked Frames Input Type | Breakout, SpaceInvaders |  |

Algorithms restrictions

Note that some algorithms have restrictions.

  • Innate restrictions:
| Discrete Action Space | Continuous Action Space |
| --- | --- |
| Deep Q Learning | Deep Deterministic Policy Gradient |
  • Some current restrictions exist simply because there is more code to write, i.e., implementing every algorithm for each:
    • library implementation (tensorflow, torch, keras).
    • input (state) type (observation vector, stacked frames).
    • action space type (discrete, continuous).

Enables playing from the command-line.

Running this file applies the algorithm to a single environment through the command line (using the argparse module to parse command-line options). The major benefit is that it lets you chain multiple independent runs via && (so you can run multiple tests in one go).
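A hedged sketch of such an argparse-based entry point; the flag names, defaults, and the play.py file name are assumptions, not the file's actual options.

```python
# Illustrative argparse entry point; flag names and defaults are assumptions.
import argparse

def main():
    parser = argparse.ArgumentParser(description='Run an RL algorithm on a single environment.')
    parser.add_argument('--env', type=int, default=0, help='environment index (assumed flag)')
    parser.add_argument('--episodes', type=int, default=1000, help='number of training episodes')
    parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
    args = parser.parse_args()
    print(f'env={args.env}, episodes={args.episodes}, lr={args.lr}')
    # ... build the agent here and run its training loop ...

if __name__ == '__main__':
    main()
```

Chaining independent runs then looks like `python play.py --env 0 && python play.py --env 1` (file and flag names assumed).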

Enables performing grid search.

Running this file performs a comparative grid search for a single environment and plots the results. This is mostly used for hyperparameter tuning. Note that only 16 plot colors are currently defined, so more than 16 combinations will raise an error; add more colors if you need them.

Currently, the grid search is set up for DQL, but it can be applied to any algorithm with only minor changes (the relevant imports are already at the top of the file, just commented out).
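An illustrative sketch of such a grid-search loop, here over the first and second FC layer sizes; the train_dql() helper and its return value (a list of episode scores) are assumptions for illustration, not the file's actual code.

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

fc1_sizes = [64, 128, 256, 512]
fc2_sizes = [64, 128, 256, 512]        # 4 x 4 = 16 combinations, matching the 16 available colors

for fc1, fc2 in itertools.product(fc1_sizes, fc2_sizes):
    scores = train_dql(fc1_size=fc1, fc2_size=fc2)        # hypothetical training call returning episode scores
    running_avg = np.convolve(scores, np.ones(100) / 100, mode='valid')
    plt.plot(running_avg, label=f'fc1={fc1}, fc2={fc2}')  # one curve (color) per combination

plt.xlabel('episode')
plt.ylabel('running average of episode scores')
plt.legend()
plt.show()
```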

Results

Algorithm performance examples. Training and test results come in the form of graphs and statistics (for some of the environments) of both the running average of episode scores and the accumulated scores.

Tabular RL Algorithms

AI agent before and after training

Mountain Car

Cart Pole

Acrobot

environment_test() (in test_tabular_rl.py) result graphs

Deep RL Algorithms

Performance of DQL Grid Search on the first & second FC layers' sizes (number of nodes / neurons):

How to use

The test files contain examples of how to use:
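For orientation, a minimal sketch of the kind of call the test files make; environment_test() and test_tabular_rl.py are mentioned above, but the exact signature is an assumption.

```python
# test_tabular_rl.py and environment_test() are named in this README;
# the call signature here is an assumption.
from test_tabular_rl import environment_test

if __name__ == '__main__':
    environment_test()   # runs the tabular algorithms and plots the result graphs
```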