
Build and train a reinforcement learning (RL) model on AWS to autonomously drive JPL’s Open-Source Rover between given locations in a simulated Mars environment with the least amount of energy consumption and risk of damage.


rishistyping/AWS_JPL_OSR_DRL


TEAM MEMBERS: Edwin M., Liam Arbuckle, Mikhail Asavkin, Rishabh Chakrabarty

Welcome to the AWS-JPL Open Source Rover Challenge repository.

Here you will find everything you need to begin the challenge.

We have also begun to create several videos to help you get started.

Overview Video:

overview_video

The main sections of this document are:

  1. What is the challenge?

  2. What are the rules of the challenge?

  3. Getting Started

  4. Asset manifest and descriptions

  5. Help and support

The AWS - JPL Open-Source Rover Challenge is an online, virtual, global competition running from Monday, December 2, 2019 to Friday, February 21, 2020. It is sponsored by Amazon Web Services, Inc. (“The Sponsor” or “AWS”) and held in collaboration with JPL and AngelHack LLC (“Administrator”).

Simply put: you must train an RL agent to successfully navigate the Rover to a predetermined checkpoint on Mars.

The images below show the NASA-JPL Open Source Rover (left) and your digital version of the Rover (right).

We have simplified the action space to three discrete options:

    Turn left
    Turn right
    Stay Straight
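To make the action space concrete, here is a minimal sketch of how three discrete action indices could be turned into a (linear, angular) velocity command for a Rover whose forward speed is fixed. The index order, the `RoverAction` enum, the `action_to_command` helper, and the speed and turn-rate values are all hypothetical; check mars_env.py for the mapping the environment actually uses.

```python
from enum import IntEnum
from typing import Tuple


class RoverAction(IntEnum):
    # Hypothetical index order; check mars_env.py for the mapping it actually uses.
    TURN_LEFT = 0
    TURN_RIGHT = 1
    STRAIGHT = 2


# Illustrative values only: a fixed forward speed (m/s) and a fixed yaw rate (rad/s).
LINEAR_SPEED = 0.3
TURN_RATE = 0.5


def action_to_command(action: int) -> Tuple[float, float]:
    """Map a discrete action index to a (linear, angular) velocity pair."""
    a = RoverAction(action)  # raises ValueError for anything outside 0..2
    if a is RoverAction.TURN_LEFT:
        angular = TURN_RATE  # positive yaw turns left (counter-clockwise) in ROS convention
    elif a is RoverAction.TURN_RIGHT:
        angular = -TURN_RATE
    else:
        angular = 0.0
    # The linear component never changes: the agent only chooses the steering.
    return LINEAR_SPEED, angular
```

For example, `action_to_command(RoverAction.TURN_RIGHT)` returns `(0.3, -0.5)`. Keeping the linear term constant mirrors the rule that you cannot make the Rover go faster or slower.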

We have set the Rover to move at a constant, even linear velocity; in other words, you cannot make the Rover go faster or slower at this time. Wall-clock time is not a factor in the scoring mechanism.

The RL-agent leverages rl_coach to manage the training process. This repo ships with the Clipped PPO algorithm, but you are free to use a different algorithm.
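Clipped PPO maximizes a clipped surrogate objective: the ratio between the new and old policy's probability of each sampled action is clamped to [1 - ε, 1 + ε], so a single update cannot move the policy too far. The plain-Python sketch below computes that objective for a small batch. It illustrates the math only and is not rl_coach's implementation; `clipped_surrogate` is a hypothetical name, and ε = 0.2 is the value commonly used in the original PPO paper.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean clipped PPO surrogate objective (to be maximized).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates for the same samples
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (lower) of the unclipped and clipped terms.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

With `ratios=[1.5, 0.5]` and `advantages=[1.0, -1.0]`, the first sample is capped at 1.2 and the second is pushed down to -0.8, giving a mean of about 0.2: the policy gets no extra credit for moving the ratio beyond the clip range.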


To win the challenge, your RL-agent must navigate the Rover to the checkpoint and achieve the HIGHEST SCORE.

There is currently a single [simulated] Martian environment that all participants will use.


The scoring algorithm calculates a score when the Rover reaches the destination point without collisions in a single episode:

    Begin with 10,000 basis points
    Subtract the number of (time) steps required to reach the checkpoint
    Subtract the distance travelled to reach the checkpoint (in meters)
    Subtract the Rover's average linear acceleration (measured in m/s^2)
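The scoring rule above can be written as a small function, which is handy for estimating your own runs. This is a sketch, not the official scorer: `challenge_score` is a hypothetical name, and returning `None` for failed episodes is an assumption, since the rules only define a score for episodes that reach the checkpoint without a collision.

```python
from typing import Optional

BASE_POINTS = 10_000


def challenge_score(reached_checkpoint: bool, collided: bool, steps: int,
                    distance_m: float, avg_linear_accel: float) -> Optional[float]:
    """Score one episode according to the rules listed above."""
    if not reached_checkpoint or collided:
        return None  # assumption: no score for episodes that fail to finish cleanly
    # Each term is subtracted one-for-one from the base points.
    return BASE_POINTS - steps - distance_m - avg_linear_accel
```

For example, an episode that reaches the checkpoint in 400 steps after travelling 35.5 m with an average linear acceleration of 0.25 m/s^2 scores `challenge_score(True, False, 400, 35.5, 0.25) == 9564.25`. Because the terms are unweighted, one extra time step costs exactly as much as one extra meter.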

The scoring mechanism is designed to reflect the highest score for the Rover that:

    Reaches the destination by means of the most optimized path (measured in time steps)
    Reaches the destination by means of the most optimized, shortest path (measured in distance traveled)
    Reaches the destination without experiencing unnecessary acceleration that could represent wheel hop or drops

While familiarity with ROS and Gazebo is not required for this challenge, it can help you understand how your RL-agent is controlling the Rover. You will be required to submit your entry in the form of an AWS RoboMaker simulation job. This repo can be cloned directly into a RoboMaker (Cloud9) development environment. However, it will not train well as shipped, because the reward_function (more on that below) is empty.

All of the Martian world environment variables and Rover sensor data are captured for you and made available via global Python variables. You must populate the method known as "reward_function()". The challenge ships with examples of how to populate the reward function (found in the Training Grounds Gym environment). However, no level of accuracy or performance is guaranteed, as the code is meant as a learning aid, not the solution.

If you wish to learn more about how the Rover interacts with its environment, you can look at the "Training Grounds" world that also ships with this repo. It is a very basic world with monolith-type structures that the Rover must learn to navigate around. You are free to edit this world as you wish to learn more about how the Rover maneuvers.

DO NOT EDIT THE ROVER DESCRIPTION (src/rover). The Rover description that ships with this repo is the "gold standard" description, and it is the Rover that will be used to score your entries to the competition.

DO NOT EDIT THE MARTIAN WORLD (src/mars). The Martian world that ships with this repo is the "gold standard", and it is the same one that will be used to score your entry.

Project Structure: There are three primary components of the solution:

      src/rover/    A ROS package describing the Open Source Rover - this package is NOT editable
      src/mars/     A ROS/Gazebo package that describes and runs the simulated world
      src/rl-agent/ A Python3 module that contains a custom OpenAI Gym environment as well as wrapper code to initiate an rl_coach training session.
                    Within this module is a dedicated reward_function() method (described below) where you define your reward.

These three components work together to allow the Rover to navigate the Martian surface and send observation <-> reward tuples back to the RL-agent, which then uses a TensorFlow-based algorithm to learn how to optimize its actions.
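The exchange between the simulation and the RL-agent follows the classic (pre-0.26) Gym contract: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The sketch below shows one episode of that loop against a stand-in environment. `StubRoverEnv` and `run_episode` are hypothetical; the real environment gets its observations from Gazebo and its reward from reward_function(), and rl_coach drives the loop for you during training.

```python
import random


class StubRoverEnv:
    """Hypothetical stand-in for the Gym environment in mars_env.py."""

    MAX_STEPS = 2000  # mirrors the per-episode power supply limit

    def reset(self):
        self.steps = 0
        return [0.0, 0.0]  # placeholder observation

    def step(self, action):
        self.steps += 1
        observation = [random.random(), random.random()]  # stand-in for sensor data
        reward = 0.0  # the real environment computes this in reward_function()
        done = self.steps >= self.MAX_STEPS
        return observation, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (total_reward, steps)."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)  # the RL-agent picks an action from the observation
        obs, reward, done, _ = env.step(action)
        total += reward  # the agent learns from these rewards
    return total, env.steps
```

For example, `run_episode(StubRoverEnv(), lambda obs: random.randrange(3))` plays a full 2,000-step episode with random actions.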

Custom Gym Environment: This Gym environment exists as a single Python file: src/rl-agent/environments/mars_env.py

mars_env.py is where you will create your reward function. There is already a class method for you called: def reward_function(self)

While you are free to add your own code to this method, you cannot change the signature of the method or its return types.

The method must return a boolean value indicating whether the episode has ended (see more about episode-ending events below), and it must also return a reward value for that time step.

If you believe they are warranted, you are free to add additional global variables in the environment. However, keep in mind if they are episodic values (values that should be reset after each episode) you will need to reset those values within the reward_function method once you have determined the episode should end.
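The sketch below shows the shape of a reward_function that returns both values and resets a custom episodic variable once the episode ends. It is a trimmed-down, hypothetical stand-in: `MarsEnvSketch`, `collision`, `reached_checkpoint`, and `consecutive_closer_steps` are illustrative names, and the `(reward, done)` return order is an assumption, so keep whatever order the shipped method uses.

```python
class MarsEnvSketch:
    """Hypothetical, trimmed-down stand-in for the environment class in mars_env.py."""

    def __init__(self):
        # Values the real environment updates every step (simplified here).
        self.closer_to_checkpoint = False
        self.collision = False           # hypothetical flag
        self.reached_checkpoint = False  # hypothetical flag
        # A custom episodic value added by the participant (hypothetical name).
        self.consecutive_closer_steps = 0

    def reward_function(self):
        """Return (reward, done); this order is an assumption, keep the shipped one."""
        reward, done = 0.0, False

        if self.closer_to_checkpoint:
            # Reward sustained progress a little more than isolated progress.
            self.consecutive_closer_steps += 1
            reward += 1.0 + 0.1 * self.consecutive_closer_steps
        else:
            self.consecutive_closer_steps = 0
            reward -= 1.0

        if self.collision or self.reached_checkpoint:
            done = True

        if done:
            # Custom episodic values are not reset for you: do it here.
            self.consecutive_closer_steps = 0

        return reward, done
```

Without the reset in the `if done:` branch, the progress streak from one episode would leak into the next and inflate its early rewards.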

Recommended episode-ending scenarios: There are several scenarios that should automatically end an episode. To end an episode, simply set the "done" variable in the reward_function method to True.

  1. If the Rover collides with an object

NOTE: If any part of the Rover other than the BOTTOM of the wheels comes into contact with a foreign object, it is considered a collision. If an object comes into contact with the SIDE of a wheel, it is still considered a collision.

  2. If the Rover's power supply is drained. This limit is currently set to 2,000 steps per episode.
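The two scenarios above can be folded into a single check that reward_function uses to decide the value of `done`. This is a sketch: `episode_end_reason` is a hypothetical helper, and ending the episode when the checkpoint is reached is an added assumption, based on the scoring rules only counting episodes that arrive at the checkpoint.

```python
from typing import Optional

POWER_SUPPLY_STEPS = 2000  # per-episode step limit stated in this README


def episode_end_reason(collided: bool, steps: int,
                       reached_checkpoint: bool = False) -> Optional[str]:
    """Return why the episode should end, or None if it should continue."""
    if collided:
        # Checked first: arriving with a collision should not count as success.
        return "collision"
    if reached_checkpoint:
        return "checkpoint"  # assumption: not in the list above
    if steps >= POWER_SUPPLY_STEPS:
        return "power_drained"
    return None
```

Inside reward_function you could then write `done = episode_end_reason(...) is not None` and use the returned reason to choose a terminal reward or penalty.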

Creating your RoboMaker Development Environment

This GitHub repo should be cloned to the root of your RoboMaker (Cloud9) development environment.

a.) Start by going to the RoboMaker console and creating a new development environment.


b.) Once your Cloud9 dev environment is launched, drop into a terminal window and delete everything in the root of the environment (including the .c9 directory).


c.) Clone this repo to the root of your Cloud9 dev environment using the . (DOT) notation: git clone https://github.com/christopheraburns/AWS-JPL-OSR-Challenge . (note the . at the end of the git command; it prevents git from creating a local project directory)

There are several global constants and class-scoped variables in mars_env.py to help you build your reward function. These are known as "Episodic values" and reset with each new episode. Do not remove any of these variables.

steps (integer)

The number of [time] steps associated with the current episode.
Each step is one observation-action-reward cycle, not a unit of distance traveled

current_distance_to_checkpoint (float)

The number of meters to the checkpoint, from the Rover's current position  

closer_to_checkpoint (bool)

A boolean value to tell you if the Rover's last step took it closer (True) 
or further (False) from the Checkpoint

distance_travelled (float)

The total distance, in meters, the Rover has traveled in the current episode

collision_threshold (float)

The Rover is equipped with a LIDAR and will detect the distance of the closest 
object 45 degrees to the right or left, within 4.5 meters

last_collision_threshold (float)

The collision_threshold of the previous [time] step

x,y (float, float)

The current coordinate location of the Rover on the [simulated] Martian surface

last_position_x, last_position_y (float, float)

Coordinate location of the Rover in the previous time step

reward_in_episode (integer)

The cumulative reward for the current episode. Because each participant defines 
their own reward signal, this number should not be compared with other participants' 
episodic rewards

power_supply_range (integer)

This is the range of the Rover in a given episode. It decrements by time steps, 
NOT distance traveled, so that an episode still ends if the Rover gets stuck or 
flips due to a fall or collision and cannot respond to commands
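As an illustration of how these variables can be combined, here is one possible per-step reward. The weights, the 1.5 m safety margin, and the `shaped_reward` function itself are hypothetical; inside mars_env.py you would read the values from the environment rather than passing them as arguments.

```python
SAFE_DISTANCE = 1.5  # meters; hypothetical margin, well inside the 4.5 m LIDAR range


def shaped_reward(closer_to_checkpoint: bool,
                  current_distance_to_checkpoint: float,
                  collision_threshold: float,
                  last_collision_threshold: float) -> float:
    """Illustrative per-step reward built from the documented episodic variables."""
    reward = 0.0
    # Progress term: reward moving toward the checkpoint, penalize moving away.
    reward += 1.0 if closer_to_checkpoint else -1.0
    # Proximity bonus that grows as the Rover nears the checkpoint (hypothetical scaling).
    reward += 10.0 / (1.0 + current_distance_to_checkpoint)
    # Obstacle term: penalize closing in on the nearest LIDAR return inside the margin.
    if (collision_threshold < SAFE_DISTANCE
            and collision_threshold < last_collision_threshold):
        reward -= 2.0
    return reward
```

For example, `shaped_reward(True, 9.0, 4.5, 4.5)` returns `2.0`, while a step that moves away from the checkpoint but closer to a nearby obstacle, `shaped_reward(False, 0.0, 1.0, 1.2)`, returns `7.0` because the proximity bonus still dominates. Balancing terms like these is exactly the tuning work the challenge leaves to you.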

If you believe they are warranted, you are free to add additional global variables in the environment. However, keep in mind that if any of them are episodic values (values that should reset after each episode), you will need to write additional code to reset them within the reward_function method once you have determined the episode should end.

RESOURCES AND REFERENCES

AWS Train a robot with reinforcement learning

AWS RoboMaker Dockerized Simulations

AWS RoboMaker - Beginner's Guide to Robot Simulation

How to Run DeepRacer Locally to Save your Wallet

Reinforcement Learning Coach by Intel AI Lab

AWS RoboMaker resources

RoboMaker Settings template

Reinforcement Learning with Goal-Distance Gradient

Using Jupyter Notebook for analysing DeepRacer's logs

DeepRacer Log Analysis by Chris Thompson

PPO Hyperparameters and Ranges

Deep Reinforcement Learning

Autonomous car with Reinforcement Learning —Track Following

Easing functions

Maximum Entropy Policies in Reinforcement Learning & Everyday Life

Soft Actor-Critic

An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog!

Slack channel: awsjplroverchallenge.slack.com
