
Learning LTL Task Specifications for Multiagent Systems under Multiple Objectives



LTL2TeamBot

Framework to support LTL task allocation to a team of cooperative robots acting under multiple objectives.
Explore the docs »

· Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Installation
  3. Usage
  4. Specifying Tasks
  5. Learning Policies for Agents
  6. Training
  7. Visualisation
  8. Data and Results
  9. Contact
  10. Acknowledgments

About The Project

This framework supports multiagent reinforcement learning in very large or unknown environments. The key idea is to use a shared parameter network during the learning phase while each agent executes its own policy; this is the so-called Centralised Training Decentralised Execution (CTDE) paradigm. The framework implements deterministic task allocation to cooperative robot teams, parameterises the task allocation space for the agents, and updates the allocation parameters using a policy gradient approach. Parameterising the task allocation space is a unique way of scalarising the multi-objective problem specifically in terms of task allocation.

Installation

  1. Clone the repository.
  2. Create a new Anaconda environment from the environment.yml file in the root directory:
    conda env create -n myenv -f environment.yml
  3. For development, install the package in editable mode:
    pip install -e .
  4. Troubleshooting step: if there are issues with PyTorch, go to the PyTorch Website and get the correct drivers to configure your GPU.

Environment

The example implemented in this research is a multiagent environment in which agents interact to learn how to complete a set of tasks. The environment is an augmented version of the teamgrid environment. In this setting there are a number of challenges to overcome, including how the agents learn to resolve conflicts such as moving to the same square or picking up the same object.

Environments are customised in the mappo/envs directory; for example, a simple test environment with two agents and two tasks can be found in mappo/envs/so_small_ltl_env.py. A minimal registration sketch follows the list below. The environments included are:

  1. so_small_ltl_env.py: a small grid environment with 2 agents and 2 tasks.
  2. task${x}_agent2_simple.py: 2 agents must complete $x$ tasks, i.e. interact with different objects to complete a mission.
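
Custom environments need to be registered before training (step 1 of the Training section below registers the environment). The following is a minimal sketch of the standard Gym registration pattern; the environment id and the class name SmallLTLEnv are illustrative assumptions, not the repository's actual names (check mappo/envs/so_small_ltl_env.py for the exported class).

import gym
from gym.envs.registration import register

# Illustrative only: register a custom environment with Gym so it can be
# created by id. The id and class name below are placeholders.
register(
    id="SoSmallLTL-v0",
    entry_point="mappo.envs.so_small_ltl_env:SmallLTLEnv",
)

env = gym.make("SoSmallLTL-v0")
obs = env.reset()  # with classic gym, reset() returns the initial observation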

Specifying Tasks

In this framework tasks are specified using LTL parse trees. An example of an LTL parse tree for the formula $\psi ::= \neg r \ \mathtt{U} \ (j \land (\neg p \ \mathtt{U} \ k))$ is given below:

          U
        /   \
     not     and
      |     /   \
      r    j     U
               /   \
             not    k
              |
              p

There are a number of possible representations; this framework uses prefix notation, e.g. the parse tree above becomes $(\mathtt{U}, (\neg, r), (\land, j, (\mathtt{U}, (\neg, p), k)))$. Prefix strings can be efficiently learned by a Gated Recurrent Unit (GRU), or the parse trees themselves can be encoded and learned using Graph Attention Networks (GAT).

The prefix notations are injected directly into the environment in the mappo/envs directory.
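
As a concrete illustration, the formula above can be written as nested tuples in prefix form and flattened into the token sequence a GRU would consume. This is a minimal sketch; the flatten_prefix helper and the tuple encoding are illustrative, not the repository's API.

# Minimal sketch: flattening a prefix-form LTL formula (nested tuples) into a
# token sequence for a GRU. Helper name and vocabulary are illustrative only.

# psi = (not r) U (j and ((not p) U k)) in prefix form
formula = ("U", ("not", "r"), ("and", "j", ("U", ("not", "p"), "k")))

def flatten_prefix(node):
    """Depth-first traversal producing the prefix token sequence."""
    if isinstance(node, str):      # propositions are leaves
        return [node]
    op, *children = node           # first element is the operator
    tokens = [op]
    for child in children:
        tokens.extend(flatten_prefix(child))
    return tokens

print(flatten_prefix(formula))
# ['U', 'not', 'r', 'and', 'j', 'U', 'not', 'p', 'k']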

Learning Policies for Agents

The main network used for experiments in this framework is the AC_MA_MO_LTL_Model, which can be imported from mappo/networks:

from mappo.networks.mo_ma_ltlnet import AC_MA_MO_LTL_Model

The task-learning network is an actor-critic network in which the critic outputs a tensor of shape (..., num_objectives), i.e. one value estimate per objective.

The network architecture used to learn teamgrid policies that address LTL task specifications is shown below. Notice that, unlike independent-agent architectures, a single shared network is used for all agents. An agent does not 'see' the observations of other agents, but the network is trained on the observations of all agents; in this way the CTDE requirement is met.

Input: img, task, ltl
   |
   v
Image Convolution (64 filters)
   |
   v
Memory LSTM (if use_memory=True)         LSTM LTL Embedding
   |                                         |
   |                                         v
   |                                         LTL GRU
   |                                         |
   |                                         v
   -------------------------------------------
   v
Embedding: embedding
   |
   v
Composed Features: composed_x    ------------|                 
   |                                         |
   v                                         v
Actor Network: logits for actions            Critic Network: critic values
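
The sketch below gives a rough, self-contained PyTorch version of this kind of shared architecture. The layer sizes, the 7-action space, and the class name SharedActorCritic are assumptions for illustration, and the memory LSTM and task-allocation input are omitted for brevity; see mappo/networks/mo_ma_ltlnet.py for the real AC_MA_MO_LTL_Model. It shows how the image and LTL branches are composed into shared features that feed both the actor head and a multi-objective critic head with output shape (batch, num_objectives).

import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Illustrative shared actor-critic over image + LTL inputs (not the repo's model)."""
    def __init__(self, ltl_vocab: int, num_actions: int, num_objectives: int):
        super().__init__()
        # Image branch: small convolution over the grid observation.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # LTL branch: embed prefix tokens and encode them with a GRU.
        self.ltl_embed = nn.Embedding(ltl_vocab, 32)
        self.ltl_gru = nn.GRU(32, 64, batch_first=True)
        # Heads over the composed features.
        self.actor = nn.Linear(64 + 64, num_actions)        # logits for actions
        self.critic = nn.Linear(64 + 64, num_objectives)    # one value per objective

    def forward(self, img, ltl_tokens):
        img_feat = self.conv(img)                            # (B, 64)
        _, h = self.ltl_gru(self.ltl_embed(ltl_tokens))      # h: (1, B, 64)
        composed_x = torch.cat([img_feat, h.squeeze(0)], dim=-1)
        return self.actor(composed_x), self.critic(composed_x)

model = SharedActorCritic(ltl_vocab=16, num_actions=7, num_objectives=3)
logits, values = model(torch.randn(4, 3, 7, 7), torch.randint(0, 16, (4, 9)))
# logits: (4, 7), values: (4, 3)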

The training algorithm is a novel multiagent, multi-objective Proximal Policy Optimisation (PPO) algorithm that uses mini-batch updates. In this multiagent version of PPO the idea is to share the parameters of the policy so that each agent learns directly from the trajectories of all other agents.
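
For intuition, here is a hedged sketch of the clipped PPO surrogate computed over a mini-batch that pools the trajectories of all agents (shared parameters). The scalarisation weights w over objectives are an assumption used only to show how a multi-objective advantage can enter the standard clipped loss; they are not the repository's exact formulation.

import torch

def ppo_clip_loss(new_logp, old_logp, advantages, w, clip_eps=0.2):
    # advantages: (batch, num_objectives); w: (num_objectives,)
    scalar_adv = (advantages * w).sum(dim=-1)            # scalarised advantage
    ratio = torch.exp(new_logp - old_logp)               # policy probability ratio
    unclipped = ratio * scalar_adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * scalar_adv
    return -torch.min(unclipped, clipped).mean()         # maximise the clipped surrogate

loss = ppo_clip_loss(
    new_logp=torch.randn(32, requires_grad=True),
    old_logp=torch.randn(32),
    advantages=torch.randn(32, 3),
    w=torch.tensor([0.4, 0.3, 0.3]),
)
loss.backward()  # gradients flow into whatever produced new_logp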

Training

Training the model occurs in the mappo/eval/team_grid directory, for example mappo/eval/team_grid/experiments.py.

The following steps are followed:

  1. Register the environment and initialise data recorder.
  2. Initialise input parameters.
  3. Construct an observation environment for each agent.
  4. Specify the device and call the model constructor.
  5. There are two sets of parameters to update: $\kappa$-task allocation, and $\theta$-policy.
  6. Initialise the PPO algorithm and parameters.
  7. While the number of frames is less than the total number of frames and the best score for each objective is less than some threshold, collect experiences for each agent and update $\kappa$ and $\theta$.
    • The loss function depends on both sets of parameters: its first term manages the cost of each agent $i$ performing tasks, while its second term manages the probability of completion for each task $j$.
  8. Print the outputs of the training and save models based on high-performing outputs.

Visualisation

After training has completed the learned policies can be visualised using:

python mappo/eval/team_grid/dec_visualisation.py

An example of the simple training environment can be seen below.

Simple-Example

Data and Results

Results for experiments can be found in the data folder.

License

Distributed under the Apache-2.0 License. See LICENSE for more information.

(back to top)

Contact

Thomas Robinson - @tmrobai - Email

Project Link: https://github.com/tmrob2/ltl2teambot

(back to top)
