Questions about updating actor network parameters in the MATD3 algorithm #182
Replies: 1 comment
Hi there, thanks for the question! If we look at the pseudo-code for both MADDPG and MATD3, the actor is updated using the sampled policy gradient, and the section of code you are asking about implements exactly that update. With N agents, when updating the parameters of actor network i we only want to calculate gradients with respect to the i-th action, so we detach the graph for every action except action i. That way only actor i's computational graph contributes gradients during backpropagation, and the relevant network weights are updated with the relevant loss. Hope this helps!
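For reference, the sampled policy gradient from the MADDPG paper (Lowe et al., 2017), which MATD3 also uses for its actor update, takes this form:

$$\nabla_{\theta_i} J(\mu_i) \approx \frac{1}{S}\sum_{k}\nabla_{\theta_i}\mu_i(o_i^k)\,\nabla_{a_i} Q_i^{\mu}\big(x^k, a_1^k,\dots,a_i,\dots,a_N^k\big)\Big|_{a_i=\mu_i(o_i^k)}$$

where $S$ is the batch size, $x^k$ is the sampled global state, and $o_i^k$ is agent $i$'s observation. Detaching the other agents' actions is what restricts the gradient to the $a_i$ term.

Here is a minimal PyTorch sketch of that update, not the repository's exact code: the `actors`, `critic`, `actor_optimizers`, `obs`, and `state` names are hypothetical stand-ins for whatever your implementation uses.

```python
import torch

def update_actor(i, actors, critic, actor_optimizers, obs, state):
    """Update actor i with the sampled policy gradient.

    obs is a list of per-agent observation batches; state is the
    global state batch seen by the centralised critic.
    """
    actions = []
    for j, actor in enumerate(actors):
        a_j = actor(obs[j])
        if j != i:
            # Detach every other agent's action so it enters the critic
            # as a constant: backprop cannot reach actor j's weights.
            a_j = a_j.detach()
        actions.append(a_j)

    # Minimising -Q ascends the policy gradient for actor i only,
    # since a_i is the only action still attached to a graph.
    actor_loss = -critic(state, torch.cat(actions, dim=-1)).mean()

    actor_optimizers[i].zero_grad()
    actor_loss.backward()
    actor_optimizers[i].step()
```

Note that `backward()` here also accumulates gradients on the critic's parameters, since only the actions, not the critic, are detached; calling `zero_grad()` on the critic's optimizer before its own update step keeps those gradients from leaking into the critic update.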