Questions about updating actor network parameters in the MATD3 algorithm #182
Replies: 1 comment
Hi there, thanks for the question! If we look at the pseudo-code for both MADDPG and MATD3, the actor is updated using the sampled policy gradient, and the section of code you are asking about implements exactly that update. With N agents, when updating the parameters of actor network i we only want to calculate gradients with respect to the i-th action, so we detach the graph for every action except action i. That way only actor i's computational graph contributes gradients during backpropagation, and the relevant network weights are updated with the relevant loss. Hope this helps!
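For reference, the sampled policy gradient from the MADDPG paper (Lowe et al., 2017), which MATD3 also uses for its actor update, takes this form:

$$\nabla_{\theta_i} J(\mu_i) \approx \frac{1}{S}\sum_{k}\nabla_{\theta_i}\mu_i(o_i^k)\,\nabla_{a_i} Q_i^{\mu}\big(x^k, a_1^k,\dots,a_i,\dots,a_N^k\big)\Big|_{a_i=\mu_i(o_i^k)}$$

where $S$ is the batch size, $x^k$ is the sampled global state, and $o_i^k$ is agent $i$'s observation. Detaching the other agents' actions is what restricts the gradient to the $a_i$ term.

Here is a minimal PyTorch sketch of that update, not the repository's exact code: the `actors`, `critic`, `actor_optimizers`, `obs`, and `state` names are hypothetical stand-ins for whatever your implementation uses.

```python
import torch

def update_actor(i, actors, critic, actor_optimizers, obs, state):
    """Update actor i with the sampled policy gradient.

    obs is a list of per-agent observation batches; state is the
    global state batch seen by the centralised critic.
    """
    actions = []
    for j, actor in enumerate(actors):
        a_j = actor(obs[j])
        if j != i:
            # Detach every other agent's action so it enters the critic
            # as a constant: backprop cannot reach actor j's weights.
            a_j = a_j.detach()
        actions.append(a_j)

    # Minimising -Q ascends the policy gradient for actor i only,
    # since a_i is the only action still attached to a graph.
    actor_loss = -critic(state, torch.cat(actions, dim=-1)).mean()

    actor_optimizers[i].zero_grad()
    actor_loss.backward()
    actor_optimizers[i].step()
```

Note that `backward()` here also accumulates gradients on the critic's parameters, since only the actions, not the critic, are detached; calling `zero_grad()` on the critic's optimizer before its own update step keeps those gradients from leaking into the critic update.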