Policy out softmax with illegal moves #22

Open
apollo-time opened this issue Dec 11, 2017 · 4 comments

Comments

@apollo-time

policy_out = Dense(8*8, kernel_regularizer=l2(mc.l2_reg), activation="softmax", name="policy_out")(x)

I see that the policy softmax is computed over all moves, including illegal ones.
How can the softmax be computed over only the legal moves, for example by adding a placeholder (input) for the legal moves?

@mokemokechicken
Owner

I see that the policy softmax is computed over all moves, including illegal ones.
How can the softmax be computed over only the legal moves, for example by adding a placeholder (input) for the legal moves?

Does the following answer match the intent of your question?


For example,

  # no output for 'pass'
  policy_out = Dense(8*8, kernel_regularizer=l2(mc.l2_reg), activation="softmax", name="policy_out")(x)

  legal_mask = Input((8 * 8,))  # shape must be a tuple; values are 0: illegal, 1: legal
  ...
  # no output for 'pass'
  x = Dense(8*8, kernel_regularizer=l2(mc.l2_reg))(x)
  x = Multiply()([x, legal_mask])
  policy_out = Activation("softmax", name="policy_out")(x)
  
  ...
  
  self.model = Model([in_x, legal_mask], [policy_out, value_out], name="reversi_model")

The legal_mask input then has to be computed for every sample in the training data.

@apollo-time
Author

But softmax((0, -0.5, 0.5))[1:3] is not equal to softmax((-0.5, 0.5)) when legal_mask=(0,1,1), is it?
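
The mismatch can be checked numerically. The following is a minimal sketch in plain Python; the `softmax` helper and the example logits (including the raw value 2.0 for the illegal move) are assumptions for illustration and are not taken from the repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the Dense layer for three moves.
logits = [2.0, -0.5, 0.5]
legal_mask = [0, 1, 1]  # 0: illegal, 1: legal

# Multiplying by the mask only sets the illegal logit to 0;
# exp(0) = 1, so the illegal move still receives probability mass.
multiplied = [v * m for v, m in zip(logits, legal_mask)]
probs_multiply = softmax(multiplied)

# Softmax restricted to the two legal moves, for comparison.
probs_legal_only = softmax([-0.5, 0.5])

print(probs_multiply)
print(probs_legal_only)
```

With these numbers the illegal move keeps roughly 30% of the probability, so the remaining entries cannot match the softmax taken over the legal moves alone.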

@mokemokechicken
Owner

mokemokechicken commented Dec 13, 2017

@apollo-time

Oh, I was careless.
How about this?

  legal_mask = Input((8 * 8,))  # shape must be a tuple; values are 0: illegal, 1: legal
  legal_mask_2 = Lambda(lambda x: (x-1)*1000000)(legal_mask)  # illegal -> -1000000, legal -> 0
  ...
  # no output for 'pass'
  x = Dense(8*8, kernel_regularizer=l2(mc.l2_reg))(x)
  x = Add()([x, legal_mask_2])
  policy_out = Activation("softmax", name="policy_out")(x)
  
  ...
  
  self.model = Model([in_x, legal_mask], [policy_out, value_out], name="reversi_model")

> softmax([-0.5, 0.5])
[ 0.26894142,  0.73105858]

> softmax([-0.5, 0.5, -1000000])
[ 0.26894142,  0.73105858,  0.        ]
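
The same additive-mask idea can be reproduced outside Keras. This is a sketch in plain Python under stated assumptions: the helper names `softmax` and `masked_softmax` and the `penalty` parameter are hypothetical and only mirror the `(x-1)*1000000` transform above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_softmax(logits, legal_mask, penalty=1_000_000.0):
    """Add (mask - 1) * penalty to each logit before the softmax.

    Legal moves (mask 1) are shifted by 0, illegal moves (mask 0) by
    -penalty, so their exponentials underflow to 0.
    """
    shifted = [v + (m - 1) * penalty for v, m in zip(logits, legal_mask)]
    return softmax(shifted)

# The third move has the largest raw logit but is illegal.
probs = masked_softmax([-0.5, 0.5, 3.0], [1, 1, 0])
print(probs)
```

One caveat of this sketch: if every entry of the mask is 0, all logits are shifted by the same amount and the result is just the unmasked softmax, so a position with no legal move (a pass) would need separate handling.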

@apollo-time
Author

Right, that's it. Thanks.
