Question about the cross attention #82
In the cross-attention implementation there is this snippet:

if crossattn:
    # zero out the mask for the first token (the fixed start-of-sentence slot)
    detach = torch.ones_like(key)
    detach[:, :1, :] = detach[:, :1, :] * 0.
    # stop the gradient through the first key/value pair only
    key = detach * key + (1 - detach) * key.detach()
    value = detach * value + (1 - detach) * value.detach()

Why stop the gradient of the first key-value pair here?

Hi, since the start-of-sentence token is always fixed, I noticed a small improvement when detaching it during training. I guess this helps build a better association between the "V* category" and the target image, and thus improves generation for inference-time prompts. Thanks.
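For reference, here is a minimal, self-contained sketch of the same stop-gradient trick (toy shapes and variable names, not the repo's actual attention module) that checks where gradients flow:

import torch

batch, tokens, dim = 2, 4, 8
key = torch.randn(batch, tokens, dim, requires_grad=True)

# mask that is 0 at the first token position and 1 everywhere else
detach = torch.ones_like(key)
detach[:, :1, :] = 0.

# token 0 contributes its value to the forward pass but receives no gradient;
# tokens 1..T-1 are trained normally
mixed = detach * key + (1 - detach) * key.detach()

mixed.sum().backward()
print(key.grad[:, 0].abs().max())   # tensor(0.) -> gradient stopped at token 0
print(key.grad[:, 1:].abs().max())  # tensor(1.) -> gradient flows elsewhere

Because the mask only rescales the straight-through and detached copies of the same tensor, the forward activations are identical either way; only the backward pass changes.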