
Questions on output_hidden_states to get activations #11

Closed
HPLQAQ opened this issue Jul 15, 2024 · 1 comment

Comments

HPLQAQ commented Jul 15, 2024

Thank you for the wonderful repo and for the open-source work on Llama 3; the code is clear and easy to read.
My understanding is that the common practice for training an SAE, as described by OpenAI/Anthropic, is to train on the residual stream, i.e. the outputs of the transformer blocks. This repo gets activations via the output_hidden_states=True option in transformers, but as far as I can see, the hidden_states returned that way are the inputs to the transformer blocks.
I don't currently know whether the residual stream or these hidden_states gives better results for analysis, or whether I'm misreading the code. Can you help?

Thank you very much.

For reference
https://github.com/huggingface/transformers/blob/a5c642fe7a1f25d3bdcd76991443ba6ff7ee34b2/src/transformers/models/llama/modeling_llama.py#L859
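For concreteness, here is a minimal sketch of what output_hidden_states=True actually returns (the checkpoint name is just an example; any Llama-family model behaves the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint, purely illustrative.
name = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

inputs = tok("Hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of num_layers + 1 tensors, each of shape
# (batch, seq_len, hidden_size):
#   hidden_states[0]   -> embedding output, i.e. the input to block 0
#   hidden_states[i]   -> input to block i == residual stream after block i-1
#   hidden_states[-1]  -> final hidden state, after the last block AND the
#                         model's final RMSNorm
print(len(out.hidden_states), out.hidden_states[0].shape)
```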

norabelrose (Member) commented

> as far as I can see, the hidden_states returned that way are the inputs to the transformer blocks

The only difference between the two is that hidden_states includes the input embeddings, while the "residual stream" as you're defining it does not. I'm working on an update to the repo that will enable more customization of hook points.
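One way to see this correspondence is to capture the residual stream directly with forward hooks and compare it against the hidden_states tuple; a minimal sketch, assuming the `model` and `inputs` from the snippet above:

```python
# Capture the residual stream leaving each decoder block with forward hooks.
activations = {}

def save_output(layer_idx):
    def hook(module, args, output):
        # LlamaDecoderLayer returns a tuple; output[0] is the residual
        # stream leaving this block.
        activations[layer_idx] = output[0].detach()
    return hook

handles = [
    layer.register_forward_hook(save_output(i))
    for i, layer in enumerate(model.model.layers)
]
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
for h in handles:
    h.remove()

# activations[i] matches out.hidden_states[i + 1] for every block except
# the last: the final entry of hidden_states has the model's final RMSNorm
# already applied.
for i in range(len(handles) - 1):
    assert torch.allclose(activations[i], out.hidden_states[i + 1])
```

So apart from the embedding entry at index 0 and the final-norm entry at the end, the two views of the activations are the same tensors.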
