
Questions on output_hidden_states to get activations #11

Closed
HPLQAQ opened this issue Jul 15, 2024 · 1 comment

Comments

HPLQAQ commented Jul 15, 2024

Thank you for the wonderful repo and for the open-source work on Llama 3; the code is clear and easy to read.
My understanding is that the common practice for training an SAE, as described by OpenAI/Anthropic, is to train on the residual stream, i.e. the outputs of the transformer blocks. This repo gets activations via the output_hidden_states=True option in transformers, but as far as I can see, the hidden_states returned that way are the inputs to the transformer blocks.
I don't currently know whether the residual stream or these hidden_states gives better results for analysis, or whether I'm misreading the code. Can you help?

Thank you very much.

For reference
https://github.com/huggingface/transformers/blob/a5c642fe7a1f25d3bdcd76991443ba6ff7ee34b2/src/transformers/models/llama/modeling_llama.py#L859
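For concreteness, here is a minimal sketch of what output_hidden_states=True actually returns (the checkpoint name is just an example; any Llama-family model behaves the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint, purely illustrative.
name = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

inputs = tok("Hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of num_layers + 1 tensors, each of shape
# (batch, seq_len, hidden_size):
#   hidden_states[0]   -> embedding output, i.e. the input to block 0
#   hidden_states[i]   -> input to block i == residual stream after block i-1
#   hidden_states[-1]  -> final hidden state, after the last block AND the
#                         model's final RMSNorm
print(len(out.hidden_states), out.hidden_states[0].shape)
```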

norabelrose (Member) commented

> as far as I can see, the hidden_states returned that way are the inputs to the transformer blocks

The only difference between the two is that hidden_states includes the input embeddings, while the "residual stream" as you're defining it does not. I'm working on an update to the repo that will enable more customization of hook points.
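One way to see this correspondence is to capture the residual stream directly with forward hooks and compare it against the hidden_states tuple; a minimal sketch, assuming the `model` and `inputs` from the snippet above:

```python
# Capture the residual stream leaving each decoder block with forward hooks.
activations = {}

def save_output(layer_idx):
    def hook(module, args, output):
        # LlamaDecoderLayer returns a tuple; output[0] is the residual
        # stream leaving this block.
        activations[layer_idx] = output[0].detach()
    return hook

handles = [
    layer.register_forward_hook(save_output(i))
    for i, layer in enumerate(model.model.layers)
]
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
for h in handles:
    h.remove()

# activations[i] matches out.hidden_states[i + 1] for every block except
# the last: the final entry of hidden_states has the model's final RMSNorm
# already applied.
for i in range(len(handles) - 1):
    assert torch.allclose(activations[i], out.hidden_states[i + 1])
```

So apart from the embedding entry at index 0 and the final-norm entry at the end, the two views of the activations are the same tensors.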
