The Geometry of Truth

This repository accompanies the paper The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets by Samuel Marks and Max Tegmark. See also our interactive data explorer.


Set-up

Navigate to the directory you want to clone this repo into, then clone the repo, enter it, and install the requirements:

git clone git@github.com:saprmarks/geometry-of-truth.git
cd geometry-of-truth
pip install -r requirements.txt

Before doing anything else, you'll need to generate activations for the datasets. This requires LLaMA weights stored on the machine where you cloned this repo: put the absolute path of the directory containing your LLaMA weights in the file config.ini. Hugging Face repos are also supported.
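The exact layout of config.ini isn't described here, so the following is only a sketch under assumed names: one section per model and a hypothetical weights_directory key holding either an absolute local path or a Hugging Face repo id. It reads the value with Python's standard configparser module and uses a simple heuristic (absolute path versus anything else) to tell the two cases apart.

```python
import configparser
import os

# Hypothetical config.ini contents: the section and key names are assumptions,
# not necessarily the names used by this repo's config.ini.
EXAMPLE_CONFIG = """
[llama-2-13b]
weights_directory = /absolute/path/to/llama-2-13b

[llama-2-7b]
weights_directory = meta-llama/Llama-2-7b-hf
"""


def resolve_weights(config_text: str, model: str) -> tuple[str, str]:
    """Return (location, kind) for `model`, where kind says how to load it."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(model):
        raise KeyError(f"no section for model {model!r} in config")
    location = parser[model]["weights_directory"]
    # Heuristic (an assumption of this sketch): an absolute path is treated as a
    # local weights directory, anything else as a Hugging Face repo id.
    kind = "local directory" if os.path.isabs(location) else "Hugging Face repo id"
    return location, kind


for name in ("llama-2-13b", "llama-2-7b"):
    print(name, resolve_weights(EXAMPLE_CONFIG, name))
```

Keeping the absolute-path requirement makes the heuristic unambiguous: a relative path would otherwise be indistinguishable from a repo id such as meta-llama/Llama-2-7b-hf.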

Once that's done, you can generate LLaMA activations for the datasets you'd like to work with using a command like:

python generate_acts.py --model llama-2-13b --layers 8 10 12 --datasets cities neg_cities --device cuda:0

These activations will be stored in the acts directory. If you want to save activations for all layers, simply use --layers -1.
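The command-line flags above can be mimicked with Python's standard argparse module. This is a hypothetical sketch, not the parser in generate_acts.py: the parse_args helper, the num_layers parameter, the default values, and the expansion of -1 into every layer index 0..num_layers-1 are assumptions chosen to match the behavior described above.

```python
import argparse


def parse_args(argv: list[str], num_layers: int) -> argparse.Namespace:
    """Parse generate_acts-style flags; `--layers -1` expands to every layer."""
    parser = argparse.ArgumentParser(description="Sketch of an activation-generation CLI")
    parser.add_argument("--model", default="llama-2-13b")
    # nargs="+" collects one or more layer indices, e.g. --layers 8 10 12.
    parser.add_argument("--layers", nargs="+", type=int, required=True)
    parser.add_argument("--datasets", nargs="+", required=True)
    parser.add_argument("--device", default="cpu")
    args = parser.parse_args(argv)
    if args.layers == [-1]:
        # -1 is a sentinel meaning "all layers" (assumed to be 0..num_layers-1).
        args.layers = list(range(num_layers))
    return args


args = parse_args(
    ["--model", "llama-2-13b", "--layers", "8", "10", "12",
     "--datasets", "cities", "neg_cities", "--device", "cuda:0"],
    num_layers=40,
)
print(args.layers, args.datasets, args.device)
```

LLaMA-2-13B has 40 transformer layers, hence num_layers=40 in the example. argparse accepts "-1" as a value here because the parser defines no options that themselves look like negative numbers.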

Files

This directory contains the following files:

dataexplorer.ipynb: for generating visualizations of the datasets. Code for reproducing figures in the paper is included.
few_shot.py: for implementing the calibrated 5-shot baseline.
generalization.ipynb: for training probes on one dataset and checking generalization to another. Includes code for reproducing the generalization matrix in the paper.
interventions.py: for reproducing the causal intervention experiments from the paper.
probes.py: contains definitions of probe classes.
utils.py and visualization_utils.py: utilities for managing datasets and producing visualizations.
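To illustrate what a probe class and a generalization check involve, here is a minimal NumPy sketch of a difference-of-means probe (the "mass-mean" direction studied in the paper), trained on one dataset and evaluated on another. It is not the implementation in probes.py or generalization.ipynb: the class and helper names, the midpoint threshold, and the synthetic Gaussian stand-ins for activations are all assumptions of this sketch.

```python
import numpy as np


class MassMeanProbe:
    """Illustrative difference-of-means probe (hypothetical, not the class in probes.py)."""

    def fit(self, acts, labels):
        acts = np.asarray(acts, dtype=float)
        labels = np.asarray(labels, dtype=bool)
        mu_true = acts[labels].mean(axis=0)
        mu_false = acts[~labels].mean(axis=0)
        # Probe direction: vector from the false-statement mean to the true-statement mean.
        self.direction = mu_true - mu_false
        # Assumption of this sketch: threshold at the projected midpoint of the two means.
        self.threshold = float((mu_true + mu_false) @ self.direction) / 2
        return self

    def predict(self, acts):
        # Label a statement "true" if its projection lies past the midpoint.
        return np.asarray(acts, dtype=float) @ self.direction > self.threshold


def accuracy(probe, acts, labels):
    """Fraction of statements whose predicted label matches the true label."""
    return float(np.mean(probe.predict(acts) == np.asarray(labels, dtype=bool)))


def make_dataset(rng, n, dim, offset):
    """Synthetic stand-in for activations: classes separated along axis 0, shifted by `offset`."""
    labels = rng.random(n) < 0.5
    acts = rng.normal(0.0, 0.5, size=(n, dim))
    acts[:, 0] += np.where(labels, 2.0, -2.0)
    return acts + offset, labels


rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[1] = 3.0  # dataset B differs from A by a shift orthogonal to the truth axis
acts_a, labels_a = make_dataset(rng, 400, dim, np.zeros(dim))
acts_b, labels_b = make_dataset(rng, 400, dim, shift)

probe = MassMeanProbe().fit(acts_a, labels_a)
acc_a = accuracy(probe, acts_a, labels_a)  # in-distribution accuracy
acc_b = accuracy(probe, acts_b, labels_b)  # accuracy after transfer to a second dataset
print(f"train accuracy {acc_a:.3f}, transfer accuracy {acc_b:.3f}")
```

Because the synthetic dataset B differs from A only by a shift orthogonal to the direction separating true from false statements, the probe should transfer well here; real datasets need not behave this way, which is what a generalization matrix is meant to measure.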
