Entropy Based Sampling and Parallel CoT Decoding

TypeScript 924 137 Updated Oct 7, 2024

Internalizing steering vectors via fine-tuning

Jupyter Notebook 3 Updated Sep 6, 2024

Code to enable layer-level steering in LLMs using sparse autoencoders

Python 1 Updated Sep 20, 2024

we got you bro

32 Updated Jul 29, 2024

Training Sparse Autoencoders on Language Models

Jupyter Notebook 396 108 Updated Oct 3, 2024

PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind)

Python 12 1 Updated Apr 16, 2024

A library for efficient patching and automatic circuit discovery.

Python 23 9 Updated Aug 24, 2024

Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).

HTML 140 27 Updated Oct 5, 2024

Experiments with representation engineering

Python 9 1 Updated Feb 28, 2024

My interpretation of what einops indexing would look like (created to work on during my SERI MATS project).

Python 5 1 Updated Jul 7, 2024

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

Python 610 59 Updated Oct 4, 2024

Python 57 2 Updated Mar 8, 2022

(Model-written) LLM evals library

Python 15 2 Updated Jul 27, 2024

Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

Python 77 17 Updated Dec 14, 2023

Fork of Arthur Conmy's Automatic-Circuit-Discovery for the purpose of conducting ACDC research

Jupyter Notebook 1 Updated Feb 11, 2024

Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"

Jupyter Notebook 22 7 Updated May 31, 2024

Repo for hosting Streamlit pages for my 2023 SERI MATS project with Arthur Conmy (mentored by Neel Nanda).

HTML 7 1 Updated Feb 27, 2024

Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/

Python 1,137 59 Updated Sep 1, 2024

Python 5 2 Updated Aug 24, 2023

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.

HTML 189 77 Updated Feb 7, 2024

Mechanistic Interpretability Visualizations using React

Jupyter Notebook 185 29 Updated Jul 13, 2024

Lucid library adapted for PyTorch with new features for ViTs and MLP-Mixers

Python 1 Updated Aug 29, 2022

Python 2,648 301 Updated Oct 6, 2024

A library for mechanistic interpretability of GPT-style language models

Python 1,455 285 Updated Oct 4, 2024

An autoregressive character-level language model for making more things

Python 2,505 658 Updated Jun 4, 2024

The Happy Faces Benchmark

Python 14 Updated Jul 20, 2023