Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
intervention
interpretability
mechanistic-interpretability
activation-intervention
activation-patching
Updated Oct 8, 2024 - Python
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
Explainability of Deep Learning Models
Project entirely rebuilt in the web v2