Difference between local shap values coming from global and from the "shap_values"-method #2101

Open
rainergo opened this issue Jul 20, 2021 · 4 comments

@rainergo

Hi,
I have a tree Explanation object containing ".data", ".values" and ".base_values". If I extract the SHAP values from this object for a single data row (a "local" explanation), I get the SHAP values for the 32 features of that row.

Now, if I call the "shap_values" method on the explainer object and pass it the feature data for that particular row, I also get the SHAP values for the same row as above.

In my understanding, the SHAP values extracted from the Explanation object and the SHAP values returned by the "shap_values" method should be identical.

But they are not. Very often (not always), there is a difference in exactly ONE feature's SHAP value: the values for 31 of the 32 features match exactly between the two approaches, but most of the time a single feature's value differs. I tried this on different data sets ... .

Why is that? And is this a bug?
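
For reference, this is roughly the comparison I am making (a sketch; "model" and "X" stand in for my actual tree model and feature data):

import numpy as np
import shap

i = 0  # the single data row I am looking at
explainer = shap.TreeExplainer(model)

# "local" SHAP values taken from the Explanation object
explanation = explainer(X)
local_from_object = explanation.values[i]

# the same row via the "shap_values" method
local_from_method = explainer.shap_values(X)[i]

# in my understanding, these should be identical
print(np.isclose(local_from_object, local_from_method))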

@c56pony (Contributor) commented Aug 11, 2021

I was not able to reproduce that error in my environment. Could you please share code that reproduces it?
My code and output are as follows.

from sklearn.model_selection import train_test_split
import xgboost as xgb
import shap
import numpy as np

# train a small XGBoost model on the adult census dataset bundled with shap
X, y = shap.datasets.adult()
train_x, valid_x, train_y, valid_y = train_test_split(X, y, test_size=0.25, random_state=7)
dtrain = xgb.DMatrix(train_x, label=train_y)
dvalid = xgb.DMatrix(valid_x, label=valid_y)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.7
}
model = xgb.train(params, dtrain, evals=[(dvalid, "valid")])

explainer = shap.TreeExplainer(model)
# approach 1: call the explainer to get an Explanation object
shap_values = explainer(X)
# approach 2: call the shap_values method directly
shap_values2 = explainer.shap_values(X)
# compare the two arrays element-wise
np.allclose(shap_values.values, shap_values2)

which outputs True.

@LasseVDHeydtQC

I'm having the same issue; the values are just different... Could it be that the two methods handle categorical dtypes differently?
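
One way to test that hypothesis (a sketch, assuming "explainer" is an already-fitted shap.TreeExplainer and "X" is a pandas DataFrame that may contain 'category' columns): cast the categoricals to integer codes first, so both code paths see identical numeric input. If the discrepancy disappears, categorical handling is the likely culprit.

import numpy as np

# cast pandas 'category' columns to their integer codes
X_num = X.copy()
for col in X_num.select_dtypes(include="category").columns:
    X_num[col] = X_num[col].cat.codes

sv_object = explainer(X_num).values       # Explanation object path
sv_method = explainer.shap_values(X_num)  # "shap_values" method path
print(np.allclose(sv_object, sv_method))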

@firobeid

You should index along the second axis to get the SHAP values for each feature on its own:
shap_values.values[:, feature_index]
where feature_index is the column index of that feature in the train set.
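
Using the objects from the reproduction above, a single row can also be compared feature by feature (a sketch; the row and feature indices are arbitrary, for illustration only):

row, feat = 0, 5  # arbitrary indices, for illustration only

local_from_object = shap_values.values[row, feat]  # Explanation object
local_from_method = shap_values2[row, feat]        # "shap_values" method output
print(np.isclose(local_from_object, local_from_method))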

@github-actions (bot)

This issue has been inactive for two years, so it's been automatically marked as 'stale'.

We value your input! If this issue is still relevant, please leave a comment below. This will remove the 'stale' label and keep it open.

If there's no activity in the next 90 days the issue will be closed.

github-actions bot added the stale label Aug 17, 2024