Difference between local shap values coming from global and from the "shap_values"-method #2101

Open
rainergo opened this issue Jul 20, 2021 · 4 comments

@rainergo

Hi,
I have a tree Explanation object containing ".data", ".values" and ".base_values". If I extract the SHAP values from this object for a single data row (a "local" explanation), I get the SHAP values for the 32 features of that row.

Now, if I call the "shap_values" method on the explainer object and pass it the feature data for that particular row, I also get the SHAP values for the same row as above.

In my understanding, the SHAP values extracted from the Explanation object and the SHAP values returned by the "shap_values" method should be identical.

But they are not. Very often (not always), there is a difference in exactly ONE feature's SHAP value: the values for 31 of the 32 features match exactly between the two approaches, but most of the time a single feature's value differs. I tried this on different data sets ... .

Why is that? And is this a bug?
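
For reference, this is roughly the comparison I am making (a sketch; "model" and "X" stand in for my actual tree model and feature data):

import numpy as np
import shap

i = 0  # the single data row I am looking at
explainer = shap.TreeExplainer(model)

# "local" SHAP values taken from the Explanation object
explanation = explainer(X)
local_from_object = explanation.values[i]

# the same row via the "shap_values" method
local_from_method = explainer.shap_values(X)[i]

# in my understanding, these should be identical
print(np.isclose(local_from_object, local_from_method))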

@c56pony (Contributor) commented Aug 11, 2021

I was not able to reproduce that error in my environment. Could you please share code that reproduces it?
My code and output are as follows.

from sklearn.model_selection import train_test_split
import xgboost as xgb
import shap
import numpy as np

# train a small XGBoost model on the adult census dataset bundled with shap
X, y = shap.datasets.adult()
train_x, valid_x, train_y, valid_y = train_test_split(X, y, test_size=0.25, random_state=7)
dtrain = xgb.DMatrix(train_x, label=train_y)
dvalid = xgb.DMatrix(valid_x, label=valid_y)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.7
}
model = xgb.train(params, dtrain, evals=[(dvalid, "valid")])

explainer = shap.TreeExplainer(model)
# approach 1: call the explainer to get an Explanation object
shap_values = explainer(X)
# approach 2: call the shap_values method directly
shap_values2 = explainer.shap_values(X)
# compare the two arrays element-wise
np.allclose(shap_values.values, shap_values2)

which outputs True.

@LasseVDHeydtQC

I'm having the same issue; the values are just different... Could it be that the two methods handle categorical dtypes differently?
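
One way to test that hypothesis (a sketch, assuming "explainer" is an already-fitted shap.TreeExplainer and "X" is a pandas DataFrame that may contain 'category' columns): cast the categoricals to integer codes first, so both code paths see identical numeric input. If the discrepancy disappears, categorical handling is the likely culprit.

import numpy as np

# cast pandas 'category' columns to their integer codes
X_num = X.copy()
for col in X_num.select_dtypes(include="category").columns:
    X_num[col] = X_num[col].cat.codes

sv_object = explainer(X_num).values       # Explanation object path
sv_method = explainer.shap_values(X_num)  # "shap_values" method path
print(np.allclose(sv_object, sv_method))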

@firobeid

You should index along the second axis to get the SHAP values for each feature on its own:
shap_values.values[:, feature_index]
where feature_index is the column index of that feature in the train set.
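
Using the objects from the reproduction above, a single row can also be compared feature by feature (a sketch; the row and feature indices are arbitrary, for illustration only):

row, feat = 0, 5  # arbitrary indices, for illustration only

local_from_object = shap_values.values[row, feat]  # Explanation object
local_from_method = shap_values2[row, feat]        # "shap_values" method output
print(np.isclose(local_from_object, local_from_method))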

@github-actions (bot)

This issue has been inactive for two years, so it's been automatically marked as 'stale'.

We value your input! If this issue is still relevant, please leave a comment below. This will remove the 'stale' label and keep it open.

If there's no activity in the next 90 days the issue will be closed.

github-actions bot added the stale label Aug 17, 2024