WIP: Fix tf lstm #3419
base: master
Conversation
@connortann: I would like to ask for your advice. This is a pretty longstanding issue, and currently I am capable of evaluating shap values for LSTM, GRU and SimpleRNN layers as long as the neural net is linear (the output of layer t is the input of layer t+1). But there is a more general case, where e.g. multiple inputs are concatenated, which I currently struggle with. I am confident I can fix this problem in the upcoming month. But should I prepare a PR where the linear NNs are working and create a separate one for nonlinear NNs? What are your thoughts on that?
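To illustrate the distinction, a minimal sketch in Keras (shapes and layer sizes are made up for illustration, not taken from this PR):

```python
import tensorflow as tf

# "Linear" case: each layer's output feeds directly into the next layer.
linear_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(10, 4)),
    tf.keras.layers.Dense(1),
])

# General case: two branches whose outputs are concatenated before the head.
inp_a = tf.keras.Input(shape=(10, 4))
inp_b = tf.keras.Input(shape=(10, 4))
branch_a = tf.keras.layers.GRU(8)(inp_a)
branch_b = tf.keras.layers.SimpleRNN(8)(inp_b)
merged = tf.keras.layers.Concatenate()([branch_a, branch_b])
out = tf.keras.layers.Dense(1)(merged)
branched_model = tf.keras.Model(inputs=[inp_a, inp_b], outputs=out)
```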
Thanks for taking on this issue; it would be great to get those issues addressed. In terms of how to arrange the PRs, I don't have anything specific to say about this PR, only general advice: it's helpful to break PRs down into small incremental changes wherever possible, and it's up to your judgement how best to split any large PRs. Your emphasis on adding test cases looks very prudent - this seems like it will be critical, given that there is a huge space of possible models that DeepExplainer has to be able to explain.
The errors seem mainly to happen in 2 tests, and whether the shap values sum up as expected is heavily dependent on the input. Not sure if I should mark these tests as xfail for now and create a separate issue. The thing here is that we have a lot of issues where people are complaining about the DeepExplainer outputs not summing up as expected (not passing
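For reference, the check being alluded to is the additivity property: per sample, the shap values should sum to the model output minus the expected value. A hedged sketch, assuming a single-output Keras model where `shap_values` returns a single array (depending on the shap version it may return a list per output); `model`, `background` and `samples` are placeholders:

```python
import numpy as np
import shap

# `model`, `background`, `samples` are placeholders for illustration.
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(samples)

# Sum the attributions over every non-batch axis and compare against
# model(samples) - expected_value; tests of this kind fail when the
# additivity property is violated.
summed = shap_values.sum(axis=tuple(range(1, shap_values.ndim)))
expected = model.predict(samples).squeeze() - explainer.expected_value
np.testing.assert_allclose(summed, expected, atol=1e-4)
```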
Hi @CloseChoice @connortann, I am following this issue very closely and I am happy to test the releases or fixes. Kindly let me know if you need any support from my side; happy to contribute to fixing this issue.
Thanks for the reply. Your feedback is most welcome. If you could test this branch on your data and tell us whether the results make sense or whether there are strong inconsistencies, that would be amazing.
Hi @CloseChoice, sure, happy to do so. Just double-checking that I am good if I run this setup: this experiment --> https://www.kaggle.com/code/billlucas/explaining-cnn-lstm-using-shap/notebook. Is my understanding correct? If there are any specific versions I need to use, please point me to the URL where I can refer to these details.
Thanks a lot for helping on this, I really appreciate it. That looks good. With this fix we hope to be able to run any tensorflow version above 2.4 as well, so feel free to use the latest version (our test suite will test against the latest either way). I did not test all the layers that I can see in that example, so I am pretty curious about the results.
Hi @CloseChoice, I am using it for some LSTM code, as I couldn't compute SHAP values before. Now the code is working, but I am getting the gradients as zero. Looking into it, I found that in the function `phi_symbolic` (line 365) and then in the function `grad_graph` (line 352), `x_grad` is zero. I see you are still calling `self._init_between_tensors(out.op, shap_rAnD)` instead of `self._init_between_tensors_eager(out.op, shap_rAnD, data..?)`. Maybe that is making the gradient go to zero?
I also ran a direct gradient check (along the lines of the sketch below) to see whether there is an issue in the network, but the gradient is still zero, which means that the gradients are going to zero.
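A check of that kind might look roughly like the following (a sketch, not the commenter's original snippet; `model` and `samples` are placeholders):

```python
import tensorflow as tf

# Sanity check: differentiate the model output w.r.t. the input directly,
# bypassing shap entirely, to see whether the network itself yields
# non-zero gradients.
x = tf.convert_to_tensor(samples, dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)
grads = tape.gradient(y, x)
print(tf.reduce_max(tf.abs(grads)))
```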
Thanks for looking into this. I also found that all shap values are zero (due to the gradients being zero, as you found out). The problem here is that tensorflow is hiding the exact gradients from us and only exposes a `PartitionedCall`, but we need the exact ops that are called. I'll try to find a workaround for this.
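To make the problem concrete (an illustration of the behaviour, not necessarily the fix): once a function is traced with `tf.function`, the caller's graph only contains a single `PartitionedCall`/`StatefulPartitionedCall` op, and the individual ops live inside the concrete function's own graph:

```python
import tensorflow as tf

@tf.function
def f(x):
    return tf.nn.tanh(tf.linalg.matmul(x, x))

x = tf.ones((2, 2))
concrete = f.get_concrete_function(x)

# The inner FuncGraph still holds the individual ops (MatMul, Tanh, ...);
# from the outside, only the wrapping call op is visible.
print([op.type for op in concrete.graph.get_operations()])
```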
What about a call to
Nope, that won't help since the
@CloseChoice Do we have a status update on this? I would really appreciate it, because my model uses some tf2 features and requires the shap LSTM explainer.
Hey, thanks for keeping your eyes on this. I have not worked on this for quite a while. The problems lie somewhere deep in tensorflow, which hides its API from us. I can't give any timeframe on this.
Overview
Closes #3344, #3343
Description of the changes proposed in this pull request:
The problem is that the tensorflow ops were not captured in eager mode (so `self._init_between_tensors` was not executed, which resulted in an error because `self.in_between_ops` was not set). This is fixed by introducing a `_init_between_tensors_eager` method, which executes the graph once for the data input and captures all ops that are called. From there on the flow continues as before. Since tensorflow2 seems to introduce a couple more ops, these also need to be added to the `op_handlers`.
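A simplified sketch of the idea (not the PR's actual code; the real method additionally wires the captured ops into shap's bookkeeping):

```python
import tensorflow as tf

def capture_ops_eagerly(model, data):
    """Trace the model once on real data and collect the executed ops.

    Illustrative stand-in for the idea behind `_init_between_tensors_eager`.
    """
    @tf.function
    def forward(x):
        return model(x)

    concrete = forward.get_concrete_function(tf.convert_to_tensor(data))
    return concrete.graph.get_operations()
```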
Checklist

I would suggest the following steps in the given order:
- The failing `test_tf_keras_lstm2` might be an indication that one of the recently added ops shouldn't be passed through. Understand what is going on here and how we can do this better.

Later:
Note: feel free to review, and also to post some previously failing examples here.