Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robust method won't work with verbose=True #530

Open
xiki-tempula opened this issue May 15, 2024 · 10 comments
Open

robust method won't work with verbose=True #530

xiki-tempula opened this issue May 15, 2024 · 10 comments
Labels

Comments

@xiki-tempula
Copy link

xiki-tempula commented May 15, 2024

from alchemtest.gmx import load_ABFE
from alchemlyb.parsing.gmx import extract_u_nk
import pandas as pd
import pymbar

df = pd.concat([extract_u_nk(data, 310) for data in load_ABFE()['data']['complex']])
u_nk = df.sort_index(level=df.index.names[1:])
groups = u_nk.groupby(level=u_nk.index.names[1:])
N_k = [
    (len(groups.get_group(i)) if i in groups.groups else 0)
    for i in u_nk.columns
 ]
mbar = pymbar.MBAR(
    u_nk.T,
    N_k,
    verbose=False, 
    solver_protocol="robust")

works but

mbar = pymbar.MBAR(
    u_nk.T,
    N_k,
    verbose=True, 
    solver_protocol="robust")

Gives

INFO:pymbar.mbar:K (total states) = 30, total samples = 30030
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
Cell In[48], line 1
----> 1 mbar = pymbar.MBAR(
      2             u_nk.T,
      3             N_k,verbose=True, solver_protocol="robust")

File ~/miniconda3/envs/ubar/lib/python3.10/site-packages/pymbar/mbar.py:296, in MBAR.__init__(self, u_kn, N_k, maximum_iterations, relative_tolerance, verbose, initial_f_k, solver_protocol, initialize, x_kindices, n_bootstraps, bootstrap_solver_protocol, rseed)
    294 for l in range(k):
    295     diffsum = 0
--> 296     uzero = u_kn[k, indices] - u_kn[l, indices]
    297     diffsum  = np.dot(uzero, uzero)
    298     if diffsum < relative_tolerance:

File ~/miniconda3/envs/ubar/lib/python3.10/site-packages/pandas/core/frame.py:3892, in DataFrame.__getitem__(self, key)
   3890 if is_single_key:
   3891     if self.columns.nlevels > 1:
-> 3892         return self._getitem_multilevel(key)
   3893     indexer = self.columns.get_loc(key)
   3894     if is_integer(indexer):

File ~/miniconda3/envs/ubar/lib/python3.10/site-packages/pandas/core/frame.py:3950, in DataFrame._getitem_multilevel(self, key)
   3948 def _getitem_multilevel(self, key):
   3949     # self.columns is a MultiIndex
-> 3950     loc = self.columns.get_loc(key)
   3951     if isinstance(loc, (slice, np.ndarray)):
   3952         new_columns = self.columns[loc]

File ~/miniconda3/envs/ubar/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2908, in MultiIndex.get_loc(self, key)
   2867 def get_loc(self, key):
   2868     """
   2869     Get location for a label or a tuple of labels.
   2870 
   (...)
   2906     1
   2907     """
-> 2908     self._check_indexing_error(key)
   2910     def _maybe_to_slice(loc):
   2911         """convert integer indexer to boolean mask or slice if possible"""

File ~/miniconda3/envs/ubar/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2628, in MultiIndex._check_indexing_error(self, key)
   2623 def _check_indexing_error(self, key) -> None:
   2624     if not is_hashable(key) or is_iterator(key):
   2625         # We allow tuples if they are hashable, whereas other Index
   2626         #  subclasses require scalar.
   2627         # We have to explicitly exclude generators, as these are hashable.
-> 2628         raise InvalidIndexError(key)

InvalidIndexError: (1, array([13625,  5804, 29848,  2864,  3890,  9757,  3711, 21736, 29484,
       19307, 22064,  4321, 20775, 14660, 19313, 19863, 10850,  6805,
        5670, 12244, 26369,  2992, 24401,  8630, 21046, 24756, 22339,
       27396, 14598, 29139, 28672,  1400, 27032, 18373, 19362, 10255,
        1001,  3949, 24404, 16882, 13914, 13274,  1800, 26756, 25353,
       14357,  7717, 26710, 26259, 29397]))
@mikemhenry mikemhenry added the bug label May 15, 2024
@mikemhenry
Copy link
Contributor

That is really interesting, can you share the data that reproduces this? Does it happen consistently?

@xiki-tempula
Copy link
Author

xiki-tempula commented May 15, 2024

The data comes from the package alchemtest, you can just pip install it.

conda create -n test -c conda-forge alchemlyb
conda activate test
pip install alchemtest

@mikemhenry
Copy link
Contributor

Thanks! (I wasn't playing enough attention and just assumed the pandas data frame came from disk or something)

@frannerin
Copy link

frannerin commented Aug 5, 2024

Just happened to me and digging a bit it's fixed changing the following line in mbar.py:

uzero = u_kn[k, indices] - u_kn[l, indices]

to

uzero = u_kn.iloc[k, indices] - u_kn.iloc[l, indices]

because k, l and indices are 0-based indices coming from the execution of the builtin range() and np.arange() and not values from the index or columns of u_kn.

@mrshirts
Copy link
Collaborator

mrshirts commented Aug 5, 2024

I'd be a little careful about changing lines in pymbar without checking effects on everything else. More likely to be something off with the way that alchemlyb is passing data into pymbar?

@mrshirts
Copy link
Collaborator

mrshirts commented Aug 5, 2024

Let me be more precise. pymbar doesn't used pandas, so if it's a pandas error, it has to do with what is being passed into pymbar, i.e. something weird with the way the pandas dataframe is being interpreted as a numpy array as pymbar, which is not really a pymbar error, since it's expecting a numpy array - it's an issue with the way pandas are being cast as numpy arrays.

@xiki-tempula
Copy link
Author

I see. That's easy to solve then I shall just convert this to a numpy array.

@xiki-tempula
Copy link
Author

But it is still weird why would this only happen when verbose=True

@frannerin
Copy link

That whole block is inside an if verbose=True clause

@xiki-tempula
Copy link
Author

I see. Then casting the input to numpy array should be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants