pandas1.0.1 has trouble with certain column names #32409

Closed
jeremy-rutman opened this issue Mar 3, 2020 · 11 comments · Fixed by #38403
Labels: good first issue, Needs Tests, Reshaping

Comments

@jeremy-rutman

jeremy-rutman commented Mar 3, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.read_json(file_path, lines=True)

TypeError: unhashable type: 'dict'

Problem description

pandas 1.0.1 with Python 3.7 raises the error above for JSON containing dicts under the key "last_status_change_at" (and possibly others), while pandas 0.25.3 with Python 3.7 does not. I haven't checked yet with Python 3.6.

Expected Output

no unhashableness
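For readers without the original file, here is a minimal sketch of the kind of input described: line-delimited JSON in which one field holds a nested object. The sample records and the use of `io.StringIO` are assumptions, not the reporter's data. The pandas documentation notes that `read_json` treats column labels ending in `_at` as date-like by default; whether that is connected to this report is not established in the thread. The sketch passes `convert_dates=False` to switch that parsing off, so it only shows that the nested objects arrive as plain Python dicts.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the reporter's file.
raw = (
    '{"id": 1, "last_status_change_at": {"a": "1"}}\n'
    '{"id": 2, "last_status_change_at": {"a": "2"}}\n'
)

# convert_dates=False disables read_json's date parsing of date-like labels.
df = pd.read_json(io.StringIO(raw), lines=True, convert_dates=False)

# Each cell of the nested column is an ordinary (unhashable) dict.
values = df["last_status_change_at"].tolist()
print(values)
```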

Output of pd.show_versions()

INSTALLED VERSIONS (pandas 1.0.1 which hits error)

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

INSTALLED VERSIONS (pandas 0.25.3 which doesn't hit error)

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@TomAugspurger
Contributor

Do you have a reproducible example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@TomAugspurger TomAugspurger added the Needs Info label Mar 3, 2020
@jeremy-rutman
Author

jeremy-rutman commented Mar 3, 2020

This gives the same unhashable-type error as above in pandas 1.0.1, but not in pandas 0.25.3:

df = pd.DataFrame([{"last_status_change_at":{"a":"1"}},{"last_status_change_at":{"a":"2"}}])
df.describe()

It seems to be related to that particular column name (last_status_change_at); if I change the column name things go back to normal. I thought maybe this clashes with a reserved column name but cannot find any reference to last_status_change_at anywhere in the pandas-dev repo.

@jeremy-rutman jeremy-rutman changed the title pandas1.0.1 pandas1.0.1 has trouble with certain column names Mar 9, 2020
@TomAugspurger
Contributor

@jeremy-rutman can you try your last example on master? I don't see any issues there.

@jeremy-rutman
Author

When I checked just now, I got the following on both the 0.25.3 and 1.0.1 versions of pandas. Both versions were installed by PyCharm (that is, via pip3 install pandas, so using master or whatever branch one gets that way).

import pandas as pd
print(pd.__version__)
df = pd.DataFrame([{"test": 1}, {"test": 2}])
print(df.describe())
df = pd.DataFrame([{"test":{"a":"1"}},{"test":{"a":"2"}}])
df.describe()

/usr/bin/python3.6 /home/jeremy/.PyCharmCE2019.3/config/scratches/scratch.py
1.0.1
           test
count  2.000000
mean   1.500000
std    0.707107
min    1.000000
25%    1.250000
50%    1.500000
75%    1.750000
max    2.000000
Traceback (most recent call last):
  File "/home/jeremy/.PyCharmCE2019.3/config/scratches/scratch.py", line 6, in <module>
    df.describe()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9940, in describe
    ldesc = [describe_1d(s) for _, s in data.items()]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9940, in <listcomp>
    ldesc = [describe_1d(s) for _, s in data.items()]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9923, in describe_1d
    return describe_categorical_1d(data)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9879, in describe_categorical_1d
    objcounts = data.value_counts()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/base.py", line 1244, in value_counts
    dropna=dropna,
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 723, in value_counts
    keys, counts = _value_counts_arraylike(values, dropna)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 767, in _value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 336, in pandas._libs.hashtable.value_count_object
  File "pandas/_libs/hashtable_func_helper.pxi", line 347, in pandas._libs.hashtable.value_count_object
TypeError: unhashable type: 'dict'

Process finished with exit code 1
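The last frames of the traceback are in `value_count_object`, inside pandas' hashtable module, and the message is the same one plain Python raises when a dict is hashed. A minimal illustration in pure Python, independent of pandas (the variable names are illustrative only):

```python
value = {"a": "1"}

# Dicts are mutable, so Python refuses to hash them.
try:
    hash(value)
except TypeError as exc:
    message = str(exc)

print(message)

# An immutable representation of the same content is hashable.
hashable_form = tuple(sorted(value.items()))
print(hash(hashable_form) == hash((("a", "1"),)))
```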

@simonjayhawkins
Member

not seeing this issue on 0.25.3 or master, but on 1.0.1

>>> import numpy as np
>>> import pandas as pd
>>>
>>> print(pd.__version__)
1.0.1
>>>
>>> df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])
>>> print(df)
         test
0  {'a': '1'}
1  {'a': '2'}
>>> print(df.describe())
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 1652, in pandas._libs.hashtable.PyObjectHashTable.map_locations
    hash(val)
TypeError: unhashable type: 'dict'
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 1652, in pandas._libs.hashtable.PyObjectHashTable.map_locations
    hash(val)
TypeError: unhashable type: 'dict'
              test
count            2
unique           2
top     {'a': '1'}
freq             1
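Not something proposed in this thread, but on an affected version one way to sidestep hashing dict cells is to flatten the nested records into scalar columns before summarizing. A sketch using `pd.json_normalize` (available at the top level since pandas 1.0), with the record list copied from the example above:

```python
import pandas as pd

# Same records as in the reproduction above.
records = [{"test": {"a": "1"}}, {"test": {"a": "2"}}]

# json_normalize expands nested dicts into dotted column names.
flat = pd.json_normalize(records)

# The cells are now strings, so describe() has nothing unhashable to count.
summary = flat.describe()

print(flat.columns.tolist())
print(summary)
```

This changes the shape of the data (one column per nested key), so it only fits cases where the nested keys are what you want to summarize.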

@simonjayhawkins simonjayhawkins removed the Needs Info label Mar 31, 2020
@simonjayhawkins
Member

not seeing this issue on 0.25.3 or master, but on 1.0.1

fixed in #31294

f1d7ac6 is the first new commit
commit f1d7ac6
Author: jbrockmendel
Date: Sat Jan 25 16:34:32 2020 -0800

PERF: optimize is_scalar, is_iterator (#31294)

@simonjayhawkins
Member

not seeing this issue on 0.25.3 or master, but on 1.0.1

this issue started to appear with #29232, so marking as regression.

26cd577 is the first bad commit
commit 26cd577
Author: jbrockmendel
Date: Tue Oct 29 09:06:49 2019 -0700

CLN: assorted cleanups (#29232)

@simonjayhawkins simonjayhawkins added the Regression label Mar 31, 2020
@mroeschke
Member

Looks to work again on master. Could use a test

In [3]: >>> df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])
   ...: >>> print(df)
         test
0  {'a': '1'}
1  {'a': '2'}

In [4]: print(df.describe())
              test
count            2
unique           2
top     {'a': '1'}
freq             1
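A sketch of what a regression test for this might look like, based only on the output shown above. The function name is hypothetical, and the test that was eventually merged may differ. It is expected to pass only on versions where `describe()` handles dict cells, such as the master build shown here.

```python
import pandas as pd


def test_describe_with_dict_values():
    # GH 32409: describe() should not raise on a column of dicts.
    df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])

    result = df.describe()

    # Object columns get the categorical-style summary rows.
    assert list(result.index) == ["count", "unique", "top", "freq"]
    assert result.loc["count", "test"] == 2
    assert result.loc["unique", "test"] == 2
    assert result.loc["freq", "test"] == 1


# Run directly when not using pytest.
test_describe_with_dict_values()
```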

@mroeschke mroeschke added the good first issue and Needs Tests labels and removed the Regression label May 11, 2020
@rivera-fernando

Hey, new to open source here. If I understand correctly, what's needed here is a test since it is already working on master. I can give it a shot!

@rivera-fernando

take

rivera-fernando pushed a commit to rivera-fernando/pandas that referenced this issue May 14, 2020
TST
Added test for issue pandas-dev#32409, json containing dicts containing the key last_status_change_at
@luckyvs1
Contributor

@rivera-fernando are you still working on this issue?

@jreback jreback modified the milestones: 1.1, Contributions Welcome Jul 14, 2020
@jreback jreback added the Reshaping label Nov 26, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.3 Jan 20, 2021