pandas1.0.1 has trouble with certain column names #32409

jeremy-rutman · 2020-03-03T12:37:59Z

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.read_json(file_path, lines=True)

TypeError: unhashable type: 'dict'

Problem description

pandas 1.0.1 on Python 3.7 raises the error above when reading JSON whose records contain a dict under the key "last_status_change_at" (and possibly other keys), while pandas 0.25.3 on Python 3.7 does not. I haven't checked Python 3.6 yet.
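One possible explanation for why the key name matters, offered as an assumption rather than something confirmed in this report: by default `read_json` tries to parse columns whose label looks date-like, and the documented rule for `keep_default_dates` includes labels ending in `_at`. A minimal sketch, using made-up sample records, that switches date conversion off with `convert_dates=False` so the dict values are left alone:

```python
import io

import pandas as pd

# Hypothetical newline-delimited JSON; the real file from the report is not available.
ndjson = (
    '{"id": 1, "last_status_change_at": {"a": "1"}}\n'
    '{"id": 2, "last_status_change_at": {"a": "2"}}\n'
)

# convert_dates=False skips the default date parsing that read_json
# otherwise attempts on labels ending in "_at", "_time", and similar.
df = pd.read_json(io.StringIO(ndjson), lines=True, convert_dates=False)

first_value = df.loc[0, "last_status_change_at"]
print(type(first_value).__name__)
```

Whether this avoids the error on pandas 1.0.1 specifically has not been checked here; it only shows the option that controls the name-based behaviour.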

Expected Output

No `TypeError`; the file should load as it does in pandas 0.25.3.

Output of `pd.show_versions()`

INSTALLED VERSIONS (pandas 1.0.1 which hits error)

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

INSTALLED VERSIONS (pandas 0.25.3, which doesn't hit the error)

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

TomAugspurger · 2020-03-03T12:45:30Z

Do you have a reproducible example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

jeremy-rutman · 2020-03-03T17:46:16Z

This gives the same unhashable-type error as above in pandas 1.0.1, but not in pandas 0.25.3:

df = pd.DataFrame([{"last_status_change_at":{"a":"1"}},{"last_status_change_at":{"a":"2"}}])
df.describe()

It seems to be related to that particular column name (last_status_change_at); if I change the column name things go back to normal. I thought maybe this clashes with a reserved column name but cannot find any reference to last_status_change_at anywhere in the pandas-dev repo.
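Until the underlying behaviour is sorted out, one workaround is to turn the dict cells into something hashable before summarising. The sketch below is only an illustration: it assumes every cell is JSON-serialisable and uses `json.dumps` with `sort_keys=True` so that equal dicts map to equal strings.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [{"last_status_change_at": {"a": "1"}}, {"last_status_change_at": {"a": "2"}}]
)

# Serialise each dict to a JSON string; strings are hashable, so the
# value counting done by describe() has nothing to trip over.
hashable = df["last_status_change_at"].map(lambda d: json.dumps(d, sort_keys=True))

summary = hashable.describe()
print(summary["count"], summary["unique"])
```

The original dicts stay untouched in `df`; only the derived Series is summarised.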

TomAugspurger · 2020-03-11T14:04:33Z

@jeremy-rutman can you try your last example on master? I don't see any issues there.

jeremy-rutman · 2020-03-11T21:01:59Z

When I checked just now, I got the following on both pandas 0.25.3 and 1.0.1. Both versions were installed by PyCharm (that is, with pip3 install pandas, so whichever branch or release that installs).

import pandas as pd
print(pd.__version__)
df = pd.DataFrame([{"test": 1}, {"test": 2}])
print(df.describe())
df = pd.DataFrame([{"test":{"a":"1"}},{"test":{"a":"2"}}])
df.describe()

/usr/bin/python3.6 /home/jeremy/.PyCharmCE2019.3/config/scratches/scratch.py
1.0.1
           test
count  2.000000
mean   1.500000
std    0.707107
min    1.000000
25%    1.250000
50%    1.500000
75%    1.750000
max    2.000000
Traceback (most recent call last):
  File "/home/jeremy/.PyCharmCE2019.3/config/scratches/scratch.py", line 6, in <module>
    df.describe()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9940, in describe
    ldesc = [describe_1d(s) for _, s in data.items()]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9940, in <listcomp>
    ldesc = [describe_1d(s) for _, s in data.items()]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9923, in describe_1d
    return describe_categorical_1d(data)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 9879, in describe_categorical_1d
    objcounts = data.value_counts()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/base.py", line 1244, in value_counts
    dropna=dropna,
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 723, in value_counts
    keys, counts = _value_counts_arraylike(values, dropna)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 767, in _value_counts_arraylike
    keys, counts = f(values, dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 336, in pandas._libs.hashtable.value_count_object
  File "pandas/_libs/hashtable_func_helper.pxi", line 347, in pandas._libs.hashtable.value_count_object
TypeError: unhashable type: 'dict'

Process finished with exit code 1
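The traceback ends in a hash-table based value count, and the same failure can be shown without pandas. This is a plain-Python sketch of the general mechanism (counting by using each value as a dictionary key), not of the pandas internals; the exact wording of the exception message can differ between Python versions.

```python
from collections import Counter

values = [{"a": "1"}, {"a": "2"}]

try:
    Counter(values)  # uses each value as a dict key, so it must hash them
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Tuples of sorted items are hashable as long as the dict values are too,
# so they can stand in for the dicts when counting.
counts = Counter(tuple(sorted(d.items())) for d in values)
print(counts)
```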

simonjayhawkins · 2020-03-31T12:46:21Z

not seeing this issue on 0.25.3 or master, but on 1.0.1

>>> import numpy as np
>>> import pandas as pd
>>>
>>> print(pd.__version__)
1.0.1
>>>
>>> df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])
>>> print(df)
         test
0  {'a': '1'}
1  {'a': '2'}
>>> print(df.describe())
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 1652, in pandas._libs.hashtable.PyObjectHashTable.map_locations
    hash(val)
TypeError: unhashable type: 'dict'
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 1652, in pandas._libs.hashtable.PyObjectHashTable.map_locations
    hash(val)
TypeError: unhashable type: 'dict'
              test
count            2
unique           2
top     {'a': '1'}
freq             1

simonjayhawkins · 2020-03-31T13:28:31Z

not seeing this issue on 0.25.3 or master, but on 1.0.1

fixed in #31294

f1d7ac6 is the first new commit
commit f1d7ac6
Author: jbrockmendel [email protected]
Date: Sat Jan 25 16:34:32 2020 -0800

PERF: optimize is_scalar, is_iterator (#31294)

simonjayhawkins · 2020-03-31T14:35:30Z

not seeing this issue on 0.25.3 or master, but on 1.0.1

this issue started to appear with #29232, so marking as regression.

26cd577 is the first bad commit
commit 26cd577
Author: jbrockmendel [email protected]
Date: Tue Oct 29 09:06:49 2019 -0700

CLN: assorted cleanups (#29232)

mroeschke · 2020-05-11T01:20:37Z

Looks to work again on master. Could use a test
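A regression test for this could look roughly like the sketch below. The function name is hypothetical, and the assertions simply mirror the `describe()` output shown in this comment; it assumes a pandas build where the call succeeds.

```python
import pandas as pd


def test_describe_with_dict_elements():
    # Hypothetical test name, written for illustration only.
    df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])

    result = df.describe()  # must not raise TypeError

    assert list(result.index) == ["count", "unique", "top", "freq"]
    assert result.loc["count", "test"] == 2
    assert result.loc["unique", "test"] == 2
    assert result.loc["freq", "test"] == 1


test_describe_with_dict_elements()
```

The `top` row is left unchecked here because which of two equally frequent values is reported first is an ordering detail.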

In [3]: >>> df = pd.DataFrame([{"test": {"a": "1"}}, {"test": {"a": "2"}}])
   ...: >>> print(df)
         test
0  {'a': '1'}
1  {'a': '2'}

In [4]: print(df.describe())
              test
count            2
unique           2
top     {'a': '1'}
freq             1

rivera-fernando · 2020-05-13T00:24:36Z

Hey, new to open source here. If I understand correctly, what's needed here is a test since it is already working on master. I can give it a shot!

rivera-fernando · 2020-05-13T00:24:42Z

take

luckyvs1 · 2020-07-12T08:47:29Z

@rivera-fernando are you still working on this issue?

TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Mar 3, 2020

jeremy-rutman changed the title ~~pandas1.0.1~~ pandas1.0.1 has trouble with certain column names Mar 9, 2020

simonjayhawkins removed the Needs Info Clarification about behavior needed to assess issue label Mar 31, 2020

simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Mar 31, 2020

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Regression Functionality that used to work in a prior pandas version labels May 11, 2020

simonjayhawkins mentioned this issue May 11, 2020

REGR: exceptions not caught in _call_map_locations #34113

Merged

5 tasks

github-actions bot assigned rivera-fernando May 13, 2020

rivera-fernando pushed a commit to rivera-fernando/pandas that referenced this issue May 14, 2020

TST

8c4e28f

Added test for issue pandas-dev#32409, json containing dicts containing the key last_status_change_at

rivera-fernando mentioned this issue May 14, 2020

TST: Added test for json containing the key "last_status_change_at" #34169

Closed

5 tasks

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Jul 14, 2020

TST: Add test to ensure Dataframe describe does not throw an error (p…

d49fcd8

…andas-dev#32409)

luckyvs1 mentioned this issue Jul 14, 2020

TST: Add test to ensure DF describe does not throw an error (#32409) #35270

Closed

5 tasks

jreback modified the milestones: 1.1, Contributions Welcome Jul 14, 2020

simonjayhawkins mentioned this issue Sep 15, 2020

CI: Add stale PR action #36336

Merged

jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Nov 26, 2020

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Dec 10, 2020

TST: Add test to ensure Dataframe describe does not throw an error (p…

4be0ee8

…andas-dev#32409)

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Dec 10, 2020

TST: Add test to ensure Dataframe describe does not throw an error (p…

d4d42ba

…andas-dev#32409)

luckyvs1 mentioned this issue Dec 10, 2020

TST: Add test to ensure DF describe does not throw an error (#32409) #38403

Merged

5 tasks

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Dec 10, 2020

TST: Update linting from Black to CI required for spacing (pandas-dev…

10f2468

…#32409)

jreback modified the milestones: Contributions Welcome, 1.3 Jan 20, 2021

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Jan 20, 2021

TST: Add test to ensure Dataframe describe does not throw an error (p…

e727e4d

…andas-dev#32409)

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Jan 20, 2021

TST: Update linting from Black to CI required for spacing (pandas-dev…

cc2a650

…#32409)

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Jan 20, 2021

TST: Add test to ensure Dataframe describe does not throw an error (p…

4c66d90

…andas-dev#32409)

luckyvs1 added a commit to luckyvs1/pandas that referenced this issue Jan 20, 2021

TST: Update linting from Black to CI required for spacing (pandas-dev…

a22423d

…#32409)

jreback closed this as completed in #38403 Jan 20, 2021

jreback pushed a commit that referenced this issue Jan 20, 2021

TST: Add test to ensure DF describe does not throw an error (#32409) (#…

aab1744

…38403)

nofarm3 pushed a commit to nofarm3/pandas that referenced this issue Jan 21, 2021

TST: Add test to ensure DF describe does not throw an error (pandas-d…

62e499b

…ev#32409) (pandas-dev#38403)
