series.str.cat(series.str) is concatenating only the largest string #28277

Closed
kayush2O6 opened this issue Sep 4, 2019 · 8 comments · Fixed by #47755
Labels
Bug Error Reporting Incorrect or improved errors from pandas Strings String extension data type and string data

Comments

@kayush2O6

kayush2O6 commented Sep 4, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

arr = ["AbC", "de", "FGHI", "j", "kLLLm"]

ps = pd.Series(arr)
expect = ps.str.cat(others=ps.str)
print(expect)
Out[16]:
0           NaN
1           NaN
2           NaN
3           NaN
4    kLLLmkLLLm
dtype: object

Problem description

series.str.cat(series.str) concatenates only the longest string in the series and returns NaN for every other row, but it should concatenate all the strings element-wise.

Expected Output

Out[18]:
0        AbCAbC
1          dede
2      FGHIFGHI
3            jj
4    kLLLmkLLLm
dtype: object

Output of pd.show_versions()

In [2]: pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.1.2
pip: 19.2.3
setuptools: 41.2.0
Cython: 0.29.13
numpy: 1.17.1
scipy: None
pyarrow: 0.14.1
xarray: None
IPython: 7.8.0
sphinx: 2.2.0
patsy: None
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.3.4
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

I'm not sure what's going on, but that's not my expected output. series.str is an accessor, not an array-like of strings. I would expect an exception to be raised here.
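To illustrate the behaviour suggested here, below is a minimal sketch of a wrapper that rejects the accessor up front. The helper name cat_checked is hypothetical and is not part of pandas; the check relies only on type(series.str), so it does not assume how any particular pandas version handles the accessor internally.

```python
import pandas as pd


def cat_checked(series, others, sep=""):
    """Hypothetical wrapper: refuse a .str accessor passed as `others`."""
    # type(series.str) is the accessor class, so this check needs no private import.
    if isinstance(others, type(series.str)):
        raise TypeError(
            "others must be a Series, Index, or list-like of strings, "
            "not a .str accessor; pass the Series itself instead"
        )
    return series.str.cat(others=others, sep=sep)


ps = pd.Series(["AbC", "de", "FGHI", "j", "kLLLm"])

try:
    cat_checked(ps, ps.str)
    raised = False
except TypeError:
    raised = True

ok = cat_checked(ps, ps)  # passing the Series itself works element-wise
```

This is only a sketch of the expected failure mode, not the change that was eventually merged.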

@boomsquared

I'm very new to Python, so there might be some misunderstanding on my part.

I believe the problem is that passing a StringMethods object to is_list_like returns True (code).

This happens because StringMethods implements the iterator protocol (StringMethods' iterator, is_list_like implementation).
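The mechanism described above can be sketched without depending on a specific pandas version. The class CharColumns below is hypothetical: it stands in for the accessor under the assumption that the old iterator yielded one Series per character position (my reading of the behaviour, not something stated in this thread). pandas.api.types.is_list_like is the public version of the check.

```python
import pandas as pd
from pandas.api.types import is_list_like


class CharColumns:
    """Hypothetical stand-in for an iterable accessor (not a pandas class)."""

    def __init__(self, series):
        self._series = series

    def __iter__(self):
        # Yield the characters at position 0, 1, 2, ... as one Series each,
        # stopping once no string is long enough to have that position.
        i = 0
        col = self._series.str.get(i)
        while col.notna().any():
            yield col
            i += 1
            col = self._series.str.get(i)


ps = pd.Series(["AbC", "de", "FGHI", "j", "kLLLm"])
fake_accessor = CharColumns(ps)

# Anything that defines __iter__ (other than str/bytes) passes the check.
looks_list_like = is_list_like(fake_accessor)

# Treating the object as a list of Series appends every character column.
# Rows whose string is shorter than the longest one hit a missing value in
# some column, and with na_rep=None a missing value makes the whole row missing.
result = ps.str.cat(others=list(fake_accessor))
```

Under that assumption, only the row holding the longest string has a character at every position, which matches the output in the original report.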

@WillAyd
Member

WillAyd commented Sep 5, 2019

For the OP, you probably just want expect = ps.str.cat(others=ps). @boomsquared, if you have an idea of why this doesn't raise and would like to submit a PR, we would certainly welcome one.
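For reference, a short runnable version of the suggested call, using the data from the original report. The second line is an equivalent spelling for this simple case, added here only for comparison.

```python
import pandas as pd

ps = pd.Series(["AbC", "de", "FGHI", "j", "kLLLm"])

# Pass the Series itself, not its .str accessor.
doubled = ps.str.cat(others=ps)

# With no separator and no missing values, plain addition gives the same strings.
doubled_via_add = ps + ps

print(doubled.tolist())
```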

@WillAyd WillAyd added Bug Strings String extension data type and string data labels Sep 5, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Sep 5, 2019
@WillAyd WillAyd added the Error Reporting Incorrect or improved errors from pandas label Sep 5, 2019
@boomsquared

@WillAyd Thanks! Will give it a go

@SaturnFromTitan
Contributor

take

@SaturnFromTitan
Contributor

As suggested in the PR discussion, I will implement a DeprecationWarning for Series.str.__iter__.

@SaturnFromTitan
Contributor

SaturnFromTitan commented Dec 8, 2019

FYI: a FutureWarning was added, and Series.str.__iter__ will be removed in a few versions. Until it is removed, this issue should stay open.

@jreback
Contributor

jreback commented Dec 8, 2019

@SaturnFromTitan this can still be addressed independently, right?

6 participants