Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG #34621 added nanosecond support to class Period #34720

Closed
wants to merge 7 commits into from

Conversation

OlivierLuG
Copy link
Contributor

@OlivierLuG OlivierLuG commented Jun 11, 2020

Closes #34621
Closes #17053

  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

The class Period do not support nanosecond. I've made a quick and dirty code to support nanoseconds. I have struggled to find an alternative to dateutil.parser, but I didn't found an alternative.

The PR may be a performance issue as I've add a Timestamp constructor to the dateutil.parser. There is for sure a better solution. Please let me know !

@mroeschke
Copy link
Member

mroeschke commented Jun 11, 2020

See #34621 (comment). I don't think this is a bug.

Edit: Misunderstood, confirmed a bug

@pep8speaks
Copy link

pep8speaks commented Jun 12, 2020

Hello @OlivierLuG! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-16 18:43:06 UTC

nanosecond = ts.nanosecond
if nanosecond != 0:
reso = 'nanosecond'
except (ValueError, OutOfBoundsDatetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOBDatetime is a subclass of ValueError so this is uneeded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've first try to only use OutOfBoundsDatetime but the test code fails on many VM. Here is an example: https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=37155&view=logs&j=a3a13ea8-7cf0-5bdb-71bb-6ac8830ae35c&t=add65f64-6c25-5783-8fd6-d9aa1b63d9d4&l=471

pandas/_libs/tslibs/period.pyx:2410: in pandas._libs.tslibs.period.Period.__new__
    ts = Timestamp(value, freq=freq)
pandas/_libs/tslibs/timestamps.pyx:848: in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
pandas/_libs/tslibs/conversion.pyx:333: in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_str_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise ValueError("could not convert string to Timestamp")
E   ValueError: could not convert string to Timestamp

pandas/_libs/tslibs/conversion.pyx:581: ValueError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierLuG I think Jeff meant that it should be enough to only use ValueError, as that would already cover the OutOfBoundsDatetime case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I got it. I will try a new PR with only ValueError

pandas/_libs/tslibs/period.pyx Show resolved Hide resolved
@@ -2404,6 2406,13 @@ class Period(_Period):
value = str(value)
value = value.upper()
dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value, freq=freq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't pass freq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new PR is ready (waiting for the question about else)

@jreback jreback added the Period Period data type label Jun 14, 2020
@OlivierLuG
Copy link
Contributor Author

Closes #34621
Closes #17053

@OlivierLuG OlivierLuG requested a review from jreback June 29, 2020 19:42
@@ -484,6 484,22 @@ def test_period_cons_combined(self):
with pytest.raises(ValueError, match=msg):
Period("2011-01", freq="1D1W")

@pytest.mark.parametrize("day_", ["1970/01/01 ", "2020-12-31 ", "1981/09/13 "])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think the trailing underscores are necessary for these names

@@ -2404,6 2405,14 @@ class Period(_Period):
value = str(value)
value = value.upper()
dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we end up parsing this twice which is not good, yeah i think this needs to be integrated a bit more to parse_time_string.

i guess the currently soln would be ok in the interim, but would need to run asv's on all periods to see what kind of perf hit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

we don't wait for dateutil in this, as it will not likey every support nanosecond because the standard library does not

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierLuG is this still active? If yes can you merge master & resolve conflicts and address comments

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you run the period asv's to see if any perf issues

@@ -2404,6 2405,14 @@ class Period(_Period):
value = str(value)
value = value.upper()
dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

we don't wait for dateutil in this, as it will not likey every support nanosecond because the standard library does not

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

this looked ok, pls merge master and address comments, ping on green

@arw2019
Copy link
Member

arw2019 commented Nov 30, 2020

Closing in favor of #38175

@arw2019 arw2019 closed this Nov 30, 2020
@OlivierLuG OlivierLuG deleted the BUG_#34621 branch January 2, 2021 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Period constructor does not take into account nanonseconds BUG: Period with nanoseconds (?)
7 participants