BUG: Regression - AmbiguousTimeError creating DataFrame #42505
Comments
@mzeitlin11 - I think standard practice is to cc the author of a PR where a regression has (or may have) occurred. |
Brainstorm: Is it ever good practice to support a conversion from an offset to a named timezone? If we already "know" that 2019-11-03 01:00:00-0700 corresponds to the Los Angeles timezone, we can of course say that this is the first 1AM of the evening and it's therefore not an ambiguous time, but can that be generalized? If not it strikes me that the AmbiguousTimeError exception should simply occur on the initial tz_convert, and that it should be up to the user to strip the offset data from the datetime string and force the conversion a different way. Getting timezones right is often a pain as an end user but ultimately I think it's better to not shoulder ambiguities that the end user should resolve. |
based on a quick look:
|
@jmcomie - I don't think that analysis is correct. Once the timezone or offset is attached, you know the exact instant (in seconds past 1970 GMT), so attaching a new timezone is a simple matter of reinterpreting the clock/calendar time on the wall. Additional fact: in my domain, the data |
Touché. You're right -- both time representations are precise and the conversion should be valid. |
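For reference, a minimal sketch of the point above, using only the Python standard library rather than pandas. The fixed offsets named PDT and PST are illustrative stand-ins (an assumption, not taken from the original report) for the two UTC offsets Los Angeles observes around the 2019-11-03 transition. The sketch only shows that once an offset is attached, the two "1 AM" wall-clock readings are distinct, unambiguous instants.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones standing in for the two Los Angeles offsets.
PDT = timezone(timedelta(hours=-7), "PDT")  # before the DST transition
PST = timezone(timedelta(hours=-8), "PST")  # after the DST transition

# The same wall-clock reading, 01:00 on 2019-11-03, with each offset attached.
first_1am = datetime(2019, 11, 3, 1, 0, tzinfo=PDT)
second_1am = datetime(2019, 11, 3, 1, 0, tzinfo=PST)

# Each offset-aware value maps to exactly one instant in UTC.
first_utc = first_1am.astimezone(timezone.utc)    # 2019-11-03 08:00:00+00:00
second_utc = second_1am.astimezone(timezone.utc)  # 2019-11-03 09:00:00+00:00

# Aware datetimes subtract as instants, so the two readings are one hour apart.
gap = second_1am - first_1am

print(first_utc.isoformat(), second_utc.isoformat(), gap)
```

Because the offset already pins down the instant, converting such a value to a named timezone should not need any ambiguity resolution; only a naive "01:00" with no offset is ambiguous on that date.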
changing milestone to 1.3.5 |
@simonjayhawkins the DST transition is coming this weekend and this issue does not seem to be fixed. Is there a workaround, or a chance of fixing it yet this week? |
Additional test case - under
but this throws the above error:
(where |
Does the workaround #42505 (comment) not work for you? There is essentially no chance of a release with a fix for this in the next week, no. The next planned release is end of November I think. A PR addressing this would be welcome. |
@jbrockmendel - no, that workaround doesn't apply in all cases. For example, the second case I showed above would have to use Am I correct that the proposed workaround is to simply never use a scalar datetime when creating a In my code, I'm handed a |
I'm not sure working around this issue is necessarily the best way forward. The release note from the commit that caused the change in behavior is...
This is in no way related to the change in behavior reported in this issue. PR restoring the 1.2.5 behavior is welcome. AFAICT we have the exception raised from on master and 1.2.5 we have
Raising when the dtype is specified, even though it matches the dtype that would be inferred when no dtype is given, does indeed seem strange. Maybe it is this that needs to change to fix the regression, even though this behavior is unchanged from 1.2.5. So what has changed? There were two changes in #39767, The change is that
on 1.2.5
on master
Now the docstring for Suffice to say, using the int val in
|
@jbrockmendel any thoughts on the best way to fix? Maybe three options
I'm not sure about option 1, but I assume that the new behaviour is correct? (Needs a better docstring) I'm not keen on option 2 for a backport PR and late in the 1.3.x series even though it's probably the correct long term solution. That leaves option 3. Not normally keen on special casing but probably a trivial and easy fix and safe and suitable for a backport. |
Best guess remains sequence_to_dt64ns (#42505 (comment)). I'll try to confirm this guess today. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Pandas 1.2.5:
Pandas 1.3:
Problem description
The dt object already has a timezone attribute, so it shouldn't be converting/inferring at all.
Expected Output
1.2.5 and 1.3.0 should have the same output.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None