What’s new in 2.2.1 (February 22, 2024)

These are the changes in pandas 2.2.1. See Release notes for a full changelog including other versions of pandas.

Enhancements

Added a pyarrow pip extra so that users can install pandas and pyarrow together with pip install pandas[pyarrow] (GH 54466)
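
After installing with the extra, a quick way to confirm that pyarrow ended up in the same environment as pandas is a standard-library check such as the one below. This is a generic sketch using importlib, not a pandas API, and the helper name has_pyarrow is hypothetical.

```python
import importlib.util


def has_pyarrow() -> bool:
    """Hypothetical helper: True if pyarrow is importable in this environment."""
    return importlib.util.find_spec("pyarrow") is not None


if has_pyarrow():
    import pyarrow

    print(f"pyarrow {pyarrow.__version__} is installed")
else:
    # Quoting the requirement avoids bracket globbing in shells such as zsh.
    print("pyarrow is missing; try: pip install 'pandas[pyarrow]'")
```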

Fixed regressions

Fixed memory leak in read_csv() (GH 57039)
Fixed performance regression in Series.combine_first() (GH 55845)
Fixed regression causing overflow for near-minimum timestamps (GH 57150)
Fixed regression in concat() changing long-standing behavior that always sorted the non-concatenation axis when the axis was a DatetimeIndex (GH 57006)
Fixed regression in merge_ordered() raising TypeError for fill_method="ffill" and how="left" (GH 57010)
Fixed regression in pandas.testing.assert_series_equal() defaulting to check_exact=True when checking the Index (GH 57067)
Fixed regression in read_json() where an Index would be returned instead of a RangeIndex (GH 57429)
Fixed regression in wide_to_long() raising an AttributeError for string columns (GH 57066)
Fixed regression in CategoricalIndex.difference() raising KeyError when other contains null values other than NaN (GH 57318)
Fixed regression in DataFrame.groupby() raising ValueError when grouping by a Series in some cases (GH 57276)
Fixed regression in DataFrame.loc() raising IndexError for non-unique, masked dtype indexes where the result has more than 10,000 rows (GH 57027)
Fixed regression in DataFrame.loc() unnecessarily emitting the “incompatible dtype” warning when expanding with a partial row indexer and multiple columns (see PDEP6) (GH 56503)
Fixed regression in DataFrame.map() with na_action="ignore" not being respected for NumPy nullable and ArrowDtypes (GH 57316)
Fixed regression in DataFrame.merge() raising ValueError for certain types of 3rd-party extension arrays (GH 57316)
Fixed regression in DataFrame.query() with an all-NaT column of object dtype (GH 57068)
Fixed regression in DataFrame.shift() raising AssertionError for axis=1 and an empty DataFrame (GH 57301)
Fixed regression in DataFrame.sort_index() not producing a stable sort for an index with duplicates (GH 57151)
Fixed regression in DataFrame.to_dict() with orient='list' and datetime or timedelta types returning integers (GH 54824)
Fixed regression in DataFrame.to_json() converting nullable integers to floats (GH 57224)
Fixed regression in DataFrame.to_sql() when method="multi" is passed and the dialect type is not Oracle (GH 57310)
Fixed regression in DataFrame.transpose() where nullable extension dtypes did not have F-contiguous data, potentially causing exceptions when used (GH 57315)
Fixed regression in DataFrame.update() emitting incorrect warnings about downcasting (GH 57124)
Fixed regression in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), SeriesGroupBy.idxmax() ignoring the skipna argument (GH 57040)
Fixed regression in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), SeriesGroupBy.idxmax() where values containing the minimum or maximum value for the dtype could produce incorrect results (GH 57040)
Fixed regression in ExtensionArray.to_numpy() raising for non-numeric masked dtypes (GH 56991)
Fixed regression in Index.join() raising TypeError when joining an empty index to a non-empty index containing mixed dtype values (GH 57048)
Fixed regression in Series.astype() introducing decimals when converting from integer with missing values to string dtype (GH 57418)
Fixed regression in Series.pct_change() raising a ValueError for an empty Series (GH 57056)
Fixed regression in Series.to_numpy() when dtype is given as float and the data contains NaNs (GH 57121)
Fixed regression in addition or subtraction of DateOffset objects with millisecond components to datetime64 Index, Series, or DataFrame (GH 57529)
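
Several of the regressions above can be exercised with a short script. The sketch below assumes pandas 2.2.1 or later and covers three of the listed fixes (GH 57056, GH 57010, GH 57301); the data values are arbitrary illustrations, not taken from the linked issues.

```python
import pandas as pd

# GH 57056: pct_change() on an empty Series returns an empty Series
# rather than raising ValueError.
empty_change = pd.Series([], dtype="float64").pct_change()

# GH 57010: merge_ordered() with fill_method="ffill" and how="left".
left = pd.DataFrame({"key": [1, 2, 3], "lv": [10, 20, 30]})
right = pd.DataFrame({"key": [1, 3], "rv": [100, 300]})
merged = pd.merge_ordered(left, right, on="key", fill_method="ffill", how="left")
# key 2 has no match in `right`, so its "rv" value is forward-filled from key 1.

# GH 57301: shift() along axis=1 on an empty DataFrame returns an empty
# DataFrame rather than raising AssertionError.
shifted = pd.DataFrame().shift(1, axis=1)

print(merged)
```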

Bug fixes

Fixed bug in pandas.api.interchange.from_dataframe() which was raising for nullable integers (GH 55069)
Fixed bug in pandas.api.interchange.from_dataframe() which was raising for empty inputs (GH 56700)
Fixed bug in pandas.api.interchange.from_dataframe() which wasn’t converting column names to strings (GH 55069)
Fixed bug in DataFrame.__getitem__() for empty DataFrame with Copy-on-Write enabled (GH 57130)
Fixed bug in PeriodIndex.asfreq() which was silently converting frequencies which are not supported as period frequencies instead of raising an error (GH 56945)

Other

Note

The DeprecationWarning that was raised when pandas was imported without PyArrow installed has been removed. The warning was too noisy for too many users, and a lot of feedback was collected about the decision to make PyArrow a required dependency. pandas is still considering whether PyArrow should become a hard dependency in 3.0. Interested users can follow the discussion here.

Added the argument skipna to DataFrameGroupBy.first(), DataFrameGroupBy.last(), SeriesGroupBy.first(), and SeriesGroupBy.last(). The skipna=False behavior used to be available through DataFrameGroupBy.nth(), but the behavior of nth() changed in pandas 2.0.0 (GH 57019)
Added the argument skipna to Resampler.first() and Resampler.last() (GH 57019)

Contributors

A total of 14 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Albert Villanova del Moral
Luke Manley
Lumberbot (aka Jack)
Marco Edward Gorelli
Matthew Roeschke
Natalia Mokeeva
Pandas Development Team
Patrick Hoefler
Richard Shadrach
Robert Schmidtke
Samuel Chai
Thomas Li
William Ayd
dependabot[bot]