pandas arrays, scalars, and data types#
Objects#
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
Kind of Data | pandas Data Type | Scalar | Array |
---|---|---|---|
TZ-aware datetime | | | |
Timedeltas | (none) | | |
Period (time spans) | | | |
Intervals | | | |
Nullable Integer | | (none) | |
Nullable Float | | (none) | |
Categorical | | (none) | |
Sparse | | (none) | |
Strings | | | |
Nullable Boolean | | | |
PyArrow | | Python Scalars or | |
pandas and third-party libraries can extend NumPy’s type system (see Extension types).
The top-level array() function can be used to create a new array, which may be stored in a Series, an Index, or as a column in a DataFrame.
|
Create an array. |
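As a minimal sketch of the paragraph above (assuming pandas is imported as pd; the variable names are illustrative), an array created with array() can be placed directly into a Series or a DataFrame column:

```python
import pandas as pd

# Build a nullable integer array; the missing entry is stored as pd.NA.
ints = pd.array([1, 2, None], dtype="Int64")

# The same array can back a Series or a DataFrame column.
ser = pd.Series(ints)
df = pd.DataFrame({"a": ints})

print(type(ints).__name__)  # IntegerArray
print(ser.dtype)            # Int64
print(df["a"].dtype)        # Int64
```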
PyArrow#
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
PyArrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.
The table below shows the equivalent PyArrow-backed (pa), pandas extension, and NumPy (np) types that are recognized by pandas.
The PyArrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).
PyArrow type |
pandas extension type |
NumPy type |
---|---|---|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
(none) |
(none) |
|
(none) |
(none) |
|
|
||
(none) |
(none) |
|
(none) |
(none) |
|
(none) |
|
|
(none) |
(none) |
|
|
||
(none) |
(none) |
|
(none) |
(none) |
|
(none) |
(none) |
|
(none) |
Note
PyArrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()).
pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.
|
Pandas ExtensionArray backed by a PyArrow ChunkedArray. |
|
An ExtensionDtype for PyArrow data types. |
For more information, please see the PyArrow user guide.
Datetimes#
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
|
Pandas replacement for python datetime.datetime object. |
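The relationship between Timestamp, datetime.datetime, and NaT described above can be sketched as follows. This is an illustrative example only; the dates and variable names are arbitrary, and UTC is used so that no extra time zone data is assumed.

```python
import datetime

import pandas as pd

# A timezone-naive and a timezone-aware Timestamp for the same wall-clock time.
naive = pd.Timestamp("2024-01-15 12:00")
aware = pd.Timestamp("2024-01-15 12:00", tz="UTC")

# Timestamp is a subclass of datetime.datetime.
is_datetime = isinstance(naive, datetime.datetime)

# Localizing the naive value to UTC yields the aware value.
localized = naive.tz_localize("UTC")

print(is_datetime)         # True
print(naive.tz is None)    # True
print(localized == aware)  # True
print(pd.isna(pd.NaT))     # True
```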
Properties#
Return numpy datetime64 format in nanoseconds. |
|
Return day of the week. |
|
Return day of the week. |
|
Return the day of the year. |
|
Return the day of the year. |
|
Return the number of days in the month. |
|
Return the number of days in the month. |
|
Return True if year is a leap year. |
|
Check if the date is the last day of the month. |
|
Check if the date is the first day of the month. |
|
Check if date is last day of the quarter. |
|
Check if the date is the first day of the quarter. |
|
Return True if date is last day of the year. |
|
Return True if date is first day of the year. |
|
Return the quarter of the year. |
|
Alias for tzinfo. |
|
The abbreviation associated with self._creso. |
|
Return the week number of the year. |
|
Return the week number of the year. |
|
Methods#
|
Convert the underlying int64 representation to the given unit. |
Convert timezone-aware Timestamp to another time zone. |
|
|
Return a new Timestamp ceiled to this resolution. |
|
Combine date, time into datetime with same date and time fields. |
Return ctime() style string. |
|
Return date object with same year, month and day. |
|
|
Return the day name of the Timestamp with specified locale. |
Return the daylight saving time (DST) adjustment. |
|
|
Return a new Timestamp floored to this resolution. |
|
Construct a timestamp from a proleptic Gregorian ordinal. |
Transform timestamp[, tz] to tz's local time from POSIX timestamp. |
|
Return a named tuple containing ISO year, week number, and weekday. |
|
|
Return the time formatted according to ISO 8601. |
Return the day of the week represented by the date. |
|
|
Return the month name of the Timestamp with specified locale. |
Normalize Timestamp to midnight, preserving tz information. |
|
|
Return new Timestamp object representing current time local to tz. |
|
Implements datetime.replace, handles nanoseconds. |
|
Round the Timestamp to the specified resolution. |
|
Return a formatted string of the Timestamp. |
|
Function is not implemented. |
Return time object with same time but with tzinfo=None. |
|
Return POSIX timestamp as float. |
|
Return time tuple, compatible with time.localtime(). |
|
Return time object with same time and tzinfo. |
|
Return a numpy.datetime64 object with same precision. |
|
|
Convert the Timestamp to a NumPy datetime64. |
Convert Timestamp to a Julian Date. |
|
|
Return a period of which this timestamp is an observation. |
|
Convert a Timestamp object to a native Python datetime object. |
|
Return the current time in the local timezone. |
Return proleptic Gregorian ordinal. |
|
Convert timezone-aware Timestamp to another time zone. |
|
|
Localize the Timestamp to a timezone. |
Return time zone name. |
|
Construct a timezone-aware UTC datetime from a POSIX timestamp. |
|
Return a new Timestamp representing UTC day and time. |
|
Return utc offset. |
|
Return UTC time tuple, compatible with time.localtime(). |
|
Return the day of the week represented by the date. |
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
|
Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
|
An ExtensionDtype for timezone-aware datetime data. |
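To illustrate the dtype difference between timezone-naive and timezone-aware data, here is a small sketch. The variable names are illustrative, and the exact resolution of the naive dtype (for example nanoseconds) may depend on the pandas version, so the example only checks the kind of dtype.

```python
import numpy as np
import pandas as pd

# Timezone-naive datetimes are described by a NumPy datetime64 dtype.
naive = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))

# Localizing attaches one timezone to every value; the dtype becomes DatetimeTZDtype.
aware = naive.dt.tz_localize("UTC")

print(isinstance(naive.dtype, np.dtype))            # True
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True

# Both Series are backed by a DatetimeArray.
print(type(naive.array).__name__)  # DatetimeArray
print(type(aware.array).__name__)  # DatetimeArray
```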
Timedeltas#
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
|
Represents a duration, the difference between two dates or times. |
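A short sketch of how a Timedelta arises and behaves, using arbitrary example dates (names are illustrative):

```python
import pandas as pd

# Subtracting two Timestamps produces a Timedelta.
delta = pd.Timestamp("2024-01-02") - pd.Timestamp("2024-01-01")

print(isinstance(delta, pd.Timedelta))  # True
print(delta == pd.Timedelta(days=1))    # True
print(delta.total_seconds())            # 86400.0

# NaT also serves as the missing value for timedelta data.
missing = pd.Timedelta("NaT")
print(missing is pd.NaT)                # True
```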
Properties#
Return a numpy timedelta64 array scalar view. |
|
Return a components namedtuple-like. |
|
Returns the days of the timedelta. |
|
Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
|
Return the total hours, minutes, and seconds of the timedelta as seconds. |
|
|
Array view compatibility. |
Methods#
|
Convert the underlying int64 representation to the given unit. |
|
Return a new Timedelta ceiled to this resolution. |
|
Return a new Timedelta floored to this resolution. |
Format the Timedelta as ISO 8601 Duration. |
|
|
Round the Timedelta to the specified resolution. |
Convert a pandas Timedelta object into a python |
|
Return a numpy.timedelta64 object with 'ns' precision. |
|
|
Convert the Timedelta to a NumPy timedelta64. |
Total seconds in the duration. |
A collection of Timedelta objects may be stored in a TimedeltaArray.
|
Pandas ExtensionArray for timedelta data. |
Periods#
pandas represents spans of time as Period objects.
Period#
|
Represents a period of time. |
Properties#
Get day of the month that a Period falls on. |
|
Day of the week the period lies in, with Monday=0 and Sunday=6. |
|
Day of the week the period lies in, with Monday=0 and Sunday=6. |
|
Return the day of the year. |
|
Return the day of the year. |
|
Get the total number of days in the month that this period falls on. |
|
Get the total number of days of the month that this period falls on. |
|
Get the Timestamp for the end of the period. |
|
Return a string representation of the frequency. |
|
Get the hour of the day component of the Period. |
|
Return True if the period's year is in a leap year. |
|
Get minute of the hour component of the Period. |
|
Return the month this Period falls on. |
|
Return the quarter this Period falls on. |
|
Fiscal year the Period lies in according to its starting-quarter. |
|
Get the second component of the Period. |
|
Get the Timestamp for the start of the period. |
|
Get the week of the year on the given Period. |
|
Day of the week the period lies in, with Monday=0 and Sunday=6. |
|
Get the week of the year on the given Period. |
|
Return the year this Period falls on. |
Methods#
|
Convert Period to desired frequency, at the start or end of the interval. |
|
Return the period of now's date. |
|
Returns a formatted string representation of the |
|
Return the Timestamp representation of the Period. |
A collection of Period objects may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
|
Pandas ExtensionArray for storing Period data. |
|
An ExtensionDtype for Period data. |
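The following sketch shows a single Period and a PeriodArray whose elements share one frequency. The monthly frequency and the variable names are chosen only for illustration.

```python
import pandas as pd

# A Period covers a span of time, here the whole of March 2024.
p = pd.Period("2024-03", freq="M")

print(p.start_time)     # 2024-03-01 00:00:00
print(p.days_in_month)  # 31

# pd.array infers a PeriodArray when every element is a Period with the same freq.
periods = pd.array([pd.Period("2024-01", freq="M"), pd.Period("2024-02", freq="M")])

print(type(periods).__name__)  # PeriodArray
print(periods.dtype)           # period[M]
```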
Intervals#
Arbitrary intervals can be represented as Interval objects.
Immutable object implementing an Interval, a bounded slice-like interval. |
Properties#
String describing the inclusive side of the intervals. |
|
Check if the interval is closed on the left side. |
|
Check if the interval is closed on the right side. |
|
Indicates if an interval is empty, meaning it contains no points. |
|
Left bound for the interval. |
|
Return the length of the Interval. |
|
Return the midpoint of the Interval. |
|
Check if the interval is open on the left side. |
|
Check if the interval is open on the right side. |
|
|
Check whether two Interval objects overlap. |
Right bound for the interval. |
A collection of intervals may be stored in an arrays.IntervalArray.
|
Pandas array for interval data that are closed on the same side. |
|
An ExtensionDtype for Interval data. |
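As an illustrative sketch of the Interval properties listed above (the bounds and names are arbitrary), a right-closed interval includes its right endpoint but not its left one, and IntervalArray.from_breaks builds intervals that are all closed on the same side:

```python
import pandas as pd

# A single interval (0, 5], closed on the right side only.
iv = pd.Interval(0, 5, closed="right")

print(0 in iv)    # False
print(5 in iv)    # True
print(iv.length)  # 5
print(iv.mid)     # 2.5

# Consecutive breaks produce the intervals (0, 1] and (1, 2].
breaks = pd.arrays.IntervalArray.from_breaks([0, 1, 2])

print(len(breaks))    # 2
print(breaks.closed)  # right
```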
Nullable integer#
numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.
|
Array of integer (optional missing) values. |
An ExtensionDtype for int8 integer data. |
|
An ExtensionDtype for int16 integer data. |
|
An ExtensionDtype for int32 integer data. |
|
An ExtensionDtype for int64 integer data. |
|
An ExtensionDtype for uint8 integer data. |
|
An ExtensionDtype for uint16 integer data. |
|
An ExtensionDtype for uint32 integer data. |
|
An ExtensionDtype for uint64 integer data. |
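The contrast between NumPy and the nullable integer array can be sketched like this (variable names are illustrative): NumPy falls back to a float array with NaN, while IntegerArray keeps an integer dtype and marks the gap with NA.

```python
import numpy as np
import pandas as pd

# With NumPy, a missing value forces the data into float64.
as_numpy = np.array([1, 2, np.nan])

# With the "Int64" alias, pandas keeps integers and stores the gap as pd.NA.
as_pandas = pd.array([1, 2, None], dtype="Int64")

print(as_numpy.dtype)         # float64
print(as_pandas.dtype)        # Int64
print(as_pandas[2] is pd.NA)  # True

# Arithmetic propagates the missing value and keeps the integer dtype.
plus_one = as_pandas + 1
print(plus_one.dtype)         # Int64
```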
Nullable float#
|
Array of floating (optional missing) values. |
An ExtensionDtype for float32 data. |
|
An ExtensionDtype for float64 data. |
Categoricals#
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
|
Type for categorical data with the categories and orderedness. |
An |
|
Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical.
|
Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
|
Make a Categorical type from codes and categories or dtype. |
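A minimal sketch of Categorical.from_codes(), assuming the codes and categories are already known. The labels are made up for illustration; a code of -1 marks a missing value.

```python
import pandas as pd

# Each code is a position in `categories`; -1 stands for a missing value.
cat = pd.Categorical.from_codes(
    codes=[0, 1, 0, -1],
    categories=["low", "high"],
    ordered=True,
)

print(cat.categories.tolist())  # ['low', 'high']
print(cat.codes.tolist())       # [0, 1, 0, -1]
print(cat.ordered)              # True
print(pd.isna(cat[3]))          # True
```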
The dtype information is available on the Categorical
The |
|
The categories of this categorical. |
|
Whether the categories have an ordered relationship. |
|
The category codes of this categorical index. |
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
|
The numpy array interface. |
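The loss of category information can be seen in a small sketch (values and names are illustrative): the resulting NumPy array holds only the values, without the unused category "c" or the ordering.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

# Converting goes through the array interface and yields a plain ndarray.
values = np.asarray(cat)

print(type(values).__name__)          # ndarray
print(values.tolist())                # ['b', 'a', 'b']
print(hasattr(values, "categories"))  # False
```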
A Categorical can be stored in a Series or DataFrame.
To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either the string 'category' or an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
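The two ways of creating a categorical Series, and the Series.cat accessor, can be sketched as follows. The size labels and variable names are illustrative assumptions, not part of the reference.

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
s = pd.Series(["small", "large", "small"])

# Either convert an existing Series or pass the dtype at construction time.
cat = s.astype(dtype)
same = pd.Series(["small", "large", "small"], dtype=dtype)

# With the string alias, the categories are inferred from the data.
inferred = s.astype("category")

print(cat.equals(same))                    # True
print(cat.cat.codes.tolist())              # [0, 2, 0]
print(inferred.cat.categories.tolist())    # ['large', 'small']

# The accessor returns modified copies, e.g. with renamed categories.
renamed = cat.cat.rename_categories({"small": "S"})
print(renamed.cat.categories.tolist())     # ['S', 'medium', 'large']
```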
Sparse#
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
|
An ExtensionArray for storing sparse data. |
|
Dtype for data stored in |
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
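A brief sketch of a SparseArray and the Series.sparse accessor, using made-up data in which 0 is the repeated fill value:

```python
import pandas as pd

# Only the two non-zero entries need to be stored explicitly.
sparse = pd.arrays.SparseArray([0, 0, 1, 0, 0, 0, 2, 0], fill_value=0)
s = pd.Series(sparse)

print(isinstance(s.dtype, pd.SparseDtype))  # True
print(s.sparse.fill_value)                  # 0
print(s.sparse.npoints)                     # 2
print(s.sparse.density)                     # 0.25
```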
Strings#
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
|
Extension array for string data. |
|
Extension array for string data in a |
|
Extension dtype for string data. |
The Series.str accessor is available for Series backed by an arrays.StringArray.
See String handling for more.
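As a hedged illustration of the "string" alias together with the Series.str accessor (the fruit names are arbitrary), missing entries stay missing through string operations:

```python
import pandas as pd

s = pd.Series(["apple", None, "cherry"], dtype="string")

# String methods skip over the missing entry and keep it missing.
upper = s.str.upper()
lengths = s.str.len()

print(isinstance(s.dtype, pd.StringDtype))  # True
print(upper[0])                             # APPLE
print(pd.isna(upper[1]))                    # True
print(lengths[2])                           # 6
```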
Nullable Boolean#
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
|
Array of boolean (True/False) data with missing values. |
Extension dtype for boolean data. |
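The sketch below shows a nullable boolean array with one missing value. Logical operations on it follow three-valued logic, so the result is only missing when it genuinely depends on the unknown value. Names are illustrative.

```python
import pandas as pd

flags = pd.array([True, False, None], dtype="boolean")

print(flags.dtype)         # boolean
print(flags[2] is pd.NA)   # True

# "unknown OR True" is True, while "unknown AND True" stays unknown.
either = flags | True
both = flags & True

print(bool(either.all()))  # True
print(both[2] is pd.NA)    # True
```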
Utilities#
Constructors#
|
Combine list-like of Categorical-like, unioning categories. |
|
Return a string label of the type of a scalar or list-like of values. |
|
Convert input into a pandas only dtype object or a numpy dtype object. |
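The three utilities summarized above live in pandas.api.types. A small sketch of each, with illustrative inputs:

```python
import pandas as pd
from pandas.api.types import infer_dtype, pandas_dtype, union_categoricals

# Combine two Categoricals; the result's categories are the union of both.
a = pd.Categorical(["x", "y"])
b = pd.Categorical(["y", "z"])
combined = union_categoricals([a, b])
print(combined.categories.tolist())  # ['x', 'y', 'z']

# Get a string label describing the values.
print(infer_dtype([1, 2, 3]))        # integer
print(infer_dtype(["a", 1]))         # mixed-integer

# Resolve an alias to a pandas extension dtype or a NumPy dtype.
nullable = pandas_dtype("Int64")
plain = pandas_dtype("float64")
print(type(nullable).__name__)       # Int64Dtype
print(plain)                         # float64
```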
Data type introspection#
|
Check whether the provided array or dtype is of a real number dtype. |
|
Check whether the provided array or dtype is of a boolean dtype. |
|
(DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype. |
|
Check whether the provided array or dtype is of a complex dtype. |
|
Check whether the provided array or dtype is of the datetime64 dtype. |
|
Check whether an array-like or dtype is of the datetime64 dtype. |
|
Check whether the provided array or dtype is of the datetime64[ns] dtype. |
|
(DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype. |
|
Check if an object is a pandas extension array type. |
|
Check whether the provided array or dtype is of a float dtype. |
|
(DEPRECATED) Check whether the provided array or dtype is of the int64 dtype. |
|
Check whether the provided array or dtype is of an integer dtype. |
|
(DEPRECATED) Check whether an array-like or dtype is of the Interval dtype. |
|
Check whether the provided array or dtype is of a numeric dtype. |
|
Check whether an array-like or dtype is of the object dtype. |
|
(DEPRECATED) Check whether an array-like or dtype is of the Period dtype. |
|
Check whether the provided array or dtype is of a signed integer dtype. |
|
Check whether the provided array or dtype is of the string dtype. |
|
Check whether an array-like or dtype is of the timedelta64 dtype. |
|
Check whether the provided array or dtype is of the timedelta64[ns] dtype. |
|
Check whether the provided array or dtype is of an unsigned integer dtype. |
|
(DEPRECATED) Check whether an array-like is a 1-D pandas sparse array. |
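A few of the dtype checks above, sketched on small example Series (names are illustrative). For the checks marked as deprecated, an isinstance test against the dtype class is a common alternative; the last lines show this for the categorical case.

```python
import pandas as pd
from pandas.api import types as ptypes

ints = pd.Series([1, 2, 3])
nullable = pd.Series([1, None], dtype="Int64")
aware = pd.Series(pd.to_datetime(["2024-01-01"])).dt.tz_localize("UTC")
cats = pd.Series(["a", "b"], dtype="category")

print(ptypes.is_integer_dtype(ints))           # True
print(ptypes.is_integer_dtype(nullable))       # True
print(ptypes.is_float_dtype(ints))             # False
print(ptypes.is_numeric_dtype(ints))           # True
print(ptypes.is_datetime64_any_dtype(aware))   # True

# Instead of the deprecated is_categorical_dtype, check the dtype class directly.
is_cat = isinstance(cats.dtype, pd.CategoricalDtype)
print(is_cat)                                  # True
```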
Iterable introspection#
Check if the object is dict-like. |
|
Check if the object is a file-like object. |
|
|
Check if the object is list-like. |
Check if the object is a named tuple. |
|
Check if the object is an iterator. |
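The iterable checks can be sketched as follows. The Point named tuple is a hypothetical helper defined only for this example.

```python
from collections import namedtuple

from pandas.api.types import is_dict_like, is_iterator, is_list_like, is_named_tuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical type for illustration

print(is_list_like([1, 2]))           # True
print(is_list_like("abc"))            # False: strings are not treated as list-like
print(is_dict_like({"a": 1}))         # True
print(is_dict_like([1, 2]))           # False
print(is_named_tuple(Point(1, 2)))    # True
print(is_named_tuple((1, 2)))         # False
print(is_iterator(iter([1, 2])))      # True
print(is_iterator([1, 2]))            # False
```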
Scalar introspection#
|
Return True if given object is boolean. |
|
Return True if given object is complex. |
|
Return True if given object is float. |
Return True if hash(obj) will succeed, False otherwise. |
|
|
Return True if given object is integer. |
|
Check if the object is a number. |
|
Check if the object is a regex pattern instance. |
Check if the object can be compiled into a regex pattern instance. |
|
|
Return True if given object is scalar. |
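A short sketch of the scalar checks, using plain Python values as illustrative inputs:

```python
from pandas.api.types import (
    is_bool,
    is_float,
    is_hashable,
    is_integer,
    is_number,
    is_re_compilable,
    is_scalar,
)

print(is_scalar(3.5))          # True
print(is_scalar([3.5]))        # False
print(is_integer(1))           # True
print(is_integer(1.0))         # False
print(is_float(1.0))           # True
print(is_bool(True))           # True
print(is_number("1"))          # False: a string is not a number
print(is_hashable([]))         # False: lists cannot be hashed
print(is_re_compilable(".*"))  # True
print(is_re_compilable(1))     # False
```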