Pandas Compatibility Notes
Pandas Compatibility Note
DataFrame.quantile
One notable difference from Pandas arises when the DataFrame holds non-numeric types and Pandas would return a Series. cuDF returns a DataFrame instead, because a cuDF Series does not support mixed types.
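For reference, the Pandas side of this difference can be reproduced with a small mixed-type frame. This is a minimal sketch using pandas only; the column names and values are illustrative assumptions, and the cuDF behavior is described in a comment rather than executed.

```python
import pandas as pd

# Illustrative frame mixing numeric, datetime, and timedelta columns.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [pd.Timestamp("2010"), pd.Timestamp("2011")],
    "C": [pd.Timedelta("1 days"), pd.Timedelta("2 days")],
})

# With a scalar q, pandas returns a Series whose elements have mixed types.
result = df.quantile(0.5, numeric_only=False)
print(type(result).__name__)

# Per the note above, cuDF returns a DataFrame for this kind of input,
# so portable code should not assume a Series here.
```

Code that must run on both libraries may prefer to branch on the result type rather than assume one.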
Pandas Compatibility Note
DataFrame.reindex
Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF where it is cast to float in Pandas.
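The dtype change on the Pandas side can be seen in a short pandas-only sketch. The column name http_status follows the note; the row labels and values are illustrative assumptions. The nullable Int64 variant is included only as an analogy to the cuDF behavior described above, not as a statement about cuDF internals.

```python
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
df = pd.DataFrame({"http_status": [200, 200, 404, 404, 301]}, index=index)
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

# Default integer column: unmatched rows become NaN, forcing a float dtype.
as_float = df.reindex(new_index)["http_status"]

# pandas' nullable Int64 dtype marks unmatched rows with <NA> and stays
# integer, which resembles the cuDF behavior described in the note.
as_nullable = df.astype({"http_status": "Int64"}).reindex(new_index)["http_status"]

print(as_float.dtype, as_nullable.dtype)
```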
Pandas Compatibility Note
DataFrame.truncate, Series.truncate
The copy parameter is only present for API compatibility, but copy=False is not supported. This method always generates a copy.
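One way to write code that does not depend on copy=False is to omit the parameter and treat the result as an independent object. The sketch below uses pandas only, with made-up data, to illustrate that calling pattern.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[1, 2, 3, 4])

# Omit `copy` entirely so the call does not depend on copy=False semantics.
window = df.truncate(before=2, after=3)

# Treat the result as an independent object: editing it leaves `df` untouched.
window.loc[2, "x"] = -1
print(df.loc[2, "x"])
```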
Pandas Compatibility Note
Note that where treats missing values as falsy, in parallel with pandas' treatment of nullable data:
>>> import cudf
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64
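The parallel with pandas nullable data can be sketched as follows. This is a pandas-only illustration that assumes the nullable Int64 and boolean extension dtypes; the variable names are hypothetical.

```python
import pandas as pd

# Nullable integer data and a nullable boolean condition with a missing value.
sr = pd.Series([1, 2, 3], dtype="Int64")
cond = pd.Series([True, False, pd.NA], dtype="boolean")

# The missing entry in `cond` is handled like False, so position 2 is masked.
masked = sr.where(cond)
print(masked.isna().tolist())
```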
Pandas Compatibility Note
MultiIndex.get_loc
The return type of this function may deviate from that of the corresponding Pandas method. If the index is neither lexicographically sorted nor unique, a best-effort attempt is made to coerce the found indices into a slice. For example:
>>> import pandas as pd
>>> import cudf
>>> x = pd.MultiIndex.from_tuples([
... (2, 1, 1), (1, 2, 3), (1, 2, 1),
... (1, 1, 1), (1, 1, 1), (2, 2, 1),
... ])
>>> x.get_loc(1)
array([False,  True,  True,  True,  True, False])
>>> cudf.from_pandas(x).get_loc(1)
slice(1, 5, 1)
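Because the two libraries can return different types for the same lookup, downstream code may want to normalize the result before using it. The helper below, loc_to_positions, is a hypothetical name and not part of either library. It is a minimal sketch that converts an integer, slice, or boolean mask into integer positions, and it runs with pandas and NumPy only.

```python
import numpy as np
import pandas as pd


def loc_to_positions(loc, length):
    """Convert a get_loc result (int, slice, or boolean mask) to positions."""
    if isinstance(loc, slice):
        return np.arange(length)[loc]
    if isinstance(loc, (int, np.integer)):
        return np.array([loc])
    arr = np.asarray(loc)
    if arr.dtype == bool:
        return np.flatnonzero(arr)
    return arr  # assumed to already hold integer positions


x = pd.MultiIndex.from_tuples([
    (2, 1, 1), (1, 2, 3), (1, 2, 1),
    (1, 1, 1), (1, 1, 1), (2, 2, 1),
])

from_pandas = loc_to_positions(x.get_loc(1), len(x))
# slice(1, 5, 1) is the cuDF result quoted in the example above.
from_slice = loc_to_positions(slice(1, 5, 1), len(x))
print(from_pandas.tolist(), from_slice.tolist())
```

Both calls yield the same positions, so later indexing does not need to know which library produced the lookup.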
Pandas Compatibility Note
Series.reindex
Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the series retains an integer dtype in cuDF where it is cast to float in Pandas.
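Code that should not depend on which dtype comes back can detect unmatched rows with isna() and cast explicitly after filling. The sketch below runs with pandas only; the data and the fill value of 0 are illustrative assumptions.

```python
import pandas as pd

sr = pd.Series([10, 20, 30], index=["a", "b", "c"])
reindexed = sr.reindex(["a", "x", "c"])

# isna() flags unmatched rows whether the marker is NaN or <NA>.
missing = reindexed.isna()

# Fill, then cast explicitly, so the final dtype does not depend on the library.
filled = reindexed.fillna(0).astype("int64")
print(missing.tolist(), filled.tolist())
```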
Pandas Compatibility Note
groupby.fillna
This function may return its result in a different format from the corresponding Pandas method. For example:
>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({'k': [1, 1, 2], 'v': [2, None, 4]})
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('k').fillna({'v': 4}) # pandas
       v
k
1 0  2.0
  1  4.0
2 2  4.0
>>> gdf.groupby('k').fillna({'v': 4}) # cudf
     v
0  2.0
1  4.0
2  4.0
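If the two layouts need to be compared, the group-key level can be dropped from the Pandas-style result. The sketch below builds that result by hand, as a stand-in for the Pandas output quoted above, because groupby.fillna is deprecated in recent pandas releases; only DataFrame.droplevel is exercised.

```python
import pandas as pd

# Hand-built stand-in for the pandas-style result above: the group key "k"
# is the outer index level and the original row label is the inner level.
pandas_style = pd.DataFrame(
    {"v": [2.0, 4.0, 4.0]},
    index=pd.MultiIndex.from_tuples([(1, 0), (1, 1), (2, 2)], names=["k", None]),
)

# Dropping the group-key level leaves only the original row labels.
flat = pandas_style.droplevel("k")
print(flat.index.tolist())
```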
Pandas Compatibility Note
groupby.apply
cuDF's groupby.apply is limited compared to pandas. In some situations, Pandas returns the grouped keys as part of the index, while cuDF omits them because they are redundant. For example:
>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a').apply(lambda x: x.iloc[[0]])
     a  b  c
a
1 0  1  1  1
2 2  2  1  3
>>> gdf.groupby('a').apply(lambda x: x.iloc[[0]])
   a  b  c
0  1  1  1
2  2  1  3
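On the Pandas side, the group_keys parameter of groupby controls whether the group keys are added to the index. The sketch below runs with pandas only and shows one way to obtain an index layout like the cuDF result above. Selecting columns b and c explicitly is an assumption made here to avoid version-dependent handling of the grouping column, so the output omits column a.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 1, 2],
    "c": [1, 2, 3, 4],
})

# group_keys=False asks pandas not to prepend the group keys to the index.
# Only columns "b" and "c" are selected, to sidestep version-dependent
# handling of the grouping column inside apply.
first_rows = (
    df.groupby("a", group_keys=False)[["b", "c"]]
    .apply(lambda g: g.iloc[[0]])
)
print(first_rows.index.tolist())
```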