Pandas Compatibility Notes
Pandas Compatibility Note
DataFrame.transpose, DataFrame.T
The copy parameter is not supported because copying is the default and only behavior (copy=True).
Pandas Compatibility Note
DataFrame.all, Series.all
Parameters currently not supported are axis, bool_only, level.
Pandas Compatibility Note
DataFrame.any, Series.any
Parameters currently not supported are axis, bool_only, level.
Pandas Compatibility Note
DataFrame.count
Parameters currently not supported are axis and numeric_only.
Pandas Compatibility Note
DataFrame.diff
Diff currently only supports numeric dtype columns.
Pandas Compatibility Note
DataFrame.empty, Series.empty
If DataFrame/Series contains only null values, it is still not considered empty. See the example above.
Pandas Compatibility Note
DataFrame.eval
Additional kwargs are not supported.
Bitwise and logical operators are not dtype-dependent. Specifically, & must be used for bitwise operations on integers; and is reserved for the logical AND between booleans.
Only numerical types are currently supported.
Operators generally will not cast automatically. Users are responsible for casting columns to suitable types before evaluating a function.
Multiple assignments to the same name (i.e. a sequence of assignment statements where later statements depend on the output of earlier statements) are not supported.
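To make the & versus and distinction concrete, here is a minimal plain-Python sketch that needs no cuDF. It only shows how the two operators differ on integers. The expression strings at the end are hypothetical examples of what one might pass to DataFrame.eval, and the column names in them are assumptions.

```python
# Plain Python illustration of bitwise `&` versus logical `and`.
a, b = 6, 3  # binary 110 and 011

bitwise_result = a & b    # bit-by-bit AND: 110 & 011 = 010
logical_result = a and b  # short-circuit AND: returns the last truthy operand

print(bitwise_result)  # 2
print(logical_result)  # 3

# Hypothetical eval expressions (column names x, y, p, q are assumptions):
bitwise_expr = "x & y"    # bitwise AND on integer columns
logical_expr = "p and q"  # logical AND between boolean columns
```

The two operators give different answers on the same integers, which is why the choice matters when writing an expression for integer columns.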
Pandas Compatibility Note
DataFrame.from_arrow
Does not support automatically setting index column(s) similar to how to_pandas works for PyArrow Tables.
Pandas Compatibility Note
DataFrame.join
other must be a single DataFrame for now.
on is not supported yet due to lack of multi-index support.
Pandas Compatibility Note
DataFrame.kurtosis
Parameters currently not supported are level and numeric_only.
Pandas Compatibility Note
DataFrame.max, Series.max
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
DataFrame.median, Series.median
Parameters currently not supported are level and numeric_only.
Pandas Compatibility Note
DataFrame.merge
DataFrame merges in cuDF result in non-deterministic row ordering.
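Code that depends on row order can sort the merged result explicitly. The sketch below uses pandas so that it runs without a GPU; it assumes that merge, sort_values, and reset_index behave equivalently on cuDF DataFrames. The frames and column names are made up for illustration.

```python
import pandas as pd  # CPU stand-in; the same calls are assumed to exist in cuDF

left = pd.DataFrame({"key": [3, 1, 2], "lval": ["c", "a", "b"]})
right = pd.DataFrame({"key": [2, 3, 1], "rval": [20, 30, 10]})

# Sort on the join key after merging so the row order no longer depends
# on how the merge happened to arrange its output.
merged = (
    left.merge(right, on="key", how="inner")
    .sort_values("key")
    .reset_index(drop=True)
)

print(merged)
```

When join keys repeat, sorting on the key alone does not fully fix the order; sort on enough columns to make each row's position unique.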
Pandas Compatibility Note
DataFrame.min, Series.min
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
DataFrame.product, Series.product
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
DataFrame.quantile
One notable difference from pandas arises when the DataFrame has non-numeric types and pandas would return a Series. cuDF returns a DataFrame instead, because it does not support mixed types within a Series.
Pandas Compatibility Note
DataFrame.query
One difference from pandas is that query currently only supports numeric, datetime, timedelta, or bool dtypes.
Pandas Compatibility Note
DataFrame.reindex
Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF where it is cast to float in Pandas.
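The pandas half of this difference can be reproduced on a CPU. The following sketch uses pandas only, with illustrative labels and values. Per the note above, cuDF would instead fill the unmatched row with NA and keep the integer dtype; that side is not executed here.

```python
import pandas as pd

# A small integer Series; the labels and values are illustrative only.
status = pd.Series([200, 404, 301], index=["a", "b", "c"])

# "z" has no match, so pandas fills it with NaN and upcasts to float64.
reindexed = status.reindex(["a", "z", "c"])

print(status.dtype)     # an integer dtype
print(reindexed.dtype)  # float64
print(reindexed.isna().tolist())  # [False, True, False]
```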
Pandas Compatibility Note
DataFrame.rename
Not supporting: level
Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.
Pandas Compatibility Note
DataFrame.replace, Series.replace
Parameters that are currently not supported are: limit, regex, method
Pandas Compatibility Note
DataFrame.resample, Series.resample
Note that the dtype of the index (or the 'on' column if using 'on=') in the result will be of a frequency closest to the resampled frequency. For example, if resampling from nanoseconds to milliseconds, the index will be of dtype 'datetime64[ms]'.
Pandas Compatibility Note
DataFrame.sample, Series.sample
When sampling from axis=0/'index', random_state can be either a numpy random state (numpy.random.RandomState) or a cupy random state (cupy.random.RandomState). When a numpy random state is used, the output is guaranteed to match the output of the corresponding pandas method call, but generating the sample may be slow. If exact pandas equivalence is not required, using a cupy random state will achieve better performance, especially when sampling a large number of items. It's advised to use the matching ndarray type to the random state for the weights array.
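As a point of reference, this is how a numpy random state makes sampling reproducible on the pandas side. The sketch runs on pandas only, and the data is made up. The note above says that passing the same kind of numpy state to cuDF yields matching output; the cupy.random.RandomState alternative needs a GPU and is not shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Two RandomState objects built from the same seed draw the same rows.
first = df.sample(n=3, random_state=np.random.RandomState(42))
second = df.sample(n=3, random_state=np.random.RandomState(42))

print(first.index.tolist() == second.index.tolist())  # True
```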
Pandas Compatibility Note
DataFrame.skew, Series.skew, Frame.skew
The axis parameter is not currently supported.
Pandas Compatibility Note
DataFrame.sort_index, Series.sort_index
Not supporting: kind, sort_remaining=False
Pandas Compatibility Note
DataFrame.sort_values, Series.sort_values
Support axis='index' only.
Not supporting: inplace, kind
Pandas Compatibility Note
DataFrame.std, Series.std
Parameters currently not supported are level and numeric_only.
Pandas Compatibility Note
DataFrame.sum, Series.sum
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
DataFrame.truncate, Series.truncate
The copy parameter is only present for API compatibility, but copy=False is not supported. This method always generates a copy.
Pandas Compatibility Note
DataFrame.var, Series.var
Parameters currently not supported are level and numeric_only.
Pandas Compatibility Note
DataFrame.where, Series.where
Note that where treats missing values as falsy, in parallel with pandas treatment of nullable data:
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64
Pandas Compatibility Note
Series.map
Please note map currently only supports fixed-width numeric type functions.
Pandas Compatibility Note
Series.reindex
Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the series retains an integer dtype in cuDF where it is cast to float in Pandas.
Pandas Compatibility Note
Series.rename
Only scalar values are supported for changing the name attribute.
The inplace and level parameters are not supported.
Pandas Compatibility Note
Series.sort_values
Support axis='index' only.
The inplace and kind arguments are currently unsupported.
Pandas Compatibility Note
ListMethods.sort_values
The inplace and kind arguments are currently not supported.
Pandas Compatibility Note
StringMethods.contains
The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
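Since several string methods on this page accept only re.DOTALL and re.MULTILINE, it can help to validate a flags value before passing it on. The helper below is a hypothetical convenience, not part of cuDF; it uses only the standard re module and checks that no other flag bits are set.

```python
import re

# Flags that the notes on this page describe as supported.
SUPPORTED_FLAGS = re.DOTALL | re.MULTILINE


def uses_only_supported_flags(flags: int) -> bool:
    """Return True if `flags` sets no bits outside DOTALL | MULTILINE."""
    return (int(flags) & ~int(SUPPORTED_FLAGS)) == 0


print(uses_only_supported_flags(re.DOTALL | re.MULTILINE))  # True
print(uses_only_supported_flags(re.IGNORECASE))             # False
```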
Pandas Compatibility Note
StringMethods.count
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Some characters need to be escaped when passing in pat. For example, '$' has a special meaning in regex and must be escaped to match the literal character.
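The effect of escaping can be seen with Python's standard re module. This sketch assumes the regex engine behind StringMethods.count treats '$' as an end anchor in the same way, which is what the note above implies. The sample text is made up.

```python
import re

text = "cost: $5 and $10"

# Unescaped, "$" is an end-of-string anchor, so it matches one empty
# position rather than the two dollar signs.
unescaped_matches = len(re.findall("$", text))

# re.escape produces "\$", which matches the literal character.
literal_pattern = re.escape("$")
escaped_matches = len(re.findall(literal_pattern, text))

print(literal_pattern)    # \$
print(unescaped_matches)  # 1
print(escaped_matches)    # 2
```

The escaped string is the form one would pass as pat when counting literal dollar signs.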
Pandas Compatibility Note
StringMethods.endswith
The na parameter is not yet supported, as cuDF uses native strings instead of Python objects.
Pandas Compatibility Note
StringMethods.extract
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Pandas Compatibility Note
StringMethods.findall
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Pandas Compatibility Note
StringMethods.match
Parameters case and na are currently not supported. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Pandas Compatibility Note
StringMethods.partition
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Pandas Compatibility Note
StringMethods.replace
The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Pandas Compatibility Note
GroupBy.apply
cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the grouped keys as part of the index, while cuDF does not because they would be redundant. For example:
>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
     b  c
a
1 0  1  1
2 2  1  3
>>> gdf.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
   b  c
0  1  1
2  1  3
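If downstream code needs the same index shape from both libraries, one option on the pandas side is to pass group_keys=False, which stops pandas from adding the group key to the index. The sketch below shows pandas only. Whether cuDF accepts this argument in the same way is not stated in the note, so treat that as an open assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 1, 2],
    "c": [1, 2, 3, 4],
})

# group_keys=False asks pandas not to prepend the group key to the index.
result = df.groupby("a", group_keys=False)[["b", "c"]].apply(
    lambda x: x.iloc[[0]]
)

print(result.index.tolist())  # [0, 2]
```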
Pandas Compatibility Note
series.DatetimeProperties.strftime
The following date format identifiers are not yet supported: %c, %x, %X
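One way to avoid these identifiers is to spell out the date and time fields explicitly. The sketch below uses only the standard library. The scanner function is a hypothetical helper, not a cuDF API, and the explicit format shown is one possible replacement rather than an exact locale-aware equivalent of %c.

```python
from datetime import datetime

# Identifiers listed in the note above as not yet supported.
UNSUPPORTED = {"%c", "%x", "%X"}


def unsupported_directives(fmt: str) -> list:
    """Return the unsupported identifiers found in `fmt`, in order."""
    found = []
    i = 0
    while i < len(fmt) - 1:
        if fmt[i] == "%":
            directive = fmt[i:i + 2]
            if directive in UNSUPPORTED:
                found.append(directive)
            i += 2  # skip the whole directive, including a literal "%%"
        else:
            i += 1
    return found


stamp = datetime(2024, 1, 31, 13, 45, 0)
explicit = "%Y-%m-%d %H:%M:%S"

print(unsupported_directives("%x %X"))   # ['%x', '%X']
print(unsupported_directives(explicit))  # []
print(stamp.strftime(explicit))          # 2024-01-31 13:45:00
```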
Pandas Compatibility Note
cudf.to_numeric
An important difference from pandas is that this function does not accept mixed numeric/non-numeric type sequences, for example [1, 'a']. A TypeError will be raised when such input is received, regardless of the errors parameter.
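Because the errors parameter does not help with mixed input, a caller may want to detect such sequences before calling cudf.to_numeric. The following is a minimal sketch using only the standard library. The helper name is hypothetical, and it classifies values by Python type only; it does not try to parse strings.

```python
from numbers import Number


def mixes_numeric_and_other(values) -> bool:
    """Return True if `values` holds both numeric and non-numeric items."""
    kinds = {isinstance(v, Number) for v in values}
    return len(kinds) == 2


print(mixes_numeric_and_other([1, "a"]))    # True
print(mixes_numeric_and_other([1, 2.5]))    # False
print(mixes_numeric_and_other(["1", "2"]))  # False
```

A caller could use a check like this to clean or reject the input up front instead of catching the TypeError afterwards.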