cudf.get_dummies

cudf.get_dummies(df, prefix=None, prefix_sep='_', dummy_na=False, columns=None, cats=None, sparse=False, drop_first=False, dtype='uint8')

Returns a DataFrame whose columns are the one-hot encodings of the columns in df.

Parameters
df : array-like, Series, or DataFrame

Data from which to compute dummy indicators.

prefix : str, dict, or sequence, optional

Prefix to add to the generated column names. Either a str (to apply a constant prefix), a dict mapping column names to prefixes, or a sequence of prefixes with the same length as the number of columns. If not supplied, defaults to the empty string.

prefix_sep : str, dict, or sequence, optional, default '_'

Separator to place between a prefix and the encoded value.

dummy_na : boolean, optional

Add a column to indicate null values (None). If False, nulls are ignored.

cats : dict, optional

Dictionary mapping column names to sequences of values representing that column's categories. If not supplied, the categories are computed as the unique values of the column.

sparse : boolean, optional

This argument is currently non-functional in RAPIDS.

drop_first : boolean, optional

This argument is currently non-functional in RAPIDS.

columns : sequence of str, optional

Names of columns to encode. If not provided, will attempt to encode all columns. Note that this differs from the pandas default behavior, which encodes only columns with dtype object or categorical.

dtype : str, optional

Output dtype, default 'uint8'.
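The way prefix, prefix_sep, dummy_na, and cats shape the output columns can be illustrated with a minimal pure-Python sketch. This is not cuDF's implementation and does not run on the GPU. The helper name one_hot_sketch is hypothetical, and the column naming (column name as the default prefix, "None" for the null indicator) is an assumption that mirrors the string DataFrame example in the Examples section.

```python
def one_hot_sketch(values, name, cats=None, dummy_na=False,
                   prefix=None, prefix_sep="_"):
    """Illustrate one-hot encoding of a single column.

    Hypothetical helper for explanation only; not part of cuDF.
    Returns a dict mapping output column names to lists of 0/1 flags.
    """
    # Without `cats`, the categories are the unique non-null values.
    if cats is None:
        cats = sorted({v for v in values if v is not None})

    # Assumption: fall back to the column name when no prefix is given.
    label = name if prefix is None else prefix

    encoded = {}
    if dummy_na:
        # One extra indicator column marks the null entries.
        encoded[f"{label}{prefix_sep}None"] = [int(v is None) for v in values]
    for cat in cats:
        # One indicator column per category; nulls never match a category.
        encoded[f"{label}{prefix_sep}{cat}"] = [int(v == cat) for v in values]
    return encoded


with_na = one_hot_sketch(["value1", "value2", None], "a", dummy_na=True)
fixed_cats = one_hot_sketch(["x", "y", "x"], "c", cats=["x", "y", "z"],
                            prefix="col", prefix_sep="-")
```

In this sketch, supplying cats fixes the set of output columns up front, so a category that never occurs in the data (here "z") still gets an all-zero column.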

Examples

>>> import cudf
>>> df = cudf.DataFrame({"a": ["value1", "value2", None], "b": [0, 0, 0]})
>>> cudf.get_dummies(df)
   b  a_value1  a_value2
0  0         1         0
1  0         0         1
2  0         0         0
>>> cudf.get_dummies(df, dummy_na=True)
   b  a_None  a_value1  a_value2
0  0       0         1         0
1  0       0         0         1
2  0       1         0         0
>>> import numpy as np
>>> df = cudf.DataFrame({"a":cudf.Series([1, 2, np.nan, None],
...                     nan_as_null=False)})
>>> df
      a
0   1.0
1   2.0
2   NaN
3  <NA>
>>> cudf.get_dummies(df, dummy_na=True, columns=["a"])
   a_1.0  a_2.0  a_nan  a_null
0      1      0      0       0
1      0      1      0       0
2      0      0      1       0
3      0      0      0       1
>>> series = cudf.Series([1, 2, None, 2, 4])
>>> series
0       1
1       2
2    <NA>
3       2
4       4
dtype: int64
>>> cudf.get_dummies(series, dummy_na=True)
   null  1  2  4
0     0  1  0  0
1     0  0  1  0
2     1  0  0  0
3     0  0  1  0
4     0  0  0  1
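
The float example above yields separate a_nan and a_null columns because NaN and null are kept as distinct missing-value markers when nan_as_null=False. A rough pure-Python sketch of that distinction follows. The helper encode_with_missing is hypothetical and is not how cuDF computes the result; it only reproduces the column layout shown in that example.

```python
import math


def encode_with_missing(values, name="a", sep="_"):
    """Encode floats with separate NaN and null indicator columns.

    Hypothetical helper for explanation only; not part of cuDF.
    """
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    # Categories are the ordinary values: neither NaN nor None.
    cats = sorted({v for v in values if v is not None and not is_nan(v)})

    out = {f"{name}{sep}{c}": [int(v == c) for v in values] for c in cats}
    # NaN is a float value, so it is detected with math.isnan.
    out[f"{name}{sep}nan"] = [int(is_nan(v)) for v in values]
    # Null (None) means the entry is absent altogether.
    out[f"{name}{sep}null"] = [int(v is None) for v in values]
    return out


table = encode_with_missing([1.0, 2.0, float("nan"), None])
```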