cudf.get_dummies
- cudf.get_dummies(df, prefix=None, prefix_sep='_', dummy_na=False, columns=None, cats=None, sparse=False, drop_first=False, dtype='uint8')
Returns a DataFrame whose columns are the one-hot encodings of the columns in df.
- Parameters
- df : array-like, Series, or DataFrame
Data of which to get dummy indicators.
- prefix : str, dict, or sequence, optional
Prefix for the encoded column names. Either a str (a constant prefix), a dict mapping column names to prefixes, or a sequence of prefixes whose length matches the number of columns. If not supplied, defaults to the empty string.
- prefix_sep : str, dict, or sequence, optional, default ‘_’
Separator placed between the prefix and the category value.
- dummy_na : boolean, optional
If True, add a column indicating null (None) values; if False, nulls are ignored.
- cats : dict, optional
Dictionary mapping column names to sequences of values representing that column’s categories. If not supplied, the categories are computed as the unique values of the column.
- sparse : boolean, optional
This argument is currently non-functional in RAPIDS.
- drop_first : boolean, optional
This argument is currently non-functional in RAPIDS.
- columns : sequence of str, optional
Names of the columns to encode. If not provided, the function attempts to encode all columns. Note that this differs from the pandas default behavior, which encodes only columns with dtype object or categorical.
- dtype : str, optional
Data type of the output columns. Defaults to ‘uint8’.
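The way prefix, prefix_sep, and cats interact can be illustrated without a GPU. The following is a plain-Python sketch of the rules described above, not cuDF's implementation: `one_hot` is a hypothetical helper, and sorting the inferred categories is an assumption made only to keep the output order deterministic.

```python
from typing import Hashable, Optional, Sequence


def one_hot(values: Sequence[Optional[Hashable]],
            prefix: str = "",
            prefix_sep: str = "_",
            cats: Optional[Sequence[Hashable]] = None) -> dict:
    """Hypothetical helper: one-hot encode a single column of values."""
    if cats is None:
        # Default: the unique non-null values of the column.
        # Sorting is an assumption of this sketch, not a documented rule.
        cats = sorted({v for v in values if v is not None})
    encoded = {}
    for cat in cats:
        # With an empty prefix, the category itself is the column name.
        name = f"{prefix}{prefix_sep}{cat}" if prefix else str(cat)
        # Nulls never equal a category, so they produce an all-zero row.
        encoded[name] = [int(v == cat) for v in values]
    return encoded


column = ["value1", "value2", None]

# Categories inferred from the data.
default_cats = one_hot(column, prefix="a")

# Categories supplied explicitly, including one absent from the data.
fixed_cats = one_hot(column, prefix="a", prefix_sep=".",
                     cats=["value1", "value2", "value3"])

print(default_cats)
print(fixed_cats)
```

Supplying the categories explicitly fixes the set of output columns up front, which is useful when separate batches of data must be encoded to the same layout.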
Examples
>>> import cudf
>>> df = cudf.DataFrame({"a": ["value1", "value2", None], "b": [0, 0, 0]})
>>> cudf.get_dummies(df)
   b  a_value1  a_value2
0  0         1         0
1  0         0         1
2  0         0         0

>>> cudf.get_dummies(df, dummy_na=True)
   b  a_None  a_value1  a_value2
0  0       0         1         0
1  0       0         0         1
2  0       1         0         0

>>> import numpy as np
>>> df = cudf.DataFrame({"a": cudf.Series([1, 2, np.nan, None],
...                                       nan_as_null=False)})
>>> df
      a
0   1.0
1   2.0
2   NaN
3  <NA>

>>> cudf.get_dummies(df, dummy_na=True, columns=["a"])
   a_1.0  a_2.0  a_nan  a_null
0      1      0      0       0
1      0      1      0       0
2      0      0      1       0
3      0      0      0       1

>>> series = cudf.Series([1, 2, None, 2, 4])
>>> series
0       1
1       2
2    <NA>
3       2
4       4
dtype: int64
>>> cudf.get_dummies(series, dummy_na=True)
   null  1  2  4
0     0  1  0  0
1     0  0  1  0
2     1  0  0  0
3     0  0  1  0
4     0  0  0  1
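
For readers who want to trace the dummy_na behavior by hand, the Series example can be mirrored in plain Python. This is a sketch only, not how cuDF computes the result: `null_indicator` is a hypothetical helper, and both the "null" label and the position of the indicator column are assumptions taken from the Series output shown above.

```python
def null_indicator(values, dummy_na=False, null_label="null"):
    """Hypothetical helper: encode values, optionally flagging nulls."""
    # Categories are the unique non-null values (sorted for this sketch).
    cats = sorted({v for v in values if v is not None})
    encoded = {}
    if dummy_na:
        # One extra indicator column: 1 where the value is missing.
        encoded[null_label] = [int(v is None) for v in values]
    for cat in cats:
        encoded[cat] = [int(v == cat) for v in values]
    return encoded


series = [1, 2, None, 2, 4]
with_na = null_indicator(series, dummy_na=True)
without_na = null_indicator(series)

for name, column in with_na.items():
    print(name, column)
```

With dummy_na left at False, the row holding the null is simply all zeros, so the missing value cannot be distinguished from an unseen category in the encoded output.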