cudf.DataFrame.apply_chunks#
- DataFrame.apply_chunks(func, incols, outcols, kwargs=None, pessimistic_nulls=True, chunks=None, blkct=None, tpb=None)#
Transform user-specified chunks using the user-provided function.
- Parameters
- df: DataFrame
The source dataframe.
- func: function
The transformation function that will be executed on the CUDA GPU.
- incols: list or dict
A list of names of input columns that match the function arguments. Or, a dictionary mapping input column names to their corresponding function arguments such as {'col1': 'arg1'}.
- outcols: dict
A dictionary of output column names and their dtype.
- kwargs: dict
Name-value pairs of extra arguments. These values are passed directly to the function.
- pessimistic_nulls: bool
Whether or not the output should be null when any corresponding input is null. If False, all outputs will be non-null, but will be the result of applying func against the underlying column data, which may be garbage.
- chunks: int or Series-like
If it is an int, it is the chunksize. If it is an array, it contains the integer offsets for the start of each chunk. The span of the i-th chunk is data[chunks[i] : chunks[i + 1]] for any i + 1 < chunks.size, or data[chunks[i]:] for i == len(chunks) - 1.
- tpb: int; optional
The threads-per-block for the underlying kernel. If not specified (default), uses the Numba .forall(...) built-in to query the CUDA Driver API and determine the optimal kernel launch configuration. Specify 1 to emulate serial execution for each chunk; it is a good starting point but inefficient. Its maximum possible value is limited by the available CUDA GPU resources.
- blkct: int; optional
The number of blocks for the underlying kernel. If neither blkct nor tpb is specified (default), uses the Numba .forall(...) built-in to query the CUDA Driver API and determine the optimal kernel launch configuration. If blkct is not specified but tpb is, uses the number of chunks as the number of blocks.
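To make the chunks semantics concrete, here is a minimal CPU-side sketch (a hypothetical helper, not part of the cudf API) that splits data the same way: an int is treated as a chunksize, while an array is treated as start offsets, with each chunk spanning data[chunks[i] : chunks[i + 1]] and the last chunk running to the end.

```python
def chunk_spans(data, chunks):
    """Split `data` into the chunks described by an int chunksize
    or by an array of integer start offsets (hypothetical helper)."""
    if isinstance(chunks, int):
        # An int is a chunksize: derive the start offsets from it.
        chunks = list(range(0, len(data), chunks))
    spans = []
    for i, start in enumerate(chunks):
        # data[chunks[i] : chunks[i + 1]], or data[chunks[i]:] for the last chunk.
        stop = chunks[i + 1] if i + 1 < len(chunks) else len(data)
        spans.append(data[start:stop])
    return spans

data = list(range(10))
print(chunk_spans(data, [0, 3, 7]))  # offsets -> chunks of sizes 3, 4, 3
print(chunk_spans(data, 4))          # chunksize 4 -> chunks of sizes 4, 4, 2
```

Each offset marks where a chunk begins, so chunk boundaries never overlap and every element belongs to exactly one chunk.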
See also
Examples
For tpb > 1, func is executed by tpb threads concurrently. To access the thread id and count, use numba.cuda.threadIdx.x and numba.cuda.blockDim.x, respectively (see the Numba CUDA kernel documentation).

In the example below, the kernel is invoked concurrently on each specified chunk. The kernel computes the corresponding output for the chunk.

By looping over the range range(cuda.threadIdx.x, in1.size, cuda.blockDim.x), the kernel function can be used with any tpb in an efficient manner.

>>> from numba import cuda
>>> @cuda.jit
... def kernel(in1, in2, in3, out1):
...     for i in range(cuda.threadIdx.x, in1.size, cuda.blockDim.x):
...         x = in1[i]
...         y = in2[i]
...         z = in3[i]
...         out1[i] = x * y + z
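To see why this striding pattern works for any tpb, here is a plain-Python sketch (no GPU or Numba required; the loop structure is an assumption modeled on the kernel above) that simulates the threads of one chunk sequentially: thread t processes indices t, t + tpb, t + 2*tpb, ..., so the tpb threads together cover every index exactly once.

```python
def simulate_chunk(in1, in2, in3, tpb):
    """CPU simulation (hypothetical) of the kernel's per-chunk striding loop."""
    out1 = [0] * len(in1)
    for thread_idx in range(tpb):                    # plays cuda.threadIdx.x
        # The stride is the thread count, playing cuda.blockDim.x.
        for i in range(thread_idx, len(in1), tpb):
            out1[i] = in1[i] * in2[i] + in3[i]
    return out1

in1 = [1, 2, 3, 4, 5]
in2 = [10, 10, 10, 10, 10]
in3 = [1, 1, 1, 1, 1]
# Each index is written by exactly one (simulated) thread, so the result
# is independent of tpb.
print(simulate_chunk(in1, in2, in3, tpb=1))  # [11, 21, 31, 41, 51]
print(simulate_chunk(in1, in2, in3, tpb=4))  # [11, 21, 31, 41, 51]
```

On the GPU the threads run concurrently rather than in sequence, but because no two threads ever touch the same index, the output is the same.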