libcudf
23.12.00
Modules | |
Concatenating | |
Gathering | |
Scattering | |
Slicing | |
Splitting | |
Shifting | |
Files | |
file | contiguous_split.hpp |
Table APIs for contiguous_split, pack, unpack, and metadata. | |
file | copying.hpp |
Column APIs for gather, scatter, split, slice, etc. | |
Classes | |
struct | cudf::packed_columns |
Column data in a serialized format. | |
struct | cudf::packed_table |
The result(s) of a cudf::contiguous_split. | |
class | cudf::chunked_pack |
Perform a chunked "pack" operation of the input table_view using a user provided buffer of size user_buffer_size. | |
Enumerations | |
enum class | cudf::out_of_bounds_policy : bool { cudf::NULLIFY, cudf::DONT_CHECK } |
Policy to account for possible out-of-bounds indices. | |
enum class | cudf::mask_allocation_policy { cudf::NEVER, cudf::RETAIN, cudf::ALWAYS } |
Indicates when to allocate a mask, based on an existing mask. | |
enum class | cudf::sample_with_replacement : bool { cudf::FALSE, cudf::TRUE } |
Indicates whether a row can be sampled more than once. | |
Functions | |
packed_columns | cudf::pack (cudf::table_view const &input, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Deep-copy a table_view into a serialized contiguous memory format. | |
std::vector< uint8_t > | cudf::pack_metadata (table_view const &table, uint8_t const *contiguous_buffer, size_t buffer_size) |
Produce the metadata used for packing a table stored in a contiguous buffer. | |
table_view | cudf::unpack (packed_columns const &input) |
Deserialize the result of cudf::pack. | |
table_view | cudf::unpack (uint8_t const *metadata, uint8_t const *gpu_data) |
Deserialize the result of cudf::pack. | |
std::unique_ptr< table > | cudf::reverse (table_view const &source_table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Reverses the rows within a table. | |
std::unique_ptr< column > | cudf::reverse (column_view const &source_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Reverses the elements of a column. | |
std::unique_ptr< column > | cudf::empty_like (column_view const &input) |
Initializes and returns an empty column of the same type as the input. | |
std::unique_ptr< column > | cudf::empty_like (scalar const &input) |
Initializes and returns an empty column of the same type as the input. | |
std::unique_ptr< column > | cudf::allocate_like (column_view const &input, mask_allocation_policy mask_alloc=mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Creates an uninitialized new column of the same size and type as the input. | |
std::unique_ptr< column > | cudf::allocate_like (column_view const &input, size_type size, mask_allocation_policy mask_alloc=mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Creates an uninitialized new column of the specified size and same type as the input. | |
std::unique_ptr< table > | cudf::empty_like (table_view const &input_table) |
Creates a table of empty columns with the same types as the input_table. | |
void | cudf::copy_range_in_place (column_view const &source, mutable_column_view &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
Copies a range of elements in-place from one column to another. | |
std::unique_ptr< column > | cudf::copy_range (column_view const &source, column_view const &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Copies a range of elements out-of-place from one column to another. | |
std::unique_ptr< column > | cudf::copy_if_else (column_view const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask. | |
std::unique_ptr< column > | cudf::copy_if_else (scalar const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask. | |
std::unique_ptr< column > | cudf::copy_if_else (column_view const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask. | |
std::unique_ptr< column > | cudf::copy_if_else (scalar const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask. | |
std::unique_ptr< scalar > | cudf::get_element (column_view const &input, size_type index, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Get the element at specified index from a column. | |
std::unique_ptr< table > | cudf::sample (table_view const &input, size_type const n, sample_with_replacement replacement=sample_with_replacement::FALSE, int64_t const seed=0, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Gather n samples from given input randomly. | |
bool | cudf::has_nonempty_nulls (column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream()) |
Checks if a column or its descendants have non-empty null rows. | |
bool | cudf::may_have_nonempty_nulls (column_view const &input) |
Approximates if a column or its descendants may have non-empty null elements. | |
std::unique_ptr< column > | cudf::purge_nonempty_nulls (column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) |
Copy input into output while purging any non-empty null rows in the column or its descendants. | |
enum class cudf::mask_allocation_policy
Indicates when to allocate a mask, based on an existing mask.
Enumerator | |
---|---|
NEVER | Do not allocate a null mask, regardless of input. |
RETAIN | Allocate a null mask if the input contains one. |
ALWAYS | Allocate a null mask, regardless of input. |
Definition at line 214 of file copying.hpp.
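The three enumerators above reduce to a simple decision. The following is a minimal host-side sketch of that decision, not libcudf code: `mask_policy` and `should_allocate_mask` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for cudf::mask_allocation_policy (illustration only).
enum class mask_policy { NEVER, RETAIN, ALWAYS };

// Returns true when a null mask would be allocated for the new column,
// given the policy and whether the input column already has a mask.
constexpr bool should_allocate_mask(mask_policy policy, bool input_has_mask)
{
  switch (policy) {
    case mask_policy::NEVER: return false;            // never allocate
    case mask_policy::ALWAYS: return true;            // always allocate
    case mask_policy::RETAIN: return input_has_mask;  // mirror the input
  }
  return false;  // not reached for valid enumerators
}
```

With RETAIN, the result simply follows the input: a nullable input yields a mask, a non-nullable input does not.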
enum class cudf::out_of_bounds_policy : bool
Policy to account for possible out-of-bounds indices.
NULLIFY means to nullify output values corresponding to out-of-bounds gather_map values. DONT_CHECK means do not check whether the indices are out-of-bounds, for better performance.
Enumerator | |
---|---|
NULLIFY | Output values corresponding to out-of-bounds indices are null. |
DONT_CHECK | No bounds checking is performed, for better performance. |
Definition at line 48 of file copying.hpp.
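To make the two policies concrete, here is a small host-side model of a gather. It is a sketch under stated assumptions, not the libcudf implementation: `bounds_policy` and `gather_model` are hypothetical names, `std::nullopt` stands in for a null row, and negative indices are treated as out of bounds, which may differ from how libcudf interprets them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for cudf::out_of_bounds_policy (illustration only).
enum class bounds_policy : bool { NULLIFY, DONT_CHECK };

// Host model of a gather: out[i] = source[gather_map[i]].
// std::nullopt plays the role of a null output row.
std::vector<std::optional<std::int32_t>> gather_model(
  std::vector<std::int32_t> const& source,
  std::vector<std::int32_t> const& gather_map,
  bounds_policy policy)
{
  std::vector<std::optional<std::int32_t>> out;
  out.reserve(gather_map.size());
  auto const n = static_cast<std::int32_t>(source.size());
  for (std::int32_t idx : gather_map) {
    if (policy == bounds_policy::NULLIFY && (idx < 0 || idx >= n)) {
      out.push_back(std::nullopt);  // out-of-bounds index becomes a null
    } else {
      // With DONT_CHECK the caller guarantees idx is in range; an
      // out-of-range idx here would be undefined behavior.
      out.push_back(source[static_cast<std::size_t>(idx)]);
    }
  }
  return out;
}
```

The trade-off is the usual one: NULLIFY pays for a comparison per index, while DONT_CHECK skips it and relies on the caller.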
enum class cudf::sample_with_replacement : bool
Indicates whether a row can be sampled more than once.
Enumerator | |
---|---|
FALSE | A row can be sampled only once. |
TRUE | A row can be sampled more than once. |
Definition at line 799 of file copying.hpp.
std::unique_ptr<column> cudf::allocate_like(column_view const& input, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Creates an uninitialized new column of the same size and type as the input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the caller should set the validity bits and the null count.
input | Immutable view of input column to emulate |
mask_alloc | Optional, Policy for allocating null mask. Defaults to RETAIN |
mr | Device memory resource used to allocate the returned column's device memory |
stream | CUDA stream used for device memory operations and kernel launches |
Returns a column with sufficient uninitialized capacity to hold the same number of elements as input, of the same type as input.type()
std::unique_ptr<column> cudf::allocate_like(column_view const& input, size_type size, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Creates an uninitialized new column of the specified size and same type as the input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the caller should set the validity bits and the null count.
input | Immutable view of input column to emulate |
size | The desired number of elements that the new column should have capacity for |
mask_alloc | Optional, Policy for allocating null mask. Defaults to RETAIN |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
Returns a column with sufficient uninitialized capacity to hold the specified number of elements, of the same type as input.type()
std::unique_ptr<column> cudf::copy_if_else(column_view const& lhs, column_view const& rhs, column_view const& boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs[i]
cudf::logic_error | if lhs and rhs are not of the same type |
cudf::logic_error | if lhs and rhs are not of the same length |
cudf::logic_error | if boolean mask is not of type bool |
cudf::logic_error | if boolean mask is not of the same length as lhs and rhs |
lhs | left-hand column_view |
rhs | right-hand column_view |
boolean_mask | column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
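The selection rule above can be modeled on the host in a few lines. This is a sketch rather than libcudf code: `copy_if_else_model` is a hypothetical name, `std::optional<bool>` stands in for a nullable BOOL8 element, and nulls in lhs and rhs are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Host model of the column/column overload:
// output[i] = (mask.valid(i) and mask[i]) ? lhs[i] : rhs[i]
// An empty optional in boolean_mask models a null element, which selects rhs.
template <typename T>
std::vector<T> copy_if_else_model(std::vector<T> const& lhs,
                                  std::vector<T> const& rhs,
                                  std::vector<std::optional<bool>> const& boolean_mask)
{
  if (lhs.size() != rhs.size() || lhs.size() != boolean_mask.size()) {
    throw std::logic_error("lhs, rhs, and boolean_mask must have the same length");
  }
  std::vector<T> out;
  out.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    bool const take_lhs = boolean_mask[i].has_value() && *boolean_mask[i];
    out.push_back(take_lhs ? lhs[i] : rhs[i]);
  }
  return out;
}
```

For lhs = [1, 2, 3, 4], rhs = [10, 20, 30, 40], and a mask of [true, false, null, true], the model returns [1, 20, 30, 4]: the null mask element behaves like false.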
std::unique_ptr<column> cudf::copy_if_else(column_view const& lhs, scalar const& rhs, column_view const& boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs
cudf::logic_error | if lhs and rhs are not of the same type |
cudf::logic_error | if boolean mask is not of type bool |
cudf::logic_error | if boolean mask is not of the same length as lhs |
lhs | left-hand column_view |
rhs | right-hand scalar |
boolean_mask | column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::copy_if_else(scalar const& lhs, column_view const& rhs, column_view const& boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs[i]
cudf::logic_error | if lhs and rhs are not of the same type |
cudf::logic_error | if boolean mask is not of type bool |
cudf::logic_error | if boolean mask is not of the same length as rhs |
lhs | left-hand scalar |
rhs | right-hand column_view |
boolean_mask | column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::copy_if_else(scalar const& lhs, scalar const& rhs, column_view const& boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule: output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs
cudf::logic_error | if boolean mask is not of type bool |
lhs | left-hand scalar |
rhs | right-hand scalar |
boolean_mask | column of type_id::BOOL8 representing "left (true) / right (false)" boolean for each element. Null element represents false. |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<column> cudf::copy_range(column_view const& source, column_view const& target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Copies a range of elements out-of-place from one column to another.
Creates a new column as if an in-place copy was performed into target. A copy of target is created first, and then the elements at indices [target_begin, target_begin + N) are overwritten with the elements at indices [source_begin, source_end) of source (where N = (source_end - source_begin)). Elements outside the range are copied from target into the returned column.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
cudf::logic_error | for invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()). |
cudf::logic_error | if target and source have different types. |
source | The column to copy from inside the range |
target | The column to copy from outside the range |
source_begin | The starting index of the source range (inclusive) |
source_end | The index one past the last element of the source range (exclusive) |
target_begin | The starting index of the target range (inclusive) |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
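The out-of-place semantics and the range checks listed above can be sketched on the host as follows. This is a model for illustration only, not libcudf code: `copy_range_model` is a hypothetical name, plain `std::vector` stands in for a column, and nulls are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Host model of an out-of-place range copy: returns a copy of target in which
// [target_begin, target_begin + N) is replaced by source[source_begin, source_end),
// where N = source_end - source_begin. The inputs are left unchanged.
template <typename T>
std::vector<T> copy_range_model(std::vector<T> const& source,
                                std::vector<T> const& target,
                                std::ptrdiff_t source_begin,
                                std::ptrdiff_t source_end,
                                std::ptrdiff_t target_begin)
{
  auto const src_size = static_cast<std::ptrdiff_t>(source.size());
  auto const tgt_size = static_cast<std::ptrdiff_t>(target.size());
  // Same conditions as the documented "invalid range" exception.
  bool const invalid = source_begin > source_end || source_begin < 0 ||
                       source_begin >= src_size || source_end > src_size ||
                       target_begin < 0 || target_begin >= tgt_size ||
                       target_begin + (source_end - source_begin) > tgt_size;
  if (invalid) { throw std::logic_error("invalid range"); }

  std::vector<T> out = target;  // elements outside the range come from target
  std::copy(source.begin() + source_begin,
            source.begin() + source_end,
            out.begin() + target_begin);
  return out;
}
```

For source = [1, 2, 3, 4, 5], target = [90, 91, 92, 93, 94, 95], source range [1, 4), and target_begin = 2, the model returns [90, 91, 2, 3, 4, 95].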
void cudf::copy_range_in_place(column_view const& source, mutable_column_view& target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream())
Copies a range of elements in-place from one column to another.
Overwrites the range of elements in target indicated by the indices [target_begin, target_begin + N) with the elements from source indicated by the indices [source_begin, source_end) (where N = (source_end - source_begin)). Use the out-of-place copy function returning std::unique_ptr<column> for use cases requiring memory reallocation, for example strings columns and other variable-width types.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
cudf::logic_error | if memory reallocation is required (e.g. for variable width types). |
cudf::logic_error | for invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()). |
cudf::logic_error | if target and source have different types. |
cudf::logic_error | if source has null values and target is not nullable. |
source | The column to copy from |
target | The preallocated column to copy into |
source_begin | The starting index of the source range (inclusive) |
source_end | The index one past the last element of the source range (exclusive) |
target_begin | The starting index of the target range (inclusive) |
stream | CUDA stream used for device memory operations and kernel launches |
std::unique_ptr<column> cudf::empty_like(column_view const& input)
Initializes and returns an empty column of the same type as the input.
[in] | input | Immutable view of input column to emulate |
Returns an empty column of the same type as input
std::unique_ptr<column> cudf::empty_like(scalar const& input)
Initializes and returns an empty column of the same type as the input.
[in] | input | Scalar to emulate |
Returns an empty column of the same type as input
std::unique_ptr<table> cudf::empty_like(table_view const& input_table)
Creates a table of empty columns with the same types as the input_table.
Creates the cudf::column objects, but does not allocate any underlying device memory for the column's data or bitmask.
[in] | input_table | Immutable view of input table to emulate |
Returns a table of empty columns with the same types as the columns in input_table
std::unique_ptr<scalar> cudf::get_element(column_view const& input, size_type index, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Get the element at specified index from a column.
cudf::logic_error | if index is not within the range [0, input.size()) |
input | Column view to get the element from |
index | Index into input to get the element at |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned scalar's device memory |
bool cudf::has_nonempty_nulls(column_view const& input, rmm::cuda_stream_view stream = cudf::get_default_stream())
Checks if a column or its descendants have non-empty null rows.
Note: This function is exact. If it returns true, there exist one or more non-empty null elements.
A LIST or STRING column might have non-empty rows that are marked as null. A STRUCT or LIST column might have child columns that have non-empty null rows. Other types of columns are deemed incapable of having non-empty null rows. For example, fixed-width columns have no concept of an "empty" row.
input | The column which is (and whose descendants are) to be checked for non-empty null rows. |
stream | CUDA stream used for device memory operations and kernel launches |
bool cudf::may_have_nonempty_nulls(column_view const& input)
Approximates if a column or its descendants may have non-empty null elements.
true: Non-empty null elements could exist
false: Non-empty null elements definitely do not exist
False positives are possible, but false negatives are not.
Compared to the exact has_nonempty_nulls() function, this function is typically more efficient.
Complexity:
Best case: O(count_descendants(input))
Worst case: O(count_descendants(input)) * m, where m is the number of rows in the largest descendant
input | The column which is (and whose descendants are) to be checked for non-empty null rows |
packed_columns cudf::pack(cudf::table_view const& input, rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Deep-copy a table_view into a serialized contiguous memory format.
The metadata from the table_view is copied into a host vector of bytes and the data from the table_view is copied into a device_buffer. Pass the output of this function into cudf::unpack to deserialize.
input | View of the table to pack |
mr | An optional memory resource to use for all returned device allocations |
std::vector<uint8_t> cudf::pack_metadata(table_view const& table, uint8_t const* contiguous_buffer, size_t buffer_size)
Produce the metadata used for packing a table stored in a contiguous buffer.
The metadata from the table_view is copied into a host vector of bytes which can be used to construct a packed_columns or packed_table structure. The caller is responsible for guaranteeing that all of the columns in the table point into contiguous_buffer.
table | View of the table to pack |
contiguous_buffer | A contiguous buffer of device memory which contains the data referenced by the columns in table |
buffer_size | The size of contiguous_buffer |
Returns a vector of bytes containing the metadata used to unpack a packed_columns struct
std::unique_ptr<column> cudf::purge_nonempty_nulls(column_view const& input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Copy input into output while purging any non-empty null rows in the column or its descendants.
If the input column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as input.
The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.
Examples:
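As one illustration, the idea of a non-empty null row and its purge can be modeled on the host for a LIST column of integers. This is a sketch, not libcudf code: `list_column_model`, `has_nonempty_nulls_model`, and `purge_nonempty_nulls_model` are hypothetical names, and the offsets/child/validity layout is a simplified stand-in for a real lists column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified host model of a LIST<int32> column (illustration only).
struct list_column_model {
  std::vector<std::int32_t> offsets;  // rows + 1 entries; row i spans [offsets[i], offsets[i+1])
  std::vector<std::int32_t> child;    // flattened list elements
  std::vector<bool> valid;            // one validity flag per row
};

// A row is a non-empty null when it is marked null but still spans child elements.
bool has_nonempty_nulls_model(list_column_model const& col)
{
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i + 1] > col.offsets[i]) { return true; }
  }
  return false;
}

// Rebuilds offsets and child so that every null row has zero length.
list_column_model purge_nonempty_nulls_model(list_column_model const& col)
{
  list_column_model out;
  out.valid = col.valid;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (col.valid[i]) {
      // Keep the elements of valid rows; drop the elements of null rows.
      out.child.insert(out.child.end(),
                       col.child.begin() + col.offsets[i],
                       col.child.begin() + col.offsets[i + 1]);
    }
    out.offsets.push_back(static_cast<std::int32_t>(out.child.size()));
  }
  return out;
}
```

In this model, a column with offsets [0, 2, 4, 6], child [0, 1, 2, 3, 4, 5], and row 1 marked null still stores {2, 3} for the null row. After the purge, offsets are [0, 2, 2, 4] and child is [0, 1, 4, 5].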
input | The column whose null rows are to be checked and purged |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
Returns a new column with the same type as input, but with null rows purged
std::unique_ptr<column> cudf::reverse(column_view const& source_column, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reverses the elements of a column.
Creates a new column that is the reverse of source_column. For example, reversing a source_column of [4, 5, 6] returns [6, 5, 4].
source_column | Column that will be reversed |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned column's device memory |
std::unique_ptr<table> cudf::reverse(table_view const& source_table, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reverses the rows within a table.
Creates a new table that is the reverse of source_table. For example, reversing a source_table with columns [[4, 5, 6], [7, 8, 9], [10, 11, 12]] returns [[6, 5, 4], [9, 8, 7], [12, 11, 10]].
source_table | Table that will be reversed |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned table's device memory |
std::unique_ptr<table> cudf::sample(table_view const& input, size_type const n, sample_with_replacement replacement = sample_with_replacement::FALSE, int64_t const seed = 0, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Gather n samples from given input randomly.
cudf::logic_error | if n > input.num_rows() and replacement == FALSE. |
cudf::logic_error | if n < 0. |
input | View of a table to sample |
n | non-negative number of samples expected from input |
replacement | Allow or disallow sampling of the same row more than once |
seed | Seed value to initiate random number generator |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate the returned table's device memory |
Returns a table containing samples from input
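The difference between sampling with and without replacement, and the two documented error conditions, can be sketched on the host by choosing row indices. This is a model for illustration, not the libcudf implementation: `sample_indices_model` is a hypothetical name, `std::mt19937_64` is an assumed generator (the real generator and the sequence it produces for a given seed may differ), and rejecting a non-zero n on an empty input is a choice made by this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Host model: picks n row indices from [0, num_rows).
std::vector<std::size_t> sample_indices_model(std::size_t num_rows,
                                              std::int64_t n,
                                              bool with_replacement,
                                              std::int64_t seed = 0)
{
  if (n < 0) { throw std::logic_error("n must be non-negative"); }
  auto const count = static_cast<std::size_t>(n);
  if (!with_replacement && count > num_rows) {
    throw std::logic_error("n exceeds the number of rows without replacement");
  }
  if (count == 0) { return {}; }
  if (num_rows == 0) { throw std::logic_error("cannot sample from an empty input"); }

  std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
  if (with_replacement) {
    // Each draw is independent, so the same row may appear more than once.
    std::uniform_int_distribution<std::size_t> dist(0, num_rows - 1);
    std::vector<std::size_t> picked;
    picked.reserve(count);
    for (std::size_t i = 0; i < count; ++i) { picked.push_back(dist(rng)); }
    return picked;
  }
  // Without replacement: shuffle all indices and keep the first n.
  std::vector<std::size_t> all(num_rows);
  std::iota(all.begin(), all.end(), std::size_t{0});
  std::shuffle(all.begin(), all.end(), rng);
  all.resize(count);
  return all;
}
```

The returned indices would then drive a gather of the corresponding rows from the input table.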
table_view cudf::unpack(packed_columns const& input)
Deserialize the result of cudf::pack.
Converts the result of a serialized table into a table_view that points to the data stored in the contiguous device buffer contained in input.
It is the caller's responsibility to ensure that the table_view in the output does not outlive the data in the input.
No new device memory is allocated in this function.
input | The packed columns to unpack |
Returns the unpacked table_view
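The ownership relationship between pack and unpack can be illustrated with a toy host-side model. This is a sketch under loud assumptions, not libcudf code: `packed_model`, `column_view_model`, `pack_model`, and `unpack_model` are hypothetical names, the metadata here is just a list of column sizes (the real metadata format is opaque), and a host vector stands in for the contiguous device buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of packed output: host-side metadata plus one contiguous data buffer.
struct packed_model {
  std::vector<std::size_t> metadata;  // assumed format: element count per column
  std::vector<std::int32_t> data;     // stands in for the contiguous device buffer
};

// Non-owning view of one column inside the packed buffer.
struct column_view_model {
  std::int32_t const* data;
  std::size_t size;
};

// Deep-copies every column into a single contiguous buffer.
packed_model pack_model(std::vector<std::vector<std::int32_t>> const& table)
{
  packed_model out;
  for (auto const& col : table) {
    out.metadata.push_back(col.size());
    out.data.insert(out.data.end(), col.begin(), col.end());
  }
  return out;
}

// Builds views that point into input.data. Nothing is copied or allocated for
// the column data, so the views must not outlive input.
std::vector<column_view_model> unpack_model(packed_model const& input)
{
  std::vector<column_view_model> views;
  std::size_t offset = 0;
  for (std::size_t size : input.metadata) {
    views.push_back({input.data.data() + offset, size});
    offset += size;
  }
  return views;
}
```

Because the views alias the packed buffer, destroying or moving the packed object while a view is still in use would leave dangling pointers, which mirrors the lifetime caveat stated above.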
table_view cudf::unpack(uint8_t const* metadata, uint8_t const* gpu_data)
Deserialize the result of cudf::pack.
Converts the result of a serialized table into a table_view that points to the data stored in the contiguous device buffer contained in gpu_data using the metadata contained in the host buffer metadata.
It is the caller's responsibility to ensure that the table_view in the output does not outlive the data in the input.
No new device memory is allocated in this function.
metadata | The host-side metadata buffer resulting from the initial pack() call |
gpu_data | The device-side contiguous buffer storing the data that will be referenced by the resulting table_view |
Returns the unpacked table_view