libcudf  23.12.00
cudf::hashing Namespace Reference

Hash APIs.

Functions

std::unique_ptr< column > murmurhash3_x86_32 (table_view const &input, uint32_t seed=DEFAULT_HASH_SEED, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Computes the MurmurHash3 32-bit hash value of each row in the given table.
 
std::unique_ptr< table > murmurhash3_x64_128 (table_view const &input, uint64_t seed=DEFAULT_HASH_SEED, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Computes the MurmurHash3_x64_128 hash value of each row in the given table, returned as two 64-bit values per row.
 
std::unique_ptr< column > spark_murmurhash3_x86_32 (table_view const &input, uint32_t seed=DEFAULT_HASH_SEED, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Computes the Spark-compatible MurmurHash3 32-bit hash value of each row in the given table.
 
std::unique_ptr< column > md5 (table_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Computes the MD5 hash value of each row in the given table.
 
std::unique_ptr< column > xxhash_64 (table_view const &input, uint64_t seed=DEFAULT_HASH_SEED, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Computes the XXHash_64 hash value of each row in the given table.
 

Detailed Description

Hash APIs.

Function Documentation

◆ md5()

std::unique_ptr<column> cudf::hashing::md5 ( table_view const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Computes the MD5 hash value of each row in the given table.

Parameters
input   The table of columns to hash
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned column's device memory
Returns
A column where each row is the hash of a row from the input

◆ murmurhash3_x64_128()

std::unique_ptr<table> cudf::hashing::murmurhash3_x64_128 ( table_view const &  input,
uint64_t  seed = DEFAULT_HASH_SEED,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Computes the MurmurHash3_x64_128 hash value of each row in the given table, returned as two 64-bit values per row.

This function takes a 64-bit seed value and returns hash values using the MurmurHash3_x64_128 algorithm. The hash produces two uint64 values per row.

Parameters
input   The table of columns to hash
seed    Optional seed value to use for the hash function
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned table's device memory
Returns
A table of two UINT64 columns

◆ murmurhash3_x86_32()

std::unique_ptr<column> cudf::hashing::murmurhash3_x86_32 ( table_view const &  input,
uint32_t  seed = DEFAULT_HASH_SEED,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Computes the MurmurHash3 32-bit hash value of each row in the given table.

This function hashes each row column by column: the first column is hashed with the given seed, and each resulting hash is then used as the seed for the next column. The result is one uint32 value per row.

Parameters
input   The table of columns to hash
seed    Optional seed value to use for the hash function
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned column's device memory
Returns
A column where each row is the hash of a row from the input
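
The seed chaining described for this function can be modelled on the host. The following is a minimal sketch, not libcudf's device implementation: it uses the public reference MurmurHash3_x86_32 algorithm, and it assumes that every column holds non-null 32-bit integers hashed as their 4 raw bytes on a little-endian host. The names murmur3_x86_32 and hash_row are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rotate a 32-bit value left by r bits (0 < r < 32).
inline std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// Reference MurmurHash3_x86_32 over a byte buffer (host-side model, not libcudf code).
inline std::uint32_t murmur3_x86_32(const void* key, std::size_t len, std::uint32_t seed)
{
  const auto* data = static_cast<const std::uint8_t*>(key);
  constexpr std::uint32_t c1 = 0xcc9e2d51u;
  constexpr std::uint32_t c2 = 0x1b873593u;
  std::uint32_t h = seed;

  // Body: mix each complete 4-byte block into the state.
  const std::size_t nblocks = len / 4;
  for (std::size_t i = 0; i < nblocks; ++i) {
    std::uint32_t k;
    std::memcpy(&k, data + 4 * i, sizeof(k));  // assumption: little-endian host
    k *= c1;
    k = rotl32(k, 15);
    k *= c2;
    h ^= k;
    h = rotl32(h, 13);
    h = h * 5u + 0xe6546b64u;
  }

  // Tail: mix the remaining 0 to 3 bytes.
  const std::uint8_t* tail = data + 4 * nblocks;
  const std::size_t rem = len & 3u;
  std::uint32_t k1 = 0;
  if (rem >= 3) { k1 ^= static_cast<std::uint32_t>(tail[2]) << 16; }
  if (rem >= 2) { k1 ^= static_cast<std::uint32_t>(tail[1]) << 8; }
  if (rem >= 1) {
    k1 ^= tail[0];
    k1 *= c1;
    k1 = rotl32(k1, 15);
    k1 *= c2;
    h ^= k1;
  }

  // Finalization: fold in the length, then avalanche the bits.
  h ^= static_cast<std::uint32_t>(len);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Hypothetical helper: hash one row by chaining the seed across its columns.
// The first value is hashed with `seed`; each result seeds the next value.
inline std::uint32_t hash_row(const std::vector<std::int32_t>& row, std::uint32_t seed)
{
  std::uint32_t h = seed;
  for (std::int32_t value : row) { h = murmur3_x86_32(&value, sizeof(value), h); }
  return h;
}
```

In this model a row with no columns simply yields the seed, and a two-column row is equivalent to hashing the second value with the hash of the first as its seed. How libcudf represents other element types and null values is not covered by this sketch.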

◆ spark_murmurhash3_x86_32()

std::unique_ptr<column> cudf::hashing::spark_murmurhash3_x86_32 ( table_view const &  input,
uint32_t  seed = DEFAULT_HASH_SEED,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Computes the Spark-compatible MurmurHash3 32-bit hash value of each row in the given table.

This function computes a hash similar to MurmurHash3_x86_32, with special processing so that the results match those of Spark's implementation.

Parameters
input   The table of columns to hash
seed    Optional seed value to use for the hash function
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned column's device memory
Returns
A column where each row is the hash of a row from the input
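
The text above does not spell out what the special processing is. One well-known difference in Apache Spark's own Murmur3_x86_32.hashUnsafeBytes is its tail handling: bytes left over after the last full 4-byte block are mixed one at a time as sign-extended 32-bit values, rather than being packed into a single word. The host-side sketch below contrasts the two tail strategies. It is an illustration under that assumption, not libcudf code, and it may not cover everything libcudf does for Spark compatibility; all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

inline std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// Scramble one 32-bit input word before it is mixed into the state.
inline std::uint32_t mix_k1(std::uint32_t k)
{
  k *= 0xcc9e2d51u;
  k = rotl32(k, 15);
  k *= 0x1b873593u;
  return k;
}

// Fold a scrambled word into the running hash state.
inline std::uint32_t mix_h1(std::uint32_t h, std::uint32_t k)
{
  h ^= k;
  h = rotl32(h, 13);
  return h * 5u + 0xe6546b64u;
}

// Finalization: fold in the byte length, then avalanche the bits.
inline std::uint32_t fmix32(std::uint32_t h, std::size_t len)
{
  h ^= static_cast<std::uint32_t>(len);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Mix all complete 4-byte blocks (assumption: little-endian host).
inline std::uint32_t mix_blocks(const std::uint8_t* data, std::size_t aligned_len, std::uint32_t h)
{
  for (std::size_t i = 0; i < aligned_len; i += 4) {
    std::uint32_t k;
    std::memcpy(&k, data + i, sizeof(k));
    h = mix_h1(h, mix_k1(k));
  }
  return h;
}

// Reference tail: pack the 1 to 3 leftover bytes into one word and XOR it in.
inline std::uint32_t reference_murmur3_32(const void* key, std::size_t len, std::uint32_t seed)
{
  const auto* data = static_cast<const std::uint8_t*>(key);
  const std::size_t aligned = len - len % 4;
  std::uint32_t h = mix_blocks(data, aligned, seed);
  std::uint32_t k = 0;
  for (std::size_t i = len; i > aligned; --i) { k = (k << 8) | data[i - 1]; }
  if (len != aligned) { h ^= mix_k1(k); }
  return fmix32(h, len);
}

// Spark-style tail: each leftover byte is sign-extended and mixed like a full block.
inline std::uint32_t spark_style_murmur3_32(const void* key, std::size_t len, std::uint32_t seed)
{
  const auto* data = static_cast<const std::uint8_t*>(key);
  const std::size_t aligned = len - len % 4;
  std::uint32_t h = mix_blocks(data, aligned, seed);
  for (std::size_t i = aligned; i < len; ++i) {
    const auto signed_byte = static_cast<std::int8_t>(data[i]);
    const auto k = static_cast<std::uint32_t>(static_cast<std::int32_t>(signed_byte));
    h = mix_h1(h, mix_k1(k));
  }
  return fmix32(h, len);
}
```

The two functions agree whenever the input length is a multiple of 4 bytes, because only the tail handling differs. For other lengths they will generally produce different values, which is why a Spark-specific variant is needed when hashes must match Spark's output.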

◆ xxhash_64()

std::unique_ptr<column> cudf::hashing::xxhash_64 ( table_view const &  input,
uint64_t  seed = DEFAULT_HASH_SEED,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Computes the XXHash_64 hash value of each row in the given table.

This function takes a 64-bit seed value and returns a column of type UINT64.

Parameters
input   The table of columns to hash
seed    Optional seed value to use for the hash function
stream  CUDA stream used for device memory operations and kernel launches
mr      Device memory resource used to allocate the returned column's device memory
Returns
A column where each row is the hash of a row from the input