Mapping and Reduction#

Coalesced Reduction#

#include <raft/linalg/coalesced_reduction.cuh>

namespace raft::linalg

template<typename InValueType, typename LayoutPolicy, typename OutValueType, typename IdxType, typename MainLambda = raft::identity_op, typename ReduceLambda = raft::add_op, typename FinalLambda = raft::identity_op>
void coalesced_reduction(raft::resources const &handle, raft::device_matrix_view<const InValueType, IdxType, LayoutPolicy> data, raft::device_vector_view<OutValueType, IdxType> dots, OutValueType init, bool inplace = false, MainLambda main_op = raft::identity_op(), ReduceLambda reduce_op = raft::add_op(), FinalLambda final_op = raft::identity_op())#

Compute reduction of the input matrix along the leading dimension. This API is to be used when the desired reduction is along the dimension of the memory layout. For example, a row-major matrix will be reduced along the columns, whereas a column-major matrix will be reduced along the rows.

Template Parameters:
  • InValueType – the input data-type of underlying raft::matrix_view

  • LayoutPolicy – The layout of Input/Output (row or col major)

  • OutValueType – the output data-type of underlying raft::matrix_view and reduction

  • IdxType – Integer type used for addressing

  • MainLambda – Unary lambda applied during accumulation (e.g. L1 or L2 norm). It must be a ‘callable’ supporting the following input and output:

  • ReduceLambda – Binary lambda applied for reduction (e.g. addition (+) for L2 norm). It must be a ‘callable’ supporting the following input and output:

  • FinalLambda – The final lambda applied before STG (e.g. sqrt for L2 norm). It must be a ‘callable’ supporting the following input and output:

Parameters:
  • handle – raft::resources

  • data[in] Input of type raft::device_matrix_view

  • dots[out] Output of type raft::device_vector_view

  • init[in] initial value to use for the reduction

  • inplace[in] if true, the reduction result is added to the values already in dots; if false, it overwrites them

  • main_op[in] fused elementwise operation to apply before reduction

  • reduce_op[in] fused binary reduction operation

  • final_op[in] fused elementwise operation to apply before storing results
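The way the three fused operations compose can be illustrated with a host-side reference sketch. This is not the RAFT kernel: it runs on the CPU over a `std::vector`, the name `coalesced_reduction_reference` is hypothetical, it assumes a row-major matrix reduced to one value per row, it takes `main_op` as a unary callable on the element value, and it assumes that with `inplace` the previous output is folded in with `reduce_op` before `final_op` is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// reduce each row of a row-major n_rows x n_cols matrix to one output value.
template <typename In, typename Out, typename MainOp, typename ReduceOp, typename FinalOp>
void coalesced_reduction_reference(const std::vector<In>& data,
                                   std::size_t n_rows,
                                   std::size_t n_cols,
                                   std::vector<Out>& dots,
                                   Out init,
                                   bool inplace,
                                   MainOp main_op,
                                   ReduceOp reduce_op,
                                   FinalOp final_op)
{
  assert(data.size() == n_rows * n_cols);
  assert(dots.size() == n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    Out acc = init;
    // The elements of one row are contiguous in row-major storage.
    for (std::size_t j = 0; j < n_cols; ++j) {
      acc = reduce_op(acc, static_cast<Out>(main_op(data[i * n_cols + j])));
    }
    // Assumption: with inplace, the old output is folded in before final_op.
    if (inplace) { acc = reduce_op(dots[i], acc); }
    dots[i] = final_op(acc);
  }
}
```

Passing a squaring `main_op`, an additive `reduce_op` and a square-root `final_op` would give the per-row L2 norm mentioned in the template parameter descriptions.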

Map#

#include <raft/linalg/map.cuh>

namespace raft::linalg

template<typename OutType, typename Func, typename ...InTypes, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InTypes...>>
void map(const raft::resources &res, OutType out, Func f, InTypes... ins)#

Map a function over zero or more input mdspans of the same size.

The algorithm applied to k inputs can be described by the following pseudo-code:

for (auto i: [0 ... out.size()]) {
  out[i] = f(in_0[i], in_1[i], ..., in_{k-1}[i])
}

Performance note: when possible, this function loads the argument arrays and stores the output array using vectorized CUDA load/store instructions. The size of the vectorization depends on the size of the largest input/output element type and on the alignment of all pointers.

Usage example:

#include <raft/core/device_mdarray.hpp>
#include <raft/core/resources.hpp>
#include <raft/core/operators.hpp>
#include <raft/linalg/map.cuh>

// Assumes `res` is a raft::resources and `n` is the number of elements.
auto input = raft::make_device_vector<int>(res, n);
// ... fill input ...
auto squares = raft::make_device_vector<int>(res, n);
raft::linalg::map(res, squares.view(), raft::sq_op{}, input.view());

Template Parameters:
  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

  • InTypes – data-types of the inputs (device_mdspan)

Parameters:
  • res[in] raft::resources

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (InTypes::value_type xs…) -> OutType::value_type

  • ins[in] the inputs (each of the same size as the output) (device_mdspan)
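The element-wise semantics of the variadic overload can be mimicked on the host with a parameter pack. The sketch below is only an illustration under stated assumptions: `map_reference` and `all_have_size` are hypothetical names, the containers are `std::vector` rather than device mdspans, and no vectorized loads are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base case: no inputs left to check.
inline bool all_have_size(std::size_t) { return true; }

// Recursive case: check the first input, then the rest of the pack.
template <typename First, typename... Rest>
bool all_have_size(std::size_t n, const First& first, const Rest&... rest)
{
  return first.size() == n && all_have_size(n, rest...);
}

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// out[i] = f(in_0[i], in_1[i], ...) for zero or more inputs.
template <typename Out, typename Func, typename... Ins>
void map_reference(std::vector<Out>& out, Func f, const std::vector<Ins>&... ins)
{
  // Every input must have as many elements as the output.
  assert(all_have_size(out.size(), ins...));
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = f(ins[i]...);
  }
}
```

With zero inputs the callable takes no arguments, so the same function can fill the output with a constant.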

template<typename InType1, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1>>
void map(const raft::resources &res, InType1 in1, OutType out, Func f)#

Map a function over one mdspan.

Template Parameters:
  • InType1 – data-type of the input (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (InType1::value_type x) -> OutType::value_type

template<typename InType1, typename InType2, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1, InType2>>
void map(const raft::resources &res, InType1 in1, InType2 in2, OutType out, Func f)#

Map a function over two mdspans.

Template Parameters:
  • InType1 – data-type of the input (device_mdspan)

  • InType2 – data-type of the input (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input (the same size as the output) (device_mdspan)

  • in2[in] the input (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (InType1::value_type x1, InType2::value_type x2) -> OutType::value_type

template<typename InType1, typename InType2, typename InType3, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1, InType2, InType3>>
void map(const raft::resources &res, InType1 in1, InType2 in2, InType3 in3, OutType out, Func f)#

Map a function over three mdspans.

Template Parameters:
  • InType1 – data-type of the input 1 (device_mdspan)

  • InType2 – data-type of the input 2 (device_mdspan)

  • InType3 – data-type of the input 3 (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input 1 (the same size as the output) (device_mdspan)

  • in2[in] the input 2 (the same size as the output) (device_mdspan)

  • in3[in] the input 3 (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (InType1::value_type x1, InType2::value_type x2, InType3::value_type x3) -> OutType::value_type

template<typename OutType, typename Func, typename ...InTypes, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InTypes...>>
void map_offset(const raft::resources &res, OutType out, Func f, InTypes... ins)#

Map a function over zero-based flat index (element offset) and zero or more inputs.

The algorithm applied to k inputs can be described by the following pseudo-code:

for (auto i: [0 ... out.size()]) {
  out[i] = f(i, in_0[i], in_1[i], ..., in_{k-1}[i])
}

Performance note: when possible, this function loads the argument arrays and stores the output array using vectorized CUDA load/store instructions. The size of the vectorization depends on the size of the largest input/output element type and on the alignment of all pointers.

Usage example:

#include <raft/core/device_mdarray.hpp>
#include <raft/core/resources.hpp>
#include <raft/core/operators.hpp>
#include <raft/linalg/map.cuh>

// Assumes `res` is a raft::resources and `n` is the number of elements.
auto squares = raft::make_device_vector<int>(res, n);
raft::linalg::map_offset(res, squares.view(), raft::sq_op{});

Template Parameters:
  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

  • InTypes – data-types of the inputs (device_mdspan)

Parameters:
  • res[in] raft::resources

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (auto offset, InTypes::value_type xs…) -> OutType::value_type

  • ins[in] the inputs (each of the same size as the output) (device_mdspan)
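To make the role of the offset argument concrete, here is a host-side sketch in which the callable receives the flat index first and the input elements after it. The names `map_offset_reference` and `offset_squares` are hypothetical, and `std::vector` stands in for the device mdspans; the second helper reproduces the idea of the usage example above (squaring the offset with no inputs) on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// out[i] = f(i, in_0[i], in_1[i], ...) for zero or more inputs.
template <typename Out, typename Func, typename... Ins>
void map_offset_reference(std::vector<Out>& out, Func f, const std::vector<Ins>&... ins)
{
  // Every input must have as many elements as the output.
  const std::size_t sizes[] = {out.size(), ins.size()...};
  for (std::size_t s : sizes) {
    assert(s == out.size());
    (void)s;  // keeps the loop variable used when asserts are compiled out
  }
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = f(i, ins[i]...);
  }
}

// Zero inputs: fill a vector with the squares of the element offsets.
inline std::vector<int> offset_squares(std::size_t n)
{
  std::vector<int> out(n);
  map_offset_reference(out, [](std::size_t i) { return static_cast<int>(i * i); });
  return out;
}
```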

template<typename InType1, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1>>
void map_offset(const raft::resources &res, InType1 in1, OutType out, Func f)#

Map a function over zero-based flat index (element offset) and one mdspan.

Template Parameters:
  • InType1 – data-type of the input (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (auto offset, InType1::value_type x) -> OutType::value_type

template<typename InType1, typename InType2, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1, InType2>>
void map_offset(const raft::resources &res, InType1 in1, InType2 in2, OutType out, Func f)#

Map a function over zero-based flat index (element offset) and two mdspans.

Template Parameters:
  • InType1 – data-type of the input (device_mdspan)

  • InType2 – data-type of the input (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input (the same size as the output) (device_mdspan)

  • in2[in] the input (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (auto offset, InType1::value_type x1, InType2::value_type x2) -> OutType::value_type

template<typename InType1, typename InType2, typename InType3, typename OutType, typename Func, typename = raft::enable_if_output_device_mdspan<OutType>, typename = raft::enable_if_input_device_mdspan<InType1, InType2, InType3>>
void map_offset(const raft::resources &res, InType1 in1, InType2 in2, InType3 in3, OutType out, Func f)#

Map a function over zero-based flat index (element offset) and three mdspans.

Template Parameters:
  • InType1 – data-type of the input 1 (device_mdspan)

  • InType2 – data-type of the input 2 (device_mdspan)

  • InType3 – data-type of the input 3 (device_mdspan)

  • OutType – data-type of the result (device_mdspan)

  • Func – the device-lambda performing the actual operation

Parameters:
  • res[in] raft::resources

  • in1[in] the input 1 (the same size as the output) (device_mdspan)

  • in2[in] the input 2 (the same size as the output) (device_mdspan)

  • in3[in] the input 3 (the same size as the output) (device_mdspan)

  • out[out] the output of the map operation (device_mdspan)

  • f[in] device lambda (auto offset, InType1::value_type x1, InType2::value_type x2, InType3::value_type x3) -> OutType::value_type

Map Reduce#

#include <raft/linalg/map_reduce.cuh>

namespace raft::linalg

template<typename InValueType, typename MapOp, typename ReduceLambda, typename IndexType, typename OutValueType, typename ScalarIdxType, typename ...Args>
void map_reduce(raft::resources const &handle, raft::device_vector_view<const InValueType, IndexType> in, raft::device_scalar_view<OutValueType, ScalarIdxType> out, OutValueType neutral, MapOp map, ReduceLambda op, Args... args)#

CUDA implementation of a map operation followed by a generic reduction.

Template Parameters:
  • InValueType – the data-type of the input

  • MapOp – the device-lambda performing the actual map operation

  • ReduceLambda – the device-lambda performing the actual reduction

  • IndexType – the index type

  • OutValueType – the data-type of the output

  • ScalarIdxType – index type of scalar

  • Args – additional parameters

Parameters:
  • handle[in] raft::resources

  • in[in] the input of type raft::device_vector_view

  • neutral[in] The neutral element of the reduction operation. For example: 0 for sum, 1 for multiply, +Inf for Min, -Inf for Max

  • out[out] the output reduced value assumed to be a raft::device_scalar_view

  • map[in] the fused device-lambda

  • op[in] the fused reduction device lambda

  • args[in] additional input arrays
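The combination of `neutral`, `map` and `op` can be read as a fold over mapped elements. The following host-side sketch illustrates that reading; `map_reduce_reference` is a hypothetical name, it returns the result instead of writing to a device scalar view, and it assumes the additional arrays in `args` are indexed element-wise alongside `in`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// result = op(... op(op(neutral, map(in[0], ...)), map(in[1], ...)) ...).
template <typename In, typename Out, typename MapOp, typename ReduceOp, typename... Args>
Out map_reduce_reference(const std::vector<In>& in,
                         Out neutral,
                         MapOp map,
                         ReduceOp op,
                         const std::vector<Args>&... args)
{
  // Assumption: additional arrays have the same length as the input.
  const std::size_t sizes[] = {in.size(), args.size()...};
  for (std::size_t s : sizes) {
    assert(s == in.size());
    (void)s;  // keeps the loop variable used when asserts are compiled out
  }
  Out acc = neutral;
  for (std::size_t i = 0; i < in.size(); ++i) {
    acc = op(acc, static_cast<Out>(map(in[i], args[i]...)));
  }
  return acc;
}
```

A dot product, for instance, would use multiplication as `map`, addition as `op`, 0 as `neutral`, and pass the second vector through `args`.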

Mean Squared Error#

#include <raft/linalg/mean_squared_error.cuh>

namespace raft::linalg

template<typename InValueType, typename IndexType, typename OutValueType>
void mean_squared_error(raft::resources const &handle, raft::device_vector_view<const InValueType, IndexType> A, raft::device_vector_view<const InValueType, IndexType> B, raft::device_scalar_view<OutValueType, IndexType> out, OutValueType weight)#

CUDA implementation of the mean squared error function, mean((A-B)**2).

Template Parameters:
  • InValueType – Input data-type

  • IndexType – Input/Output index type

  • OutValueType – Output data-type

Parameters:

Norm#

#include <raft/linalg/norm.cuh>

namespace raft::linalg

template<typename ElementType, typename LayoutPolicy, typename IndexType, typename Lambda = raft::identity_op>
void norm(raft::resources const &handle, raft::device_matrix_view<const ElementType, IndexType, LayoutPolicy> in, raft::device_vector_view<ElementType, IndexType> out, NormType type, Apply apply, Lambda fin_op = raft::identity_op())#

Compute norm of the input matrix and perform fin_op.

Template Parameters:
  • ElementType – Input/Output data type

  • LayoutPolicy – the layout of input (raft::row_major or raft::col_major)

  • IndexType – Integer type used for addressing

  • Lambda – device final lambda

Parameters:
  • handle[in] raft::resources

  • in[in] the input raft::device_matrix_view

  • out[out] the output raft::device_vector_view

  • type[in] the type of norm to be applied

  • apply[in] Whether to apply the norm along rows (raft::linalg::Apply::ALONG_ROWS) or along columns (raft::linalg::Apply::ALONG_COLUMNS)

  • fin_op[in] the final lambda op
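As a rough illustration of how the norm type and `fin_op` interact, the host-side sketch below computes one norm per row of a row-major matrix. Several things here are assumptions rather than documented behaviour: the enum `NormKind` is a hypothetical stand-in for `NormType`, the L2 case is assumed to accumulate squares and leave the square root to `fin_op`, and the `apply` direction is not modelled at all.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the norm type selector.
enum class NormKind { L1, L2 };

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// one norm per row of a row-major n_rows x n_cols matrix, then fin_op.
template <typename T, typename FinOp>
void row_norm_reference(const std::vector<T>& in,
                        std::size_t n_rows,
                        std::size_t n_cols,
                        std::vector<T>& out,
                        NormKind kind,
                        FinOp fin_op)
{
  assert(in.size() == n_rows * n_cols);
  assert(out.size() == n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    T acc = T(0);
    for (std::size_t j = 0; j < n_cols; ++j) {
      const T x = in[i * n_cols + j];
      // Assumption: L1 sums absolute values, L2 sums squares (no sqrt here).
      acc += (kind == NormKind::L1) ? std::abs(x) : x * x;
    }
    out[i] = fin_op(acc);
  }
}
```

Under these assumptions, passing a square-root callable as `fin_op` turns the accumulated sum of squares into the Euclidean norm.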

Normalize#

#include <raft/linalg/normalize.cuh>

namespace raft::linalg

template<typename ElementType, typename IndexType, typename MainLambda, typename ReduceLambda, typename FinalLambda>
void row_normalize(raft::resources const &handle, raft::device_matrix_view<const ElementType, IndexType, row_major> in, raft::device_matrix_view<ElementType, IndexType, row_major> out, ElementType init, MainLambda main_op, ReduceLambda reduce_op, FinalLambda fin_op, ElementType eps = ElementType(1e-8))#

Divide rows by their norm defined by main_op, reduce_op and fin_op.

Template Parameters:
  • ElementType – Input/Output data type

  • IndexType – Integer type used for addressing

  • MainLambda – Type of main_op

  • ReduceLambda – Type of reduce_op

  • FinalLambda – Type of fin_op

Parameters:
  • handle[in] raft::resources

  • in[in] the input raft::device_matrix_view

  • out[out] the output raft::device_matrix_view

  • init[in] Initialization value, i.e. the identity element for the reduction operation

  • main_op[in] Operation to apply to the elements before reducing them (e.g. square for L2)

  • reduce_op[in] Operation to reduce a pair of elements (e.g. sum for L2)

  • fin_op[in] Operation to apply once to the reduction result to finalize the norm computation (e.g. sqrt for L2)

  • eps[in] If the norm is below eps, the row is considered zero and no division is applied
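The parameters above describe a two-pass procedure per row: compute the norm from `main_op`, `reduce_op` and `fin_op`, then divide. A host-side sketch of that procedure follows. `row_normalize_reference` is a hypothetical name, and two details are assumptions: the comparison against `eps` is taken as strictly "less than", and a row treated as zero is copied to the output unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// divide each row of a row-major matrix by its norm.
template <typename T, typename MainOp, typename ReduceOp, typename FinOp>
void row_normalize_reference(const std::vector<T>& in,
                             std::size_t n_rows,
                             std::size_t n_cols,
                             std::vector<T>& out,
                             T init,
                             MainOp main_op,
                             ReduceOp reduce_op,
                             FinOp fin_op,
                             T eps = T(1e-8))
{
  assert(in.size() == n_rows * n_cols);
  assert(out.size() == in.size());
  for (std::size_t i = 0; i < n_rows; ++i) {
    // Pass 1: norm of row i.
    T acc = init;
    for (std::size_t j = 0; j < n_cols; ++j) {
      acc = reduce_op(acc, main_op(in[i * n_cols + j]));
    }
    const T norm = fin_op(acc);
    // Assumption: rows with norm < eps are copied through without division.
    const bool is_zero_row = norm < eps;
    // Pass 2: scale the row.
    for (std::size_t j = 0; j < n_cols; ++j) {
      const std::size_t idx = i * n_cols + j;
      out[idx] = is_zero_row ? in[idx] : in[idx] / norm;
    }
  }
}
```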

template<typename ElementType, typename IndexType>
void row_normalize(raft::resources const &handle, raft::device_matrix_view<const ElementType, IndexType, row_major> in, raft::device_matrix_view<ElementType, IndexType, row_major> out, NormType norm_type, ElementType eps = ElementType(1e-8))#

Divide rows by their norm.

Template Parameters:
  • ElementType – Input/Output data type

  • IndexType – Integer type used for addressing

Parameters:
  • handle[in] raft::resources

  • in[in] the input raft::device_matrix_view

  • out[out] the output raft::device_matrix_view

  • norm_type[in] the type of norm to be applied

  • eps[in] If the norm is below eps, the row is considered zero and no division is applied

Reduction#

#include <raft/linalg/reduce.cuh>

namespace raft::linalg

template<typename InElementType, typename LayoutPolicy, typename OutElementType = InElementType, typename IdxType = std::uint32_t, typename MainLambda = raft::identity_op, typename ReduceLambda = raft::add_op, typename FinalLambda = raft::identity_op>
void reduce(raft::resources const &handle, raft::device_matrix_view<const InElementType, IdxType, LayoutPolicy> data, raft::device_vector_view<OutElementType, IdxType> dots, OutElementType init, Apply apply, bool inplace = false, MainLambda main_op = raft::identity_op(), ReduceLambda reduce_op = raft::add_op(), FinalLambda final_op = raft::identity_op())#

Compute reduction of the input matrix along the requested dimension. This API computes a reduction of a matrix whose underlying storage is either row-major or column-major, while allowing the caller to choose the dimension for reduction. Depending on the dimension chosen for reduction, the memory accesses may be coalesced or strided.

Template Parameters:
  • InElementType – the input data-type of underlying raft::matrix_view

  • LayoutPolicy – The layout of Input/Output (row or col major)

  • OutElementType – the output data-type of underlying raft::matrix_view and reduction

  • IdxType – Integer type used for addressing

  • MainLambda – Unary lambda applied during accumulation (e.g. L1 or L2 norm). It must be a ‘callable’ supporting the following input and output:

  • ReduceLambda – Binary lambda applied for reduction (e.g. addition (+) for L2 norm). It must be a ‘callable’ supporting the following input and output:

  • FinalLambda – The final lambda applied before STG (e.g. sqrt for L2 norm). It must be a ‘callable’ supporting the following input and output:

Parameters:
  • handle[in] raft::resources

  • data[in] Input of type raft::device_matrix_view

  • dots[out] Output of type raft::device_vector_view

  • init[in] initial value to use for the reduction

  • apply[in] whether to reduce along rows or along columns (using raft::linalg::Apply)

  • main_op[in] fused elementwise operation to apply before reduction

  • reduce_op[in] fused binary reduction operation

  • final_op[in] fused elementwise operation to apply before storing results

  • inplace[in] if true, the reduction result is added to the values already in dots; if false, it overwrites them

Reduce Cols By Key#

#include <raft/linalg/reduce_cols_by_key.cuh>

namespace raft::linalg

template<typename ElementType, typename KeyType = ElementType, typename IndexType = std::uint32_t>
void reduce_cols_by_key(raft::resources const &handle, raft::device_matrix_view<const ElementType, IndexType, raft::row_major> data, raft::device_vector_view<const KeyType, IndexType> keys, raft::device_matrix_view<ElementType, IndexType, raft::row_major> out, IndexType nkeys = 0, bool reset_sums = true)#

Computes the sum-reduction of matrix columns for each given key. TODO: Support generic reduction lambdas (rapidsai/raft#860).

Template Parameters:
  • ElementType – the input data type (as well as the output reduced matrix)

  • KeyType – data type of the keys

  • IndexType – indexing arithmetic type

Parameters:
  • handle[in] raft::resources

  • data[in] the input data (dim = nrows x ncols). This is assumed to be in row-major layout of type raft::device_matrix_view

  • keys[in] keys raft::device_vector_view (len = ncols). It is assumed that each key in this array is in the range [0, nkeys). If this is not true, the caller is expected to have called the make_monotonic primitive to prepare such a contiguous and monotonically increasing keys array.

  • out[out] the output reduced raft::device_matrix_view along columns (dim = nrows x nkeys). This will be assumed to be in row-major layout

  • nkeys[in] Number of unique keys in the keys array. By default, inferred from the number of columns of out

  • reset_sums[in] Whether to reset the output sums to zero before reducing
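The shapes in the parameter list (input nrows x ncols, keys of length ncols, output nrows x nkeys) suggest the following host-side reading of the operation. This is a sketch with a hypothetical name, `reduce_cols_by_key_reference`, using `std::vector` in place of device views; it assumes every key is already in [0, n_keys).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// out[r][keys[c]] += data[r][c] for row-major data and out.
template <typename T, typename Key>
void reduce_cols_by_key_reference(const std::vector<T>& data,
                                  std::size_t n_rows,
                                  std::size_t n_cols,
                                  const std::vector<Key>& keys,
                                  std::vector<T>& out,
                                  std::size_t n_keys,
                                  bool reset_sums = true)
{
  assert(data.size() == n_rows * n_cols);
  assert(keys.size() == n_cols);
  assert(out.size() == n_rows * n_keys);
  // Optionally start from zero instead of accumulating into existing sums.
  if (reset_sums) { std::fill(out.begin(), out.end(), T(0)); }
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t c = 0; c < n_cols; ++c) {
      const std::size_t k = static_cast<std::size_t>(keys[c]);
      assert(k < n_keys);
      out[r * n_keys + k] += data[r * n_cols + c];
    }
  }
}
```

Calling it a second time with `reset_sums = false` accumulates on top of the sums from the first call.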

Reduce Rows By Key#

#include <raft/linalg/reduce_rows_by_key.cuh>

namespace raft::linalg

template<typename ElementType, typename KeyType, typename WeightType, typename IndexType>
void reduce_rows_by_key(raft::resources const &handle, raft::device_matrix_view<const ElementType, IndexType, raft::row_major> d_A, raft::device_vector_view<const KeyType, IndexType> d_keys, raft::device_matrix_view<ElementType, IndexType, raft::row_major> d_sums, IndexType n_unique_keys, raft::device_vector_view<char, IndexType> d_keys_char, std::optional<raft::device_vector_view<const WeightType, IndexType>> d_weights = std::nullopt, bool reset_sums = true)#

Computes the weighted sum-reduction of matrix rows for each given key. TODO: Support generic reduction lambdas (rapidsai/raft#860).

Template Parameters:
  • ElementType – data-type of input and output

  • KeyType – data-type of keys

  • WeightType – data-type of weights

  • IndexType – index type

Parameters:

Strided Reduction#

#include <raft/linalg/strided_reduction.cuh>

namespace raft::linalg

template<typename InValueType, typename LayoutPolicy, typename OutValueType, typename IndexType, typename MainLambda = raft::identity_op, typename ReduceLambda = raft::add_op, typename FinalLambda = raft::identity_op>
void strided_reduction(raft::resources const &handle, raft::device_matrix_view<const InValueType, IndexType, LayoutPolicy> data, raft::device_vector_view<OutValueType, IndexType> dots, OutValueType init, bool inplace = false, MainLambda main_op = raft::identity_op(), ReduceLambda reduce_op = raft::add_op(), FinalLambda final_op = raft::identity_op())#

Compute reduction of the input matrix along the strided dimension. This API is to be used when the desired reduction is NOT along the dimension of the memory layout. For example, a row-major matrix will be reduced along the rows, whereas a column-major matrix will be reduced along the columns.

Template Parameters:
  • InValueType – the input data-type of underlying raft::matrix_view

  • LayoutPolicy – The layout of Input/Output (row or col major)

  • OutValueType – the output data-type of underlying raft::matrix_view and reduction

  • IndexType – Integer type used for addressing

  • MainLambda – Unary lambda applied during accumulation (e.g. L1 or L2 norm). It must be a ‘callable’ supporting the following input and output:

  • ReduceLambda – Binary lambda applied for reduction (e.g. addition (+) for L2 norm). It must be a ‘callable’ supporting the following input and output:

  • FinalLambda – The final lambda applied before STG (e.g. sqrt for L2 norm). It must be a ‘callable’ supporting the following input and output:

Parameters:
  • handle[in] raft::resources

  • data[in] Input of type raft::device_matrix_view

  • dots[out] Output of type raft::device_vector_view

  • init[in] initial value to use for the reduction

  • main_op[in] fused elementwise operation to apply before reduction

  • reduce_op[in] fused binary reduction operation

  • final_op[in] fused elementwise operation to apply before storing results

  • inplace[in] if true, the reduction result is added to the values already in dots; if false, it overwrites them
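The access pattern that makes this reduction "strided" can be seen in a host-side sketch. It assumes a row-major matrix reduced to one value per column, so consecutive elements of one reduction are `n_cols` apart in memory. `strided_reduction_reference` is a hypothetical name, `main_op` is taken as a unary callable on the element value, and the handling of `inplace` (folding the old output in with `reduce_op` before `final_op`) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not the RAFT kernel):
// reduce each column of a row-major n_rows x n_cols matrix to one value.
template <typename In, typename Out, typename MainOp, typename ReduceOp, typename FinalOp>
void strided_reduction_reference(const std::vector<In>& data,
                                 std::size_t n_rows,
                                 std::size_t n_cols,
                                 std::vector<Out>& dots,
                                 Out init,
                                 bool inplace,
                                 MainOp main_op,
                                 ReduceOp reduce_op,
                                 FinalOp final_op)
{
  assert(data.size() == n_rows * n_cols);
  assert(dots.size() == n_cols);
  for (std::size_t j = 0; j < n_cols; ++j) {
    Out acc = init;
    // Successive elements of column j are n_cols apart in row-major storage.
    for (std::size_t i = 0; i < n_rows; ++i) {
      acc = reduce_op(acc, static_cast<Out>(main_op(data[i * n_cols + j])));
    }
    // Assumption: with inplace, the old output is folded in before final_op.
    if (inplace) { acc = reduce_op(dots[j], acc); }
    dots[j] = final_op(acc);
  }
}
```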