Distance
This page provides pylibraft API references for the publicly exposed elements of the pylibraft.distance package. RAFT's distance computations are highly optimized and support a wide assortment of distance measures.
- pylibraft.distance.pairwise_distance(X, Y, out=None, metric='euclidean', p=2.0, handle=None)
Compute pairwise distances between X and Y.
- Valid values for metric:
- ["euclidean", "l2", "l1", "cityblock", "inner_product", "chebyshev", "canberra", "lp", "hellinger", "jensenshannon", "kl_divergence", "russellrao", "minkowski", "correlation", "cosine"]
- Parameters:
- X: CUDA array interface compliant matrix of shape (m, k)
- Y: CUDA array interface compliant matrix of shape (n, k)
- out: Optional writable CUDA array interface matrix of shape (m, n)
- metric: string denoting the metric type (default="euclidean")
- p: metric parameter (currently used only for "minkowski")
- handle: Optional RAFT resource handle for reusing CUDA resources. If a handle isn't supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.
- Returns:
- raft.device_ndarray containing pairwise distances
Examples
To compute pairwise distances on cupy arrays:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
A single RAFT handle can optionally be reused across pylibraft functions.
>>> handle = Handle()
>>> output = pairwise_distance(in1, in2, metric="euclidean", handle=handle)
pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized before the output is accessed:
>>> handle.sync()
It’s also possible to write to a pre-allocated output array:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import pairwise_distance
>>> n_samples = 5000
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, n_samples), dtype=cp.float32)
A single RAFT handle can optionally be reused across pylibraft functions.
>>> handle = Handle()
>>> pairwise_distance(in1, in2, out=output,
...                   metric="euclidean", handle=handle)
array(...)
pylibraft functions are often asynchronous, so the handle needs to be explicitly synchronized before the output is accessed:
>>> handle.sync()
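To make the (m, n) output shape and the role of p concrete, here is a small CPU reference in NumPy. It is only a sketch for checking results on tiny inputs, not how pylibraft computes distances. The function name pairwise_distance_reference is hypothetical, only four of the listed metrics are covered, and it assumes "euclidean" means the usual square-rooted L2 distance.

```python
import numpy as np


def pairwise_distance_reference(X, Y, metric="euclidean", p=2.0):
    """Hypothetical CPU reference for a few metrics; returns an (m, n) array."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must have the same number of columns (k)")
    # Broadcast to shape (m, n, k): diff[i, j] = |X[i] - Y[j]|
    diff = np.abs(X[:, None, :] - Y[None, :, :])
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "l1":
        return diff.sum(axis=-1)
    if metric == "chebyshev":
        return diff.max(axis=-1)
    if metric == "minkowski":
        # p is only used here, matching the parameter description above
        return (diff ** p).sum(axis=-1) ** (1.0 / p)
    raise ValueError(f"metric {metric!r} is not covered by this sketch")


X = np.array([[0.0, 0.0], [3.0, 4.0]])              # m = 2, k = 2
Y = np.array([[0.0, 0.0], [0.0, 4.0], [3.0, 0.0]])  # n = 3, k = 2
D = pairwise_distance_reference(X, Y)
print(D.shape)  # (2, 3)
```

With p=2.0 the "minkowski" branch reproduces the "euclidean" result, which is a convenient sanity check when comparing against GPU output copied back to the host.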
- pylibraft.distance.fused_l2_nn_argmin(X, Y, out=None, sqrt=True, handle=None)
Compute, for each row of X, the index of its nearest neighbor (1-nearest neighbor) among the rows of Y using the L2 distance.
- Parameters:
- X: CUDA array interface compliant matrix of shape (m, k)
- Y: CUDA array interface compliant matrix of shape (n, k)
- out: Optional writable CUDA array interface matrix of shape (m, 1)
- handle: Optional RAFT resource handle for reusing CUDA resources. If a handle isn't supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If a handle is supplied, you will need to explicitly synchronize yourself by calling handle.sync() before accessing the output.
Examples
To compute the 1-nearest neighbors argmin:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> output = fused_l2_nn_argmin(in1, in2, handle=handle)
>>> # pylibraft functions are often asynchronous so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()
The output can also be computed in-place on a preallocated array:
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.distance import fused_l2_nn_argmin
>>> n_samples = 5000
>>> n_clusters = 5
>>> n_features = 50
>>> in1 = cp.random.random_sample((n_samples, n_features),
...                               dtype=cp.float32)
>>> in2 = cp.random.random_sample((n_clusters, n_features),
...                               dtype=cp.float32)
>>> output = cp.empty((n_samples, 1), dtype=cp.int32)
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> fused_l2_nn_argmin(in1, in2, out=output, handle=handle)
array(...)
>>> # pylibraft functions are often asynchronous so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()
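As a rough illustration of what the argmin output represents, the sketch below computes the same kind of result on the CPU with NumPy. It is a reference for small inputs only and does not reflect the fused GPU kernel. The function name fused_l2_nn_argmin_reference is hypothetical, and the (m, 1) int32 result layout is an assumption taken from the preallocated array in the example above.

```python
import numpy as np


def fused_l2_nn_argmin_reference(X, Y):
    """Hypothetical CPU reference: nearest row of Y for each row of X, shape (m, 1)."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    # Squared L2 distances, shape (m, n). The square root is skipped because
    # it is monotonic and therefore does not change which index is smallest.
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1).astype(np.int32).reshape(-1, 1)


points = np.array([[0.1, 0.0], [0.9, 1.1], [5.0, 5.2]])   # m = 3 samples
centers = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # n = 3 clusters
labels = fused_l2_nn_argmin_reference(points, centers)
print(labels.ravel())  # [0 1 2]
```

This materializes the full (m, n) distance matrix, which is fine for a check on small data but is exactly the intermediate that a fused nearest-neighbor computation is meant to avoid.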