libcudf  23.12.00
Searching

Files

file  strings/contains.hpp
 Strings APIs for regex contains, count, matches, like.
 
file  findall.hpp
 

Functions

std::unique_ptr< column > cudf::strings::contains_re (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the given regex_program object.
 
std::unique_ptr< column > cudf::strings::matches_re (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the given regex_program object, but only at the beginning of the string.
 
std::unique_ptr< column > cudf::strings::count_re (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns the number of times the given regex_program's pattern matches in each string.
 
std::unique_ptr< column > cudf::strings::like (strings_column_view const &input, string_scalar const &pattern, string_scalar const &escape_character=string_scalar(""), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the given like pattern.
 
std::unique_ptr< column > cudf::strings::like (strings_column_view const &input, strings_column_view const &patterns, string_scalar const &escape_character=string_scalar(""), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying rows which match the corresponding like pattern in the given patterns.
 
std::unique_ptr< column > cudf::strings::findall (strings_column_view const &input, regex_program const &prog, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a lists column of strings for each matching occurrence using the regex_program pattern within each string.
 

Detailed Description

Function Documentation

◆ contains_re()

std::unique_ptr<column> cudf::strings::contains_re ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns a boolean column identifying rows which match the given regex_program object.

Example:
s = ["abc", "123", "def456"]
p = regex_program::create("\\d+")
r = contains_re(s, p)
r is now [false, true, true]

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
prog  Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column of boolean results for each string
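
The semantics above can be checked on the host with a minimal sketch built on std::regex. This is not the libcudf implementation: host_strings and contains_re_host are hypothetical names, std::nullopt stands in for a null row, and the ECMAScript grammar used by std::regex may differ from the patterns libcudf supports.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;

// Reference semantics only: true if the pattern matches anywhere in the row.
// A null input row produces a null output row.
std::vector<std::optional<bool>> contains_re_host(host_strings const& input,
                                                  std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(std::regex_search(*row, re));
    }
  }
  return out;
}
```

Called with the rows from the example plus a trailing null and the pattern "\\d+", the sketch yields false, true, true and a null entry.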

◆ count_re()

std::unique_ptr<column> cudf::strings::count_re ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns the number of times the given regex_program's pattern matches in each string.

Example:
s = ["abc", "123", "def45"]
p = regex_program::create("\\d")
r = count_re(s, p)
r is now [0, 3, 2]

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
prog  Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column of match counts for each string
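
As a rough host-side illustration of the counting rule, the sketch below walks successive non-overlapping matches with std::sregex_iterator. The names host_strings and count_re_host are hypothetical, the std::int32_t element type is an assumption, and std::regex is only a stand-in for the libcudf regex engine.

```cpp
#include <cstdint>
#include <iterator>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;

// Reference semantics only: number of non-overlapping matches per row,
// with null rows passed through as null counts.
std::vector<std::optional<std::int32_t>> count_re_host(host_strings const& input,
                                                       std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<std::int32_t>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);
      continue;
    }
    // Each iterator step advances to the next match; the distance is the count.
    auto const first = std::sregex_iterator(row->begin(), row->end(), re);
    auto const last  = std::sregex_iterator();
    out.push_back(static_cast<std::int32_t>(std::distance(first, last)));
  }
  return out;
}
```

For the rows in the example and the pattern "\\d", this produces the counts 0, 3 and 2.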

◆ findall()

std::unique_ptr<column> cudf::strings::findall ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns a lists column of strings for each matching occurrence using the regex_program pattern within each string.

Each output row includes all the substrings within the corresponding input row that match the given pattern. If no matches are found, the output row is empty.

Example:
s = ["bunny", "rabbit", "hare", "dog"]
p = regex_program::create("[ab]")
r = findall(s, p)
r is now a lists column like:
[ ["b"]
["a","b","b"]
["a"]
[] ]

A null output row occurs if the corresponding input row is null.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
prog  Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New lists column of strings
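
The shape of the result (one list per row, empty when nothing matches, null when the input row is null) can be sketched on the host as follows. The names host_strings, host_lists and findall_host are hypothetical, and std::regex is used purely as a stand-in; this is not how libcudf builds the lists column.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;
using host_lists   = std::vector<std::optional<std::vector<std::string>>>;

// Reference semantics only: collect every matching substring of each row.
host_lists findall_host(host_strings const& input, std::string const& pattern)
{
  std::regex const re(pattern);
  host_lists out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);  // null in, null out
      continue;
    }
    std::vector<std::string> matches;  // stays empty when nothing matches
    for (auto it = std::sregex_iterator(row->begin(), row->end(), re);
         it != std::sregex_iterator();
         ++it) {
      matches.push_back(it->str());
    }
    out.push_back(std::move(matches));
  }
  return out;
}
```

With the rows from the example and the pattern "[ab]", the second row becomes ["a","b","b"] and the last row is an empty list.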

◆ like() [1/2]

std::unique_ptr<column> cudf::strings::like ( strings_column_view const &  input,
string_scalar const &  pattern,
string_scalar const &  escape_character = string_scalar(""),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns a boolean column identifying rows which match the given like pattern.

The like pattern supports only two wildcard characters:

  • % matches zero or more of any character
  • _ matches any single character
Example:
s = ["azaa", "ababaabba", "aaxa"]
r = like(s, "%a_aa%")
r is now [1, 1, 0]
r = like(s, "a__a")
r is now [1, 0, 1]

Specify an escape character to include a literal % or _ in the search. The escape_character is expected to contain zero or one character. If more than one character is specified, only the first character is used.

Example:
s = ["abc_def", "abc1def", "abc_"]
r = like(s, "abc/_d%", "/")
r is now [1, 0, 0]

Any null string entries return corresponding null output column entries.

Exceptions
cudf::logic_error  if pattern or escape_character is invalid
Parameters
input  Strings instance for this operation
pattern  Like pattern to match within each string
escape_character  Optional character that specifies the escape prefix. Default is no escape character.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New boolean column
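
The wildcard and escape rules described above can be sketched as a small host-side matcher. This is an illustration under stated assumptions, not the libcudf kernel: like_match, like_host and host_strings are hypothetical names, the matcher works on single bytes (so _ only behaves as "one character" for ASCII input), and a dangling escape at the end of the pattern is treated as a literal.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;

// Byte-wise LIKE match with backtracking to the most recent unescaped %.
bool like_match(std::string_view text, std::string_view pattern, char escape)
{
  constexpr auto npos = std::string_view::npos;
  std::size_t t = 0, p = 0;
  std::size_t last_percent = npos;  // pattern index of the most recent unescaped %
  std::size_t retry_from   = 0;     // text index to resume from when backtracking
  while (t < text.size()) {
    if (p < pattern.size()) {
      bool const escaped = escape != '\0' && pattern[p] == escape && p + 1 < pattern.size();
      char const literal = escaped ? pattern[p + 1] : pattern[p];
      if (!escaped && literal == '%') {
        last_percent = p++;
        retry_from   = t;
        continue;
      }
      if ((!escaped && literal == '_') || literal == text[t]) {
        p += escaped ? 2 : 1;
        ++t;
        continue;
      }
    }
    if (last_percent == npos) { return false; }
    // Let the last % absorb one more character and retry from there.
    p = last_percent + 1;
    t = ++retry_from;
  }
  // Only trailing % wildcards may remain once the text is consumed.
  while (p < pattern.size() && pattern[p] == '%') { ++p; }
  return p == pattern.size();
}

// Applies one pattern to every row; only the first escape character is used.
std::vector<std::optional<bool>> like_host(host_strings const& input,
                                           std::string const& pattern,
                                           std::string const& escape_character = "")
{
  char const escape = escape_character.empty() ? '\0' : escape_character.front();
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(like_match(*row, pattern, escape));
    }
  }
  return out;
}
```

Running like_host on the rows of the examples above reproduces the documented results for "%a_aa%", "a__a" and the escaped pattern "abc/_d%" with "/" as the escape character.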

◆ like() [2/2]

std::unique_ptr<column> cudf::strings::like ( strings_column_view const &  input,
strings_column_view const &  patterns,
string_scalar const &  escape_character = string_scalar(""),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns a boolean column identifying rows which match the corresponding like pattern in the given patterns.

The like pattern supports only two wildcard characters:

  • % matches zero or more of any character
  • _ matches any single character
Example:
s = ["azaa", "ababaabba", "aaxa"]
p = ["%a", "b%", "__x_"]
r = like(s, p)
r is now [1, 0, 1]

Specify an escape character to include a literal % or _ in the search. The escape_character is expected to contain zero or one character. If more than one character is specified, only the first character is used. The escape character is applied to all patterns.

Any null string entries return corresponding null output column entries.

Exceptions
cudf::logic_error  if patterns contains nulls or escape_character is invalid
cudf::logic_error  if patterns.size() != input.size()
Parameters
input  Strings instance for this operation
patterns  Like patterns to match within each corresponding string
escape_character  Optional character that specifies the escape prefix. Default is no escape character.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New boolean column
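
The row-wise pairing of strings and patterns, together with the documented error conditions, can be sketched on the host as below. The names like_rec, like_rows_host and host_strings are hypothetical, std::logic_error is used as a stand-in for cudf::logic_error, and the compact recursive matcher is byte-wise and chosen for brevity rather than speed; none of this reflects the libcudf implementation.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;

// Compact recursive LIKE matcher (byte-wise, exponential in the worst case).
bool like_rec(std::string_view s, std::string_view p, char esc)
{
  if (p.empty()) { return s.empty(); }
  if (esc != '\0' && p[0] == esc && p.size() > 1) {
    // Escaped character: must match literally.
    return !s.empty() && s[0] == p[1] && like_rec(s.substr(1), p.substr(2), esc);
  }
  if (p[0] == '%') {
    // % matches nothing, or absorbs one character and tries again.
    return like_rec(s, p.substr(1), esc) || (!s.empty() && like_rec(s.substr(1), p, esc));
  }
  return !s.empty() && (p[0] == '_' || p[0] == s[0]) && like_rec(s.substr(1), p.substr(1), esc);
}

// Matches input[i] against patterns[i]; one escape character applies to all patterns.
std::vector<std::optional<bool>> like_rows_host(host_strings const& input,
                                                host_strings const& patterns,
                                                std::string const& escape_character = "")
{
  if (patterns.size() != input.size()) {
    throw std::logic_error("patterns.size() != input.size()");
  }
  char const esc = escape_character.empty() ? '\0' : escape_character.front();
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (!patterns[i]) { throw std::logic_error("patterns contains nulls"); }
    if (!input[i]) {
      out.push_back(std::nullopt);  // null string row gives a null result
      continue;
    }
    out.push_back(like_rec(*input[i], *patterns[i], esc));
  }
  return out;
}
```

For the strings and patterns in the example, the sketch returns true, false, true; passing a patterns vector of a different length throws.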

◆ matches_re()

std::unique_ptr<column> cudf::strings::matches_re ( strings_column_view const &  input,
regex_program const &  prog,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Returns a boolean column identifying rows which match the given regex_program object, but only at the beginning of the string.

Example:
s = ["abc", "123", "def456"]
p = regex_program::create("\\d+")
r = matches_re(s, p)
r is now [false, true, false]

Any null string entries return corresponding null output column entries.

See the Regex Features page for details on patterns supported by this API.

Parameters
input  Strings instance for this operation
prog  Regex program instance
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column of boolean results for each string
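
The "only at the beginning" rule can be illustrated on the host with std::regex_search and the match_continuous flag, which requires the match to start at the first character but does not require it to cover the whole string. The names host_strings and matches_re_host are hypothetical, and std::regex is only a stand-in for the libcudf regex engine.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a strings column; std::nullopt models a null row.
using host_strings = std::vector<std::optional<std::string>>;

// Reference semantics only: true if the pattern matches starting at position 0.
std::vector<std::optional<bool>> matches_re_host(host_strings const& input,
                                                 std::string const& pattern)
{
  std::regex const re(pattern);
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);
    } else {
      // match_continuous anchors the search at the start of the row.
      out.push_back(std::regex_search(*row, re, std::regex_constants::match_continuous));
    }
  }
  return out;
}
```

With the rows from the example and the pattern "\\d+", "def456" yields false here even though it contains digits, because the digits do not start the string.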