cudf.core.column.string.StringMethods.contains

StringMethods.contains(pat: Union[str, Sequence], case: bool = True, flags: int = 0, na=nan, regex: bool = True) -> SeriesOrIndex

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

Parameters
pat : str or list-like

Character sequence or regular expression. If pat is list-like then regular expressions are not accepted.

flags : int, default 0 (no flags)

Flags to pass through to the regex engine (e.g. re.MULTILINE). Only used when regex is True.

regex : bool, default True

If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.

Returns
Series/Index of bool dtype

A Series/Index of boolean dtype indicating whether the given pattern is contained within the string of each element of the Series/Index.

Notes

The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
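As a rough illustration of what the two supported flags change, the sketch below uses Python's standard `re` module as a stand-in. It assumes the regex engine behind `contains` follows the same semantics for `re.MULTILINE` and `re.DOTALL`; the sample text and variable names are made up for the example.

```python
import re

text = "first line\nsecond line"

# Without flags, '^' anchors only at the start of the whole string.
no_flags = re.search(r'^second', text) is not None

# re.MULTILINE lets '^' also match right after each newline.
with_multiline = re.search(r'^second', text, flags=re.MULTILINE) is not None

# Without flags, '.' does not match a newline character.
dot_plain = re.search(r'line.second', text) is not None

# re.DOTALL lets '.' match the newline between the two lines.
with_dotall = re.search(r'line.second', text, flags=re.DOTALL) is not None

print(no_flags, with_multiline, dot_plain, with_dotall)
```

Under that assumption, the same flags would be passed as `s.str.contains(pat, flags=re.MULTILINE)`.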

Examples

>>> import cudf
>>> s1 = cudf.Series(['Mouse', 'dog', 'house and parrot', '23', None])
>>> s1
0               Mouse
1                 dog
2    house and parrot
3                  23
4                <NA>
dtype: object
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     <NA>
dtype: bool

Returning an Index of booleans using only a literal pattern.

>>> import numpy as np
>>> data = ['Mouse', 'dog', 'house and parrot', '23.0', np.nan]
>>> idx = cudf.Index(data)
>>> idx
StringIndex(['Mouse' 'dog' 'house and parrot' '23.0' None], dtype='object')
>>> idx.str.contains('23', regex=False)
GenericIndex([False, False, False, True, <NA>], dtype='bool')

Matching ‘house’ or ‘dog’: the result is True when either expression occurs in a string.

>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     <NA>
dtype: bool

Matching any digit using a regular expression.

>>> s1.str.contains(r'\d', regex=True)
0    False
1    False
2    False
3     True
4     <NA>
dtype: bool

Ensure pat is not meant as a literal pattern when regex is set to True. In the following example, one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0, so s2[0] (‘40’) matches as well.

>>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
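The difference between the two readings of ‘.0’ can be reproduced on the CPU with Python's standard `re` module, used here only as a stand-in for the regex engine behind `contains`. The sketch assumes that an escaped pattern behaves the same way in both engines; the list `values` mirrors the contents of s2.

```python
import re

values = ['40', '40.0', '41', '41.0', '35']

# As a regex, '.0' means "any single character followed by 0".
as_regex = [re.search('.0', v) is not None for v in values]

# re.escape produces a pattern that matches the literal text '.0'.
escaped = re.escape('.0')
as_literal = [re.search(escaped, v) is not None for v in values]

print(as_regex)
print(as_literal)
```

With `contains` itself, the simpler route to a literal match is `s2.str.contains('.0', regex=False)`.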

pat may also be list-like (for example, a Series of strings), in which case each string is searched for in the corresponding row.

>>> s2 = cudf.Series(['house', 'dog', 'and', '', ''])
>>> s1.str.contains(s2)
0    False
1     True
2     True
3     True
4     <NA>
dtype: bool
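The row-wise behaviour above can be modelled in plain Python. This is only a sketch of the semantics, not how cudf implements it: each pattern is treated as a literal substring of the string in the same position, and a missing string yields a missing result (`None` stands in for `<NA>`).

```python
strings = ['Mouse', 'dog', 'house and parrot', '23', None]
patterns = ['house', 'dog', 'and', '', '']

# Pair each string with the pattern in the same row; nulls propagate.
row_wise = [
    None if s is None else (p in s)
    for s, p in zip(strings, patterns)
]

print(row_wise)
```

The empty pattern in row 3 is a substring of every string, which is why that row is True.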