databricks.koalas.Series.str.contains

str.contains(pat, case=True, flags=0, na=None, regex=True) → ks.Series

Test if pattern or regex is contained within a string of a Series.

Return a boolean Series based on whether a given pattern or regex is contained within a string of a Series.

Analogous to match(), but less strict, relying on re.search() instead of re.match().
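The difference between the two re functions can be seen in plain Python. The snippet below is a minimal sketch using only the standard library; it illustrates the matching semantics and is not the Koalas implementation.

```python
import re

text = "house and parrot"

# re.match only succeeds if the pattern matches at the start of the string.
anchored = re.match("parrot", text) is not None

# re.search scans the whole string, which is the behaviour contains() relies on.
anywhere = re.search("parrot", text) is not None

print(anchored, anywhere)  # False True
```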

Parameters
pat : str

Character sequence or regular expression.

case : bool, default True

If True, the match is case sensitive.

flags : int, default 0 (no flags)

Flags to pass through to the re module, e.g. re.IGNORECASE.

na : default None

Fill value for missing values. NaN is converted to None.

regex : bool, default True

If True, assumes pat is a regular expression. If False, treats pat as a literal string.

Returns
Series of boolean values or object

A Series of boolean values indicating whether the given pattern is contained within the string of each element of the Series.

Examples

Returning a Series of booleans using only a literal pattern.

>>> s1 = ks.Series(['Mouse', 'dog', 'house and parrot', '23', np.NaN])
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     None
Name: 0, dtype: object

Specifying case sensitivity using case.

>>> s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4     None
Name: 0, dtype: object

Specifying na to be False replaces missing values with False. If the result contains no missing values, the resulting dtype is bool; otherwise it is object.

>>> s1.str.contains('og', na=False, regex=True)
0    False
1     True
2    False
3    False
4    False
Name: 0, dtype: bool
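The fill behaviour can be modelled per element in plain Python. This is only a sketch: contains_one is a hypothetical helper written for illustration, not a Koalas function, and it assumes missing values are represented as None.

```python
import re

def contains_one(value, pat, na=None):
    """Hypothetical per-element model: a missing value yields `na`."""
    if value is None:
        return na
    return re.search(pat, value) is not None

values = ["Mouse", "dog", "house and parrot", "23", None]

# Default na=None keeps the missing entry as None.
default_fill = [contains_one(v, "og") for v in values]

# na=False turns the missing entry into False, so every entry is a bool.
false_fill = [contains_one(v, "og", na=False) for v in values]

print(default_fill)  # [False, True, False, False, None]
print(false_fill)    # [False, True, False, False, False]
```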

Returning True when either ‘house’ or ‘dog’ occurs in a string.

>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     None
Name: 0, dtype: object

Ignoring case by passing flags with a regex.

>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)
0    False
1    False
2     True
3    False
4     None
Name: 0, dtype: object
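For reference, the effect of re.IGNORECASE can be checked directly with the standard library. The sketch below uses only the re module and shows the flag next to its inline form (?i); it does not call Koalas.

```python
import re

text = "house and parrot"

# Case-sensitive search: 'PARROT' does not occur in this casing.
sensitive = re.search("PARROT", text) is not None

# Passing re.IGNORECASE makes the search case-insensitive.
with_flag = re.search("PARROT", text, flags=re.IGNORECASE) is not None

# The inline flag (?i) is the same option written inside the pattern.
inline = re.search("(?i)PARROT", text) is not None

print(sensitive, with_flag, inline)  # False True True
```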

Matching any digit using a regular expression.

>>> s1.str.contains('[0-9]', regex=True)
0    False
1    False
2    False
3     True
4     None
Name: 0, dtype: object

Ensure pat is not meant as a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.

>>> s2 = ks.Series(['40','40.0','41','41.0','35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
Name: 0, dtype: bool
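One way to match the dot literally while keeping regex matching is to escape the pattern first. The sketch below uses the standard library's re.escape on plain Python strings to show the difference; with Koalas, passing regex=False or passing the escaped pattern would be the corresponding options.

```python
import re

values = ["40", "40.0", "41", "41.0", "35"]

# As a regex, '.' matches any character, so '.0' also matches '40'.
unescaped = [re.search(".0", v) is not None for v in values]

# re.escape turns '.0' into r'\.0', so the dot is matched literally.
literal_pat = re.escape(".0")
escaped = [re.search(literal_pat, v) is not None for v in values]

print(unescaped)  # [True, True, False, True, False]
print(escaped)    # [False, True, False, True, False]
```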