Filtering DataFrame by finding exact word (not combined) in a column of strings

My DataFrame has two columns:

Name  Status
 a    I am Good
 b    Goodness!!!
 c    Good is what i feel
 d    Not Good-at-all

I want to filter rows in which Status has a string ‘Good’ as its exact word, not combined with any other words or characters.

So the output will be:

Name  Status
a    i am Good
c    Good is what i feel

Two other rows had a ‘Good’ string in it but mixed with other characters, so should not be picked up.

I tried doing:

d = df[df['Status'].str.contains('Good')]  # But all rows come up

I believe some regex like (r'bGoodb', Status) will do that, but this is not able to sum it up together. And how/where exactly can I fit the regex in a DataFrame filter condition to achieve this? And how to achieve startswith or endswith ‘Good’ (exact word search)?

Answer

If you’re defining “exact” to mean no other characters (including punctuation which defines a word boundary b), you could instead check for a leading and trailing space and/or beginning/end anchors:

>>> df[df['Status'].str.contains(r'(?:s|^)Good(?:s|$)')]
  Name               Status
0    a            I am Good
2    c  Good is what i feel

Explanation:

  • (?:s|^) is a non-capturing group looking for a space character (s) or the beginning of the string (^).

  • Good is the word you’re looking for.

  • (?:s|$) is a non-capturing group looking for a space character (s) or the end of the string ($).

Leave a Reply

Your email address will not be published. Required fields are marked *