I have the following pandas code, in which I am trying to replace the names of countries with the string <country>:
    df['title_type2'] = df['title_type']
    countries = open(r'countries.txt').read().splitlines()  # Reads all lines into a list and removes \n.
    countries = [country.replace(' ', r'\s') for country in countries]
    pattern = r'\b' + '|'.join(countries) + r'\b'
    df['title_type2'].str.replace(pattern, '<country>')
However, I can't get countries with spaces (like South Korea) to work correctly: they do not get replaced. The problem seems to be that my \s is turning into \\s. How can I avoid this, or how can I fix the issue?
There is no need to replace any space with \s.
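As a quick check, the sketch below uses only the standard `re` module (the sample text is made up for illustration). It shows that a literal space in a pattern matches a literal space, and that the doubled backslash is most likely just how Python displays a single backslash in a string's repr, not a change to the string itself.

```python
import re

# A literal space in a regex matches a literal space in the text.
text = "Zxx South Korea"  # made-up sample value
print(re.sub(r"\bSouth Korea\b", "<country>", text))  # Zxx <country>

# r'\s' is two characters: a backslash and 's'. repr() shows the
# backslash escaped, so the same string is displayed with '\\s'.
escaped = "South Korea".replace(" ", r"\s")
print(len(r"\s"))     # 2
print(repr(escaped))  # displayed with a doubled backslash
print(escaped)        # printed with a single backslash
```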
Your pattern should rather include:
\b - "starting" word boundary,
(?:...|...|...) - a non-capturing group with country names (alternatives),
\b - "ending" word boundary,
so the pattern can be something like:
    pattern = r'\b(?:China|South Korea|Taiwan)\b'
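When the names come from a file, as in the question, the same pattern can be assembled from the list. This is a minimal sketch, assuming the names are already loaded; the hypothetical `countries` list below stands in for the contents of `countries.txt`. Passing each name through `re.escape` is an added precaution, not part of the original answer, so that characters such as `.` or `(` in a name are matched literally.

```python
import re

# Stand-in for the lines read from countries.txt (assumed content).
countries = ["China", "South Korea", "Taiwan"]

# Escape each name, then wrap the alternatives in a non-capturing group
# so that both \b anchors apply to every alternative, not just the
# first and the last one.
pattern = r"\b(?:" + "|".join(re.escape(c) for c in countries) + r")\b"

print(re.sub(pattern, "<country>", "Zxx South Korea"))  # Zxx <country>
print(re.sub(pattern, "<country>", "Taiwanese food"))   # unchanged
```

Without the group, `\b` would bind only to the first and last alternatives, so a word such as "Chinatown" could be partially replaced.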
Then you can do the replacement:
    df['title_type2'] = df['title_type2'].str.replace(pattern, '<country>', regex=True)
I created test data as follows:
    import pandas as pd

    df = pd.DataFrame(['Abc Taiwan', 'Xyz China', 'Zxx South Korea', 'No country name'], columns=['title_type'])
    df['title_type2'] = df['title_type']
After the replacement, the result is:
    0      Abc <country>
    1      Xyz <country>
    2      Zxx <country>
    3    No country name
    Name: title_type2, dtype: object
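For reference, here are the steps as one runnable sketch, assuming pandas is installed. It makes two things explicit. First, `str.replace` returns a new Series, so the result has to be assigned back, which the code in the question does not do. Second, `regex=True` is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

df = pd.DataFrame(
    ["Abc Taiwan", "Xyz China", "Zxx South Korea", "No country name"],
    columns=["title_type"],
)

# Word boundaries around a non-capturing group of alternatives.
pattern = r"\b(?:China|South Korea|Taiwan)\b"

# Assign the result back; the original column is left untouched.
df["title_type2"] = df["title_type"].str.replace(pattern, "<country>", regex=True)

print(df["title_type2"].tolist())
```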