Update the value in a new column of the current dataframe based on a column from a different dataframe

I have a column in my dataframe full of messages, and I want to categorize them based on the substrings present in each message. The substrings to search for come from a different dataframe (let's call it the master dataframe). The master dataframe is dynamic, so the categories in my main dataframe column have to be driven by whatever list the master currently holds.

Note: this has to work regardless of uppercase or lowercase letters.

df1 looks like:

           Messages
0         Firewall_Error
1         Firewall_Error_1
2         Firewall_Error_2
3         Firewall_Error_3
4        Wifihealth_1_Info
              ...         
109       Firewall_Error_1
110       Firewall_Error_2
111       Firewall_Error_3
112      Wifihealth_1_Info
113    Wifihealth_2_Failed

Master_df looks like:

    Strings Category
0   error   Error
1   info    Information
2   failed  Warning

So if Master_df['Strings'][0] is found as a substring in the Messages column of df1, set df1['category'] for that row to Master_df['Category'][0], and so on.

Expected output:

df1 must look like :

           Messages           category
0         Firewall_Error      Error
1         Firewall_Error_1    Error
2         Firewall_Error_2    Error
3         Firewall_Error_3    Error
4        Wifihealth_1_Info    Information
              ...         
109       Firewall_Error_1    Error
110       Firewall_Error_2    Error
111       Firewall_Error_3    Error
112      Wifihealth_1_Info    Information
113    Wifihealth_2_Failed    Warning

Code tried:

for i in range(0,len(Master_df['Strings'])):
    df1['Category'] = pd.np.where(df1.Messages.str.contains(Master_df['Strings'][i]), Master_df['Category'][i])

Answer

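Lowercase the messages first with Series.str.lower, then use Series.str.extract with a pattern built by joining the Strings values with "|". Setting Strings as the index of a Series lets Series.map translate each extracted substring into its category for the new column: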

# also convert Strings to lowercase so they match the lowercased messages
s = Master_df.set_index('Strings')['Category'].rename(index=str.lower)
pat = f'({"|".join(s.index)})'
df1['Category'] = df1['Messages'].str.lower().str.extract(pat, expand=False).map(s)
print(df1)
                Messages     Category
0         Firewall_Error        Error
1       Firewall_Error_1        Error
2       Firewall_Error_2        Error
3       Firewall_Error_3        Error
4      Wifihealth_1_Info  Information
109     Firewall_Error_1        Error
110     Firewall_Error_2        Error
111     Firewall_Error_3        Error
112    Wifihealth_1_Info  Information
113  Wifihealth_2_Failed      Warning
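
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample rows are a small subset of the data in the question, plus one hypothetical message (Router_OK) added only to show what happens when nothing matches. The re.escape call is an extra precaution that is not in the code above; it is only needed if the Strings values could contain regex metacharacters.

```python
import re
import pandas as pd

# Small sample based on the question; "Router_OK" is a made-up row with no match.
df1 = pd.DataFrame({"Messages": ["Firewall_Error", "Firewall_Error_1",
                                 "Wifihealth_1_Info", "Wifihealth_2_Failed",
                                 "Router_OK"]})
Master_df = pd.DataFrame({"Strings": ["error", "info", "failed"],
                          "Category": ["Error", "Information", "Warning"]})

# Lookup Series: lowercased substring -> category.
s = Master_df.set_index("Strings")["Category"].rename(index=str.lower)

# One capture group with all substrings as alternatives, escaped so that
# characters such as "." or "(" would be matched literally.
pat = "(" + "|".join(re.escape(k) for k in s.index) + ")"

# Extract the first matching substring per message, then map it to its category.
df1["Category"] = df1["Messages"].str.lower().str.extract(pat, expand=False).map(s)
print(df1)
```

Rows that contain none of the substrings end up as NaN in Category. If a message contains more than one substring, str.extract keeps the one that appears first in the message, so only a single category is assigned per row.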
