I am working on fuzzy matching two dataframes using fuzzywuzzy. I set a cutoff score of 75, using process.extractOne to get the highest match.
Whenever a match is not made the value for that row is ‘None’.
How do I replace ‘None’ with the most common name?
from fuzzywuzzy import process
df1['Matched_Nickname_and_Score'] = df1['FNAME'].apply(lambda x: process.extractOne(x, df2['NICKNAME'].tolist(), score_cutoff=75))
I have a way of finding the max value for each row, but I'm not sure where to go from here:
maxValuesObj = df1.max(axis = 1)
Here is something that might help:
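df1['Matched_Nickname_and_Score'] = df1['Matched_Nickname_and_Score'].fillna(df1['FNAME'].mode().iloc[0])
df1['FNAME'].mode() returns a Series holding the most common value(s) in the FNAME column of df1 (more than one if there is a tie), so .iloc[0] picks the first one as a single scalar. fillna expects a scalar, dict, or Series rather than a NumPy array, which is why .iloc[0] is used here instead of .values. Note that the matched rows hold (nickname, score) tuples, so the filled rows will hold a plain string unless you split the column first.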
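If the mixed tuple/string column is a problem, one option is to split the match into separate columns before filling. The following is only a sketch under assumptions: the sample data is made up, the tuples stand in for what process.extractOne returns, and the column names Matched_Nickname and Match_Score are hypothetical.

```python
import pandas as pd

# Made-up sample data. The tuples mimic the (match, score) pairs that
# process.extractOne returns; None marks a row that fell below the cutoff.
df1 = pd.DataFrame({
    "FNAME": ["Bob", "Bob", "Liz", "Xavier"],
    "Matched_Nickname_and_Score": [("Bobby", 90), ("Bobby", 90), ("Lizzy", 86), None],
})

matched = df1["Matched_Nickname_and_Score"]

# Split each tuple into its parts; unmatched rows stay missing.
# (Matched_Nickname and Match_Score are hypothetical column names.)
df1["Matched_Nickname"] = matched.apply(lambda m: m[0] if isinstance(m, tuple) else None)
df1["Match_Score"] = matched.apply(lambda m: m[1] if isinstance(m, tuple) else None)

# mode() returns a Series (several values on a tie), so take the first entry.
most_common = df1["FNAME"].mode().iloc[0]

# Fill only the nickname column, leaving the score missing for unmatched rows.
df1["Matched_Nickname"] = df1["Matched_Nickname"].fillna(most_common)
```

If "most common name" should mean the most common matched nickname rather than the most common FNAME, compute most_common from df1["Matched_Nickname"] instead, before the fillna call.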