How do I map a column of string values in one dataframe to another column in another dataframe?

I have two dataframes:

Dataframe 1 contains the following column and values –

Column 1
A; B; C
A; D; E
B; C

Dataframe 2 contains the following columns and values –

Column 1   Column 2
A          Apple
B          Banana
C          Cat
D          Dog
E          Egg

Now, I want to replace each value in Column 1 of Dataframe 1 with the matching Column 2 value from Dataframe 2 (matched on Column 1 of Dataframe 2), to obtain the following derived column in Dataframe 1:

Column 1   Derived Column
A; B; C    Apple; Banana; Cat
A; D; E    Apple; Dog; Egg
B; C       Banana; Cat

My first thought was to iterate over each row in Dataframe 1, split the value in Column 1 on ‘;’, and look up each piece in Dataframe 2. However, I have ~100k rows in Dataframe 1 and ~10k rows in Dataframe 2, which would make this computationally expensive. Is there a faster way to do this? Thanks!

Answer

Try Series.replace with regex=True, passing a lookup Series built from the second df (Column 1 as the index, Column 2 as the values; replace treats it like a dict):

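# Lookup Series: index = value to find, value = its replacement
lookup = df2.set_index('Column 1')['Column 2']
# With regex=True each index entry is used as a pattern inside the strings
df1['Column 2'] = df1['Column 1'].replace(lookup, regex=True)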

print(df1)

  Column 1            Column 2
0  A; B; C  Apple; Banana; Cat
1  A; D; E     Apple; Dog; Egg
2     B; C         Banana; Cat
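
One thing to be aware of: with regex=True the keys are matched as patterns anywhere in the string, so a key that is a substring of another key, or that contains regex special characters, may be replaced in places you did not intend. If that could happen in your data, a possible alternative is to split each cell into whole tokens and map those instead. The following is only a sketch under some assumptions that are not in the question: the separator is exactly '; ', the new column is called 'Derived Column', and tokens with no match in df2 are left unchanged.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'Column 1': ['A; B; C', 'A; D; E', 'B; C']})
df2 = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D', 'E'],
                    'Column 2': ['Apple', 'Banana', 'Cat', 'Dog', 'Egg']})

# Plain dict lookup: value in df2 Column 1 -> value in df2 Column 2
lookup = dict(zip(df2['Column 1'], df2['Column 2']))

# Split each cell into tokens and put one token per row;
# explode() repeats the original row index for every token
tokens = df1['Column 1'].str.split('; ').explode()

# Map whole tokens only; unknown tokens are kept as-is (assumption)
mapped = tokens.map(lambda t: lookup.get(t, t))

# Join the tokens that belong to the same original row back together
df1['Derived Column'] = mapped.groupby(level=0).agg('; '.join)

print(df1)
```

On the sample data this gives the same three strings as the replace approach. It does one dict lookup per token rather than one regex pass per key, so it may also scale better when df2 has many rows, but that is worth timing on your own data.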
