Compare two columns that contain lists of words in a pandas DataFrame

I am trying to find the words that are not common to two pandas columns that contain lists.

The words are not always in the same order and the length of the list can vary.

For example:

column1            column2
['a','b']          ['c','a','b']
['c','a']          ['a','b','d','c']

The result I want is:

column3
['c']
['b','d']

Thank you in advance!

Answer

Since your goal is to find the words that are not common to the two columns, I assume you also want the uncommon elements when the column1 list is a superset of the column2 list, and vice versa.

Unfortunately, the two existing solutions don't handle this case, e.g.

     column1       column2
0  [c, a, b]        [a, b]
1     [c, a]  [a, b, d, c]

Both of the other solutions give this result in column3:

     column1       column2 column3
0  [c, a, b]        [a, b]      []             <==  empty list [] instead of ['c']
1     [c, a]  [a, b, d, c]  [b, d]
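
The other answers are not reproduced here, so the following is only a sketch under an assumption: that they compute a one-sided set difference (the elements of column2 that are missing from column1). The names `df_demo` and `one_sided` are made up for illustration. Under that assumption, the empty list in the first row can be reproduced like this:

```python
import pandas as pd

# Same data as the table above.
df_demo = pd.DataFrame({
    "column1": [["c", "a", "b"], ["c", "a"]],
    "column2": [["a", "b"], ["a", "b", "d", "c"]],
})

# Assumed one-sided approach: keep only what column2 has and column1 lacks.
# Anything that is only in column1 (the 'c' in row 0) is lost.
one_sided = [
    sorted(set(b) - set(a))
    for a, b in zip(df_demo["column1"], df_demo["column2"])
]
print(one_sided)  # [[], ['b', 'd']]
```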

To get ['c'] instead of [] for the first row, use the set method symmetric_difference(), which returns the elements that are in exactly one of the two sets:

df['column3'] = df.apply(lambda x: list(set(x['column1']).symmetric_difference(set(x['column2']))), axis=1)

Result:

print(df)

     column1       column2 column3
0  [c, a, b]        [a, b]     [c]
1     [c, a]  [a, b, d, c]  [b, d]
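
For reference, here is a self-contained sketch of the same idea that can be pasted and run as-is. It is a variant, not the exact line above: the helper name `uncommon_words` is made up for illustration, it uses the `^` operator (equivalent to symmetric_difference() for two sets) with a list comprehension instead of apply, and it sorts each result because sets are unordered, so the element order of a plain list(set(...)) is not guaranteed.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [["c", "a", "b"], ["c", "a"]],
    "column2": [["a", "b"], ["a", "b", "d", "c"]],
})


def uncommon_words(left, right):
    """Return the words found in exactly one of the two lists, sorted."""
    return sorted(set(left) ^ set(right))


# Build one result list per row; wrapping in a Series aligned on df.index
# keeps each list as a single cell value.
df["column3"] = pd.Series(
    [uncommon_words(a, b) for a, b in zip(df["column1"], df["column2"])],
    index=df.index,
)

print(df["column3"].tolist())  # [['c'], ['b', 'd']]
```

Note that converting to sets drops duplicates, so a word repeated within one list appears at most once in column3.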