Drop duplicate rows in Pandas when multiple columns match either as-is or after swapping their positions

Given a small dataset as follows:

  feature1   feature2  pcc
0        a          b  0.6
1        b          a  0.4
2        a          c  0.7
3        a          d -0.1
4        a          d  0.3

I would like to drop duplicate rows where the values in ['feature1', 'feature2'] are the same, either in their original order or after swapping the two columns.

The expected result will be:

  feature1   feature2  pcc
0        a          b  0.6
2        a          c  0.7
3        a          d -0.1

How could I achieve that in Pandas? Thanks.

Answer

Use np.sort to sort the two columns within each row, assign the result back, and then remove duplicates:

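import numpy as np

cols = ['feature1', 'feature2']
# Sort the values within each row so that (b, a) becomes (a, b)
df[cols] = np.sort(df[cols], axis=1)
# Keep only the first occurrence of each pair
df = df.drop_duplicates(cols)
print(df)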
  feature1 feature2  pcc
0        a        b  0.6
2        a        c  0.7
3        a        d -0.1
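
Note that the approach above overwrites feature1 and feature2 with their row-sorted values. If the original column order should be kept for the surviving rows, one possible variant is to sort a copy and use it only to build a boolean mask. This is a minimal sketch that rebuilds the sample data from the question; the names sorted_pairs, mask and result are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "feature1": ["a", "b", "a", "a", "a"],
    "feature2": ["b", "a", "c", "d", "d"],
    "pcc": [0.6, 0.4, 0.7, -0.1, 0.3],
})

cols = ["feature1", "feature2"]

# Row-wise sorted copy of the pair columns; df itself is left unchanged
sorted_pairs = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)

# True for every row whose unordered pair has already appeared earlier
mask = sorted_pairs.duplicated()

result = df[~mask]
print(result)
```

Here duplicated() uses its default keep='first', so the first row of each unordered pair is retained, which matches the expected result in the question.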