pandas/python: drop duplicates of same strings with different order

Is it possible to drop duplicate rows where the strings in a column contain the same parts, just in a different order?

Example: dl3_hr_rank.r0 and hr_dl3_rank.r0 should be treated as duplicates.

code for df before drop:

import pandas as pd

data = {'item':['dl3_hr_rank.r0','hr_dl3_rank.r0','hr_kl3_rank.r0',
                'kl3_hr_rank.r0','hcrfr_hr_rank.r0',
                'hr_hcrfr_rank.r0','hcfr_hkfr_rank.r0_wp','hkfr_hcfr_rank.r0_wp',
                'hr_krl2_rank.r0_wp','krl2_hr_rank.r0_wp',],
        'result':[1.17,1.17,1.17,1.17,1.13,1.13,1,1,1,1]}
df = pd.DataFrame(data)
df

code for df after drop:

data = {'item':['dl3_hr_rank.r0','hr_kl3_rank.r0',
                'hcrfr_hr_rank.r0',
                'hcfr_hkfr_rank.r0_wp',
                'hr_krl2_rank.r0_wp'],
'result':[1.17,1.17,1.13,1,1]}
df = pd.DataFrame(data)
df

P.S. I'm having trouble inserting tables with the command. Many thanks, regards.

Answer

Try:

df[~df.item.str.split('_').apply(frozenset).duplicated(keep='first')]


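  • Use pandas.Series.str.split to split each string on '_'
  • Use apply(frozenset) to turn each list of parts into a hashable, order-insensitive set, so that duplicated can be used on it
  • Use pandas.Series.duplicated with keep='first' to flag every repeat after the first occurrence; the ~ inverts that mask so only the first occurrences are kept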
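
The three steps above can be spelled out one at a time, which makes it easier to inspect the intermediate keys. This is only a sketch built on the question's sample data; the variable names (tokens, keys, is_dup, deduped) and the reset_index call are my additions, not part of the one-liner. It also shows a possible variant using a sorted tuple instead of a frozenset, which may be preferable if a part can repeat inside a string, since a frozenset collapses repeated parts.

```python
import pandas as pd

data = {
    'item': ['dl3_hr_rank.r0', 'hr_dl3_rank.r0', 'hr_kl3_rank.r0',
             'kl3_hr_rank.r0', 'hcrfr_hr_rank.r0', 'hr_hcrfr_rank.r0',
             'hcfr_hkfr_rank.r0_wp', 'hkfr_hcfr_rank.r0_wp',
             'hr_krl2_rank.r0_wp', 'krl2_hr_rank.r0_wp'],
    'result': [1.17, 1.17, 1.17, 1.17, 1.13, 1.13, 1, 1, 1, 1],
}
df = pd.DataFrame(data)

# Step 1: split each string on '_' into a list of parts.
tokens = df['item'].str.split('_')

# Step 2: convert each list to a frozenset (hashable, ignores order).
keys = tokens.apply(frozenset)

# Step 3: flag rows whose key has already appeared in an earlier row.
is_dup = keys.duplicated(keep='first')

# Keep the unflagged rows; reset_index is optional and only renumbers them.
deduped = df[~is_dup].reset_index(drop=True)

# Variant (assumption: repeated parts should count): a sorted tuple is
# also hashable and order-insensitive, but keeps how often each part occurs.
keys_sorted = tokens.apply(lambda parts: tuple(sorted(parts)))
deduped_sorted = df[~keys_sorted.duplicated(keep='first')].reset_index(drop=True)

print(deduped)
```

On this sample data both variants keep the same five rows, because no part repeats within a single string.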