Pandas duplicated() returns every duplicate once, except for one row

I am trying to get all the Nobel Prize winners who won more than once between 1901 and 2016. I tried the pandas duplicated() method, but it returns each duplicate once except for one entity, which shows up twice. I am looking for duplicates based on the full_name column of the DataFrame. I have tried different combinations of parameters but got the same result. I know I can remove that one row manually, but what is going wrong here? My code is:


lucky_winners = df[df.duplicated(['full_name'])]


lucky_winners = df[df.duplicated(['full_name'], keep='first')]


lucky_winners = df[df.duplicated(['full_name'], keep='last')]

Same output:


62                           Marie Curie, née Sklodowska
215    Comité international de la Croix Rouge (Intern...
340                                   Linus Carl Pauling
348    Comité international de la Croix Rouge (Intern...
424                                         John Bardeen
505                                     Frederick Sanger
523    Office of the United Nations High Commissioner...

The entity that appears twice is Comité international de la Croix Rouge (International Committee of the Red Cross). I even compared the two rows and got True, using:

lucky_winners.iloc[1].full_name == lucky_winners.iloc[3].full_name

I can’t figure out where the actual problem is.


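The Red Cross shows up twice in the output because it must appear at least three times in the data: duplicated() flags every occurrence after the first (or before the last, with keep='last'), so a three-time winner yields two flagged rows. To get each repeat winner only once, this is what I did: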

  • Get all the rows whose full_name has already occurred

    lucky_winners = df[df.duplicated(['full_name'])]

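  • Then drop the duplicates from this newly created DataFrame (assigning the result rather than using inplace=True avoids a SettingWithCopyWarning on the filtered slice)

    lucky_winners = lucky_winners.drop_duplicates(subset=['full_name'])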

That’s all! This way I got each repeat winner exactly once.
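
For anyone who wants to reproduce the behaviour, here is a minimal sketch on made-up data. The toy DataFrame and its shortened names are assumptions standing in for the real Nobel dataset, and only the full_name column matters. It shows why a name that occurs three times is flagged twice, and two other ways to look at the repeats (keep=False and value_counts()).

```python
import pandas as pd

# Hypothetical toy data, not the real Nobel dataset.
df = pd.DataFrame({
    "year": [1903, 1911, 1917, 1944, 1963, 1956, 1972, 1921],
    "full_name": [
        "Marie Curie", "Marie Curie",
        "Red Cross", "Red Cross", "Red Cross",
        "John Bardeen", "John Bardeen",
        "Albert Einstein",
    ],
})

# The default keep='first' flags every occurrence after the first one,
# so a name that appears three times is flagged twice.
repeats = df[df.duplicated(["full_name"])]
names_with_repeats = repeats["full_name"].tolist()

# One row per repeat winner; assign the result instead of using inplace=True.
lucky_winners = repeats.drop_duplicates(subset=["full_name"])

# keep=False flags every row of every repeated name, including the first.
all_rows = df[df.duplicated(["full_name"], keep=False)]

# value_counts() shows how many times each repeated name occurs.
counts = df["full_name"].value_counts()
multi = counts[counts > 1]

print(names_with_repeats)
print(lucky_winners)
print(multi)
```

In this toy example "Red Cross" appears twice in names_with_repeats, which mirrors the output in the question. If only the names are needed, multi.index already holds each repeat winner once.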