Remove duplicates and keep top two values based on column

I have a dataframe that looks like this:

parent_id  child_id  score
0            98       2.6
1            15       1.8
2            98       2.3
3            98       2.7
4            18       3.2
5            15       1.9
6            18       2.3
7            15       2.0

I want to drop duplicates in the child_id column, keeping for each child_id only the two rows with the highest score, so the final output should look like this:

parent_id  child_id  score
0            98       2.6
3            98       2.7
4            18       3.2
5            15       1.9
6            18       2.3
7            15       2.0

Answer

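You can use groupby on your child_id column and nlargest(2) on your score column. The original row labels come back as level_1 after reset_index, so the rename assumes the index of df is unnamed and matches parent_id (as it does with the default index here):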

>>> df.groupby('child_id')['score'].nlargest(2).reset_index().rename({'level_1':'parent_id'},axis=1).set_index('parent_id').sort_index()

Out[281]: 
           child_id  score
parent_id                 
0                98    2.6
3                98    2.7
4                18    3.2
5                15    1.9
6                18    2.3
7                15    2.0
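
A variant that treats parent_id as an ordinary column and never touches the index: sort by score in descending order, then take the first two rows of each child_id group. This is only a sketch; the DataFrame construction is an assumption reconstructed from the table in the question, and top2 is just an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of the question's data, with parent_id as a regular column.
df = pd.DataFrame({
    "parent_id": [0, 1, 2, 3, 4, 5, 6, 7],
    "child_id": [98, 15, 98, 98, 18, 15, 18, 15],
    "score": [2.6, 1.8, 2.3, 2.7, 3.2, 1.9, 2.3, 2.0],
})

top2 = (
    df.sort_values("score", ascending=False)  # highest scores first
      .groupby("child_id")
      .head(2)                                # first two rows per child_id
      .sort_values("parent_id")               # restore the original ordering
      .reset_index(drop=True)
)

print(top2)
```

Because head(2) keeps whole rows, every column of df survives, which may be handy if the real frame has more columns than the three shown.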