I have a large dataframe which has a similar pattern as below:

X Y Z 0 a p 2 1 a q 5 2 a r 6 3 a s 3 4 b w 10 5 b z 20 6 b y 9 7 b x 20

And can be constructed as:

df = { 'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], 'Y': ['p', 'q', 'r', 's', 'w', 'x', 'y', 'z'], 'Z': [2, 5, 6, 3, 10, 20, 9, 5] }

Now I want to group this dataframe by the first column i.e., `X`

and take `max`

from the `Z`

column and its corresponding value from `Y`

. And if there are two max values in `Z`

, then I would like to take alphabetically first value from `Y`

.

So my expected result would look like:

X Y Z a r 6 b x 20

I have tried `groupby('X', as_index=False).agg({'Z': 'max', 'Y': 'first'})`

but this selects max from `Z`

and first from `Y`

both at the same time.

Additionally I know there is a `pd.series.groupby.nlargest(1)`

approach, but this would take a lot of time for my dataset.

Any suggestions on how could I proceed would be appreciated.

Thanks in advance:)

## Answer

Let us try `sort_values`

+ `drop_duplicates`

:

df.sort_values(['X', 'Z', 'Y'], ascending=[True, False, True]).drop_duplicates('X')

X Y Z 2 a r 6 5 b x 20