Pandas groupby, return rows of 1 column based on maximum values of other columns

In my data, I need to group by columns X, Y, Z and fill in the result_code column. Each value should come from the code column of the row with the maximum area or, when areas tie, the maximum new_area.

For the first group, code C has the maximum area, so every row in that group should get C. For the second group, the maximum area is tied (A and B both have 200), so the new_area column decides and the result should be code B.
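The rule above amounts to comparing (area, new_area) pairs: the highest area wins, and new_area only matters when areas tie. Here is a minimal plain-Python sketch of that rule using the sample rows from the table. The helper name `pick_code` is hypothetical and only for illustration, and the sketch assumes that a full tie on both columns should keep the first row encountered.

```python
from collections import defaultdict

# Sample rows from the table: (X, Y, Z, code, area, new_area)
rows = [
    ("222 North St", "Seattle", "WA", "A", 200, 600),
    ("222 North St", "Seattle", "WA", "B", 300, 700),
    ("222 North St", "Seattle", "WA", "C", 400, 750),
    ("222 North St", "Seattle", "WA", "D", 300, 600),
    ("115 John St", "Chicago", "IL", "A", 200, 250),
    ("115 John St", "Chicago", "IL", "B", 200, 300),
    ("115 John St", "Chicago", "IL", "C", 50, 100),
]


def pick_code(rows):
    """Map each (X, Y, Z) group to the code of its winning row (hypothetical helper)."""
    groups = defaultdict(list)
    for x, y, z, code, area, new_area in rows:
        groups[(x, y, z)].append((area, new_area, code))
    # Compare (area, new_area) tuples: area decides first, new_area only breaks ties.
    # On a full tie, max() keeps the first row it saw.
    return {key: max(members, key=lambda m: (m[0], m[1]))[2]
            for key, members in groups.items()}


winning_codes = pick_code(rows)
# Every row gets its group's winning code, mirroring the result_code column
result_code = [winning_codes[(x, y, z)] for x, y, z, *_ in rows]
print(result_code)  # ['C', 'C', 'C', 'C', 'B', 'B', 'B']
```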

I need these results in a separate column, alongside all the other columns.

The table in the pic will help clarify.


Answer

This is a simple case of sorting, then taking the first row of each group:

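import io

import pandas as pd

# Sample data; result_code holds the expected values and is recomputed below
df = pd.read_csv(io.StringIO("""X,Y,Z,code,area,new_area,result_code
222 North St,Seattle,WA,A,200,600,C
222 North St,Seattle,WA,B,300,700,C
222 North St,Seattle,WA,C,400,750,C
222 North St,Seattle,WA,D,300,600,C
115 John St,Chicago,IL,A,200,250,B
115 John St,Chicago,IL,B,200,300,B
115 John St,Chicago,IL,C,50,100,B"""))

# Within each (X, Y, Z) group, put the row with the largest area first,
# breaking ties on area with the largest new_area
df = (df.sort_values(["X","Y","Z","area","new_area"], ascending=[True,True,True,False,False])
      # transform("first") copies each group's top code to every row of that group
      .assign(result_code=lambda dfa: dfa.groupby(["X","Y","Z"])["code"].transform("first"))
      # restore the original row order
      .sort_index()
     )
print(df)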


Output:

              X        Y   Z  code  area  new_area  result_code
0  222 North St  Seattle  WA     A   200       600            C
1  222 North St  Seattle  WA     B   300       700            C
2  222 North St  Seattle  WA     C   400       750            C
3  222 North St  Seattle  WA     D   300       600            C
4   115 John St  Chicago  IL     A   200       250            B
5   115 John St  Chicago  IL     B   200       300            B
6   115 John St  Chicago  IL     C    50       100            B
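Another option is to pick one winning row per group with `drop_duplicates` and merge it back onto the original frame. This avoids the `sort_index` step. It is a swapped-in technique rather than the approach above. Here is a minimal sketch, assuming the same sample data without the precomputed result_code column. The names `keys` and `winners` are illustrative only.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""X,Y,Z,code,area,new_area
222 North St,Seattle,WA,A,200,600
222 North St,Seattle,WA,B,300,700
222 North St,Seattle,WA,C,400,750
222 North St,Seattle,WA,D,300,600
115 John St,Chicago,IL,A,200,250
115 John St,Chicago,IL,B,200,300
115 John St,Chicago,IL,C,50,100"""))

keys = ["X", "Y", "Z"]

# Best row first (largest area, then largest new_area), then keep only the
# first row seen for each (X, Y, Z) group
winners = (
    df.sort_values(["area", "new_area"], ascending=False)
    .drop_duplicates(subset=keys)
    .loc[:, keys + ["code"]]
    .rename(columns={"code": "result_code"})
)

# A left merge attaches each group's winning code to every row of that group
# and keeps the rows in the left frame's original order
df = df.merge(winners, on=keys, how="left")
print(df)
```

Printing `winners` on its own shows exactly which row was chosen for each group. That can make ties easier to audit than the single transform chain.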
