How to apply a function to each group in a pandas group by

Suppose I have the data frame below:

|id |day | order |
|---|--- |-------|
| a | 2  |  6    |
| a | 4  |  0    |
| a | 7  |  4    |
| a | 8  |  8    |
| b | 11 | 10    |
| b | 15 | 15    |

I want to apply a function to the day and order columns of each group, with rows grouped by the id column. The function is:

def mean_of_differences(my_list):
   return sum([ my_list[i] - my_list[i-1] for i in range(1, len(my_list))]) / len(my_list)

This function calculates the mean of the differences between each element and the previous one, dividing by the number of elements. For example, for id=a, day would be (2+3+1) divided by 4, which is 1.5. I know how to use a lambda, but I didn’t find a way to implement this in a pandas group by. Also, each column has to be sorted on its own to get my desired output, so apparently it is not possible to just sort by one column before the group by. The output should be like this:

|id |day| order |
|---|---|-------|
| a |1.5|   2   |
| b | 2 |  2.5  |
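
As a quick check of the numbers for id=a, here is a small sketch that calls the function on each column after sorting it on its own. The names day_a and order_a are only for illustration.

```python
def mean_of_differences(my_list):
    # Sum of consecutive differences, divided by the number of elements
    return sum(my_list[i] - my_list[i - 1] for i in range(1, len(my_list))) / len(my_list)

# Columns of group a, each sorted independently of the other
day_a = sorted([2, 4, 7, 8])
order_a = sorted([6, 0, 4, 8])

print(mean_of_differences(day_a))    # 1.5
print(mean_of_differences(order_a))  # 2.0
```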

Does anyone know how to do this in a group by?

Answer

First, sort your data by day, then group by id, and finally compute the mean of the differences.

df = (
    df.sort_values('day')
      .groupby('id')
      # the first diff in each group is NaN; filling it with 0 keeps the group size as divisor
      .agg({'day': lambda x: x.diff().fillna(0).mean()})
      .reset_index()
)

Output:

>>> df
  id  day
0  a  1.5
1  b  2.0
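
The snippet above only covers day. Since the question needs each column sorted on its own, one possible sketch is to sort inside the aggregation function instead of before the group by. The DataFrame below is rebuilt from the table in the question, and the helper name mean_of_differences is reused here for a pandas Series version that I wrote for illustration; it is not the asker's original list-based function.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b"],
    "day": [2, 4, 7, 8, 11, 15],
    "order": [6, 0, 4, 8, 10, 15],
})

def mean_of_differences(s):
    # Sort this column on its own, take consecutive differences,
    # and count the first row's missing difference as 0 so the
    # mean divides by the number of rows in the group
    return s.sort_values().diff().fillna(0).mean()

# agg calls the function once per column and per group
out = df.groupby("id")[["day", "order"]].agg(mean_of_differences).reset_index()
print(out)
```

With the sample data this should give day 1.5 and order 2.0 for a, and day 2.0 and order 2.5 for b, which matches the expected table in the question.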