Suppose the data frame below:

| id | day | order |
|----|-----|-------|
| a  | 2   | 6     |
| a  | 4   | 0     |
| a  | 7   | 4     |
| a  | 8   | 8     |
| b  | 11  | 10    |
| b  | 15  | 15    |

I want to apply a function to the *day* and *order* columns of each group, grouping the rows by the *id* column.
The function is:

    def mean_of_differences(my_list):
        return sum([my_list[i] - my_list[i-1] for i in range(1, len(my_list))]) / len(my_list)

This function calculates the mean of the differences between each element and the next one, dividing by the number of elements. For example, for id=a, *day* would be **(2+3+1) divided by 4**, which is 1.5. I know how to use a lambda, but I couldn't find a way to implement this in a pandas group by. Also, each column should be sorted on its own to get my desired output, so apparently it is not possible to just sort by one column before the group by.
The output should be like this:

| id | day | order |
|----|-----|-------|
| a  | 1.5 | 2     |
| b  | 2   | 2.5   |

Does anyone know how to do this in a group by?

## Answer

First, sort your data by `day`, then group by `id`, and finally compute your diff/mean.

    df = (
        df.sort_values('day')
          .groupby('id')
          .agg({'day': lambda x: x.diff().fillna(0).mean()})
          .reset_index()
    )

Output:

    >>> df
      id  day
    0  a  1.5
    1  b  2.0
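The question also asks for `order`, with each column sorted independently inside its group. One possible way to cover both columns is to move the sort into the aggregation function, so that every column of every group is sorted on its own before differencing. The following is only a minimal sketch, assuming the data frame from the question; the helper name `mean_of_sorted_differences` and the variable `result` are made up for illustration.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b"],
    "day": [2, 4, 7, 8, 11, 15],
    "order": [6, 0, 4, 8, 10, 15],
})

def mean_of_sorted_differences(s):
    # Hypothetical helper: sort this column's values within the group,
    # take successive differences (the first one is NaN, replaced by 0),
    # and average, so the sum is divided by the number of rows in the group.
    return s.sort_values().diff().fillna(0).mean()

# Apply the helper to each selected column of each group.
result = (
    df.groupby("id")[["day", "order"]]
      .agg(mean_of_sorted_differences)
      .reset_index()
)
print(result)
```

Because the differences of a sorted column telescope, each value is the same as `(max - min) / count` for that group, which may be a simpler alternative if only this statistic is needed.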