An easy way to calculate time intervals between dates in a column in Python

Suppose I have a Pandas DataFrame like this:

 item   event      date 
  A       1     2020-03-09
  B       1     2020-03-09
  A       2     2020-05-01
  B       2     2020-05-01
  C       2     2020-05-01
  A       3     2020-06-25
  C       3     2020-06-25
  B       4     2020-07-18
  C       4     2020-07-18

Each item has exactly one date per event, so an item has several events, each with a distinct date.

For each item, I would like to calculate the average number of days between consecutive dates. The result differs per item, so the intervals between successive events have to be computed and averaged within each item.

So the expected output would look like:

  item   average_interval_in_days
    A              54
    B              65.5
    C              39

Does anyone have an idea how to do this?

Answer

Very similar to @BradSolomon’s answer, with two small differences:

# assumes df['date'] is a datetime column, e.g. after df['date'] = pd.to_datetime(df['date'])
df.sort_values(['item', 'date']).groupby('item')['date'].agg(
    lambda g: g.diff().mean() / pd.Timedelta(days=1))

# gives:
item
A    54.0
B    65.5
C    39.0

Notes:

  1. Sort the dates within each group; otherwise the mean of the differences depends on the row order. In your example the dates happen to be sorted already, so if you can guarantee that, you may skip .sort_values().
  2. Divide by pd.Timedelta(days=1) to get the mean difference directly as a number of days.
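For reference, here is a self-contained sketch of the same approach on the sample data from the question. The helper name `mean_interval_days` is only for illustration, and the frame is rebuilt by hand with `pd.to_datetime` on the assumption that the dates start out as strings. The reversed frame at the end shows why the sort matters.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (dates parsed to datetime64).
df = pd.DataFrame({
    "item": list("ABABCACBC"),
    "event": [1, 1, 2, 2, 2, 3, 3, 4, 4],
    "date": pd.to_datetime([
        "2020-03-09", "2020-03-09",
        "2020-05-01", "2020-05-01", "2020-05-01",
        "2020-06-25", "2020-06-25",
        "2020-07-18", "2020-07-18",
    ]),
})


def _mean_diff_in_days(dates):
    # Mean of consecutive differences, expressed as a float number of days.
    return dates.diff().mean() / pd.Timedelta(days=1)


def mean_interval_days(frame):
    # Hypothetical helper: sort within each item, then average the gaps.
    return (
        frame.sort_values(["item", "date"])
        .groupby("item")["date"]
        .agg(_mean_diff_in_days)
    )


result = mean_interval_days(df)

# Without sorting, the answer depends on row order: reversing the rows
# flips the sign of every difference.
reversed_result = df.iloc[::-1].groupby("item")["date"].agg(_mean_diff_in_days)

print(result)
print(reversed_result)
```

With the rows reversed, every per-item mean comes out negative, which is the order dependence that note 1 warns about.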

Alternative for speed (no sort, no lambda, but a bit more opaque)

gb = df.groupby('item')['date']
(gb.max() - gb.min()) / (gb.count() - 1) / pd.Timedelta(days=1)

# gives:
item
A    54.0
B    65.5
C    39.0
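Why the shortcut gives the same numbers: for sorted dates, the consecutive differences telescope, so their sum is simply max - min, and there are count - 1 of them. Here is a small check on item C's dates, deliberately given out of order (variable names are only for illustration):

```python
import pandas as pd

# Item C's dates from the question, deliberately out of order.
dates = pd.Series(pd.to_datetime(["2020-05-01", "2020-07-18", "2020-06-25"]))

# Approach 1: sort, take consecutive differences, average them.
mean_of_diffs = dates.sort_values().diff().mean()

# Approach 2: total span divided by the number of gaps; no sort needed,
# because max() and min() do not depend on row order.
span_over_gaps = (dates.max() - dates.min()) / (len(dates) - 1)

print(mean_of_diffs / pd.Timedelta(days=1))
print(span_over_gaps / pd.Timedelta(days=1))
```

Both expressions give the same Timedelta. Note that an item with a single date has no interval to average (count - 1 is zero), so you may want to handle that case separately if it can occur in your data.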