How to aggregate Pandas DataFrame rows with consistent timedelta between Datetime index values in Python?

I have a Pandas DataFrame with continuous measurements at 2-minute intervals that I have filtered to only include certain values. This process creates subgroups within the DataFrame that have measurement intervals of 2 minutes. I would like to aggregate over each of the subgroups so that I get the mean for each and index the mean values by the last Datetime index of the corresponding group. For example:

Original DataFrame

2020-06-09 08:44:00    1
2020-06-09 08:46:00    2
2020-06-09 08:48:00    3
2020-06-09 08:50:00    4
2020-06-09 09:06:00    10
2020-06-09 09:08:00    12
2020-06-09 09:10:00    14
2020-06-09 10:14:00    20
2020-06-09 10:16:00    10
2020-06-09 10:18:00    5
2020-06-09 10:20:00    2

New DataFrame

2020-06-09 08:50:00    2.5
2020-06-09 09:10:00    12
2020-06-09 10:20:00    9.25

In the original DataFrame, there are three subgroups in which the interval between consecutive index values remains constant at 2 minutes. The new DataFrame would keep only the last index of each subgroup, paired with the mean (or any other aggregate) of that subgroup's values.

In the past, I have created a separate column with the time difference between datetime indices and then, through some unnecessarily complex looping, looked for time differences greater than the preferred value, aggregated the preceding measurements, and appended them to a separate DataFrame that grows as I loop. I understand that this process is very inefficient, so I am looking for a faster and more elegant way.

Answer

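We can turn the datetime index into a column and take the diff between rows to get the time difference between consecutive values. Comparing that difference against the expected interval gives a boolean mask that is True wherever a gap is larger than 2 minutes, and the cumulative sum of the mask gives every run of evenly spaced rows its own group label. We then groupby those labels, aggregate with the last time value and the mean of the column values, and restore the index: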

# Turn the index into a regular column (a Series), which offers more
# computation options; an unnamed index becomes a column called 'index'
new_df = df.reset_index()
new_df = (
    new_df.groupby(
        # Start a new group wherever the gap to the previous row exceeds 2 minutes
        new_df['index'].diff().gt(pd.Timedelta(minutes=2)).cumsum()
    ).agg({
        # Get the last index value and the average of the column values
        'index': 'last', 'col': 'mean'
    }).set_index('index').rename_axis(index=None)  # restore index
)

new_df:

                       col
2020-06-09 08:50:00   2.50
2020-06-09 09:10:00  12.00
2020-06-09 10:20:00   9.25
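
To make the grouping key easier to follow, here is a minimal sketch that builds it step by step on the same sample data. The names gaps, is_new_group and group_id are only illustrative and do not appear in the answer's code; the logic is meant to match the diff / gt / cumsum chain used above.

```python
import pandas as pd

df = pd.DataFrame({
    'col': [1, 2, 3, 4, 10, 12, 14, 20, 10, 5, 2]
}, index=pd.to_datetime(
    ['2020-06-09 08:44:00', '2020-06-09 08:46:00',
     '2020-06-09 08:48:00', '2020-06-09 08:50:00',
     '2020-06-09 09:06:00', '2020-06-09 09:08:00',
     '2020-06-09 09:10:00', '2020-06-09 10:14:00',
     '2020-06-09 10:16:00', '2020-06-09 10:18:00',
     '2020-06-09 10:20:00']
))

# Time difference to the previous row (NaT for the first row)
gaps = df.index.to_series().diff()

# True where the gap is larger than the expected 2 minutes;
# the NaT in the first row compares as False
is_new_group = gaps.gt(pd.Timedelta(minutes=2))

# Each True bumps the running total, so every run gets its own label
group_id = is_new_group.cumsum()

print(group_id.tolist())
# [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Each label marks one run of rows spaced 2 minutes apart, which is why grouping on it yields three rows in the result.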

Setup:

import pandas as pd

df = pd.DataFrame({
    'col': [1, 2, 3, 4, 10, 12, 14, 20, 10, 5, 2]
}, index=pd.to_datetime(
    ['2020-06-09 08:44:00', '2020-06-09 08:46:00',
     '2020-06-09 08:48:00', '2020-06-09 08:50:00',
     '2020-06-09 09:06:00', '2020-06-09 09:08:00',
     '2020-06-09 09:10:00', '2020-06-09 10:14:00',
     '2020-06-09 10:16:00', '2020-06-09 10:18:00',
     '2020-06-09 10:20:00']
))

df:

                     col
2020-06-09 08:44:00    1
2020-06-09 08:46:00    2
2020-06-09 08:48:00    3
2020-06-09 08:50:00    4
2020-06-09 09:06:00   10
2020-06-09 09:08:00   12
2020-06-09 09:10:00   14
2020-06-09 10:14:00   20
2020-06-09 10:16:00   10
2020-06-09 10:18:00    5
2020-06-09 10:20:00    2
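
If the same operation is needed for other intervals or aggregates, the approach could be wrapped in a small helper. This is only a sketch under assumptions: aggregate_runs, step and how are hypothetical names that are not part of pandas or of the answer above, and the index is assumed to be a sorted DatetimeIndex.

```python
import pandas as pd

def aggregate_runs(df, step=pd.Timedelta(minutes=2), how='mean'):
    """Aggregate each run of rows whose index gaps do not exceed `step`.

    Hypothetical helper; assumes `df` has a sorted DatetimeIndex.
    """
    ts = df.index.to_series()
    # Label runs: a gap larger than `step` starts a new group
    group_id = ts.diff().gt(step).cumsum().to_numpy()
    out = df.groupby(group_id).agg(how)
    # Index each aggregated row by the last timestamp of its run
    out.index = pd.DatetimeIndex(ts.groupby(group_id).last().to_numpy())
    return out

df = pd.DataFrame({
    'col': [1, 2, 3, 4, 10, 12, 14, 20, 10, 5, 2]
}, index=pd.to_datetime(
    ['2020-06-09 08:44:00', '2020-06-09 08:46:00',
     '2020-06-09 08:48:00', '2020-06-09 08:50:00',
     '2020-06-09 09:06:00', '2020-06-09 09:08:00',
     '2020-06-09 09:10:00', '2020-06-09 10:14:00',
     '2020-06-09 10:16:00', '2020-06-09 10:18:00',
     '2020-06-09 10:20:00']
))

result = aggregate_runs(df)
print(result)
```

Passing the labels as a NumPy array groups rows by position, so this version does not need reset_index or a specific index name.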