How to group rows between flag values from different columns with Pandas?

Supposing I have a dataframe like this:

Date is_start is_end
2021-07-16 10:40 False False
2021-07-16 10:41 False False
2021-07-16 10:42 False False
2021-07-16 10:43 True False
2021-07-16 10:44 False False
2021-07-16 10:45 False True
2021-07-16 10:46 False False
2021-07-16 10:47 True False
2021-07-16 10:48 False False
2021-07-16 10:49 False False
2021-07-16 10:50 False False
2021-07-16 10:51 False True
2021-07-16 10:52 False False
2021-07-16 10:53 False False
2021-07-16 10:54 True False
2021-07-16 10:55 False False

I would like to be able to use groupby (or something similar in terms of efficiency) to create groups of rows in between (and including) their respective is_start and is_end flags. E.g.:

Group 1

Date is_start is_end
2021-07-16 10:43 True False
2021-07-16 10:44 False False
2021-07-16 10:45 False True

Group 2

Date is_start is_end
2021-07-16 10:47 True False
2021-07-16 10:48 False False
2021-07-16 10:49 False False
2021-07-16 10:50 False False
2021-07-16 10:51 False True

I am hoping there is a solution to this without going down the route of looping with iterrows, but thus far, it’s eluding me.

Additionally…

(but not as important)

It would also be useful to have a way of separately identifying the unfinished group that starts at 2021-07-16 10:54

Answer

A couple of cumsum calls will do the trick. First, take the cumulative sum of is_start, which creates a group id that increases by one at every start flag:

df['gr'] = df['is_start'].cumsum()
print(df)
                Date  is_start  is_end  gr
0   2021-07-16 10:40     False   False   0
1   2021-07-16 10:41     False   False   0
2   2021-07-16 10:42     False   False   0
3   2021-07-16 10:43      True   False   1
4   2021-07-16 10:44     False   False   1
5   2021-07-16 10:45     False    True   1
6   2021-07-16 10:46     False   False   1
7   2021-07-16 10:47      True   False   2
8   2021-07-16 10:48     False   False   2
9   2021-07-16 10:49     False   False   2
10  2021-07-16 10:50     False   False   2
11  2021-07-16 10:51     False    True   2
12  2021-07-16 10:52     False   False   2
13  2021-07-16 10:53     False   False   2
14  2021-07-16 10:54      True   False   3
15  2021-07-16 10:55     False   False   3

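Then apply cumsum to is_end as well and shift the result down by one row, so that the ending row itself still counts as part of its group. Subtracting this from the gr column created above gives 1 for rows inside a group and 0 for rows after an end. Multiplying that difference by gr restores the group id inside each group and leaves 0 everywhere else: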

df['gr'] = (df['gr'] - df['is_end'].cumsum().shift(fill_value=0))*df['gr']
print(df)
                Date  is_start  is_end  gr
0   2021-07-16 10:40     False   False   0
1   2021-07-16 10:41     False   False   0
2   2021-07-16 10:42     False   False   0
3   2021-07-16 10:43      True   False   1
4   2021-07-16 10:44     False   False   1
5   2021-07-16 10:45     False    True   1
6   2021-07-16 10:46     False   False   0
7   2021-07-16 10:47      True   False   2
8   2021-07-16 10:48     False   False   2
9   2021-07-16 10:49     False   False   2
10  2021-07-16 10:50     False   False   2
11  2021-07-16 10:51     False    True   2
12  2021-07-16 10:52     False   False   0
13  2021-07-16 10:53     False   False   0
14  2021-07-16 10:54      True   False   3
15  2021-07-16 10:55     False   False   3
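
For reference, the two steps above can be put together as a self-contained sketch. The helper name label_groups and the way the sample frame is rebuilt here are illustrative assumptions, not part of the original answer; the arithmetic is the same.

```python
import pandas as pd

def label_groups(df, start_col="is_start", end_col="is_end"):
    """Return group ids: 0 outside any group, k for rows of the k-th start."""
    # Running count of start flags: a candidate group id for every row.
    start_id = df[start_col].cumsum()
    # Number of end flags seen strictly before each row (the shift keeps
    # the ending row itself inside its group).
    ends_before = df[end_col].cumsum().shift(fill_value=0)
    # 1 while a group is open, 0 otherwise (assuming well-formed flags).
    inside = start_id - ends_before
    return inside * start_id

# Rebuild the sample data from the question.
dates = pd.date_range("2021-07-16 10:40", periods=16, freq="min")
df = pd.DataFrame({
    "Date": dates,
    "is_start": [i in (3, 7, 14) for i in range(16)],
    "is_end": [i in (5, 11) for i in range(16)],
})
df["gr"] = label_groups(df)
print(df["gr"].tolist())
# [0, 0, 0, 1, 1, 1, 0, 2, 2, 2, 2, 2, 0, 0, 3, 3]
```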

Now, to get the groups and spot the unfinished one, you can do:

for gr, dfg in df.loc[df['gr'] > 0].groupby('gr'):
    # a group without any is_end flag was started but never closed
    if not dfg['is_end'].any():
        print(f'unfinished group {gr}', dfg)
    else:
        print(f'group {gr}', dfg)
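
If you would rather avoid the Python loop, the unfinished group can probably also be flagged in a vectorized way with groupby(...).transform('any'). This is only a sketch: the small frame below hard-codes a gr column of the kind produced above, and the column names finished and unfinished are made up for illustration.

```python
import pandas as pd

# Hard-coded stand-in for a frame that already has the 'gr' column:
# group 1 is closed by an is_end flag, group 2 never is.
df = pd.DataFrame({
    "is_end": [False, False, True, False, False, False],
    "gr":     [1, 1, 1, 0, 2, 2],
})

in_group = df["gr"] > 0
# For every row: does its group contain at least one is_end flag?
has_end = df.groupby("gr")["is_end"].transform("any")

df["finished"] = in_group & has_end
df["unfinished"] = in_group & ~has_end
print(df)
```

The finished groups can then be iterated with df.loc[df['finished']].groupby('gr'), and the open one is whatever remains in df.loc[df['unfinished']].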

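Note: this works only if every True in is_end comes after its matching True in is_start; in particular, there must be no True in is_end before the first True in is_start.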
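
One way to test that precondition before labelling is to look at the running balance of starts minus ends, which should never leave the values 0 and 1. The function name flags_are_well_formed is hypothetical, and the sketch assumes no row has both flags set at once (the balance alone cannot see that case).

```python
import pandas as pd

def flags_are_well_formed(is_start, is_end):
    """True if starts and ends alternate, beginning with a start.

    Assumes no single row has both flags set.
    """
    # +1 at each start, -1 at each end; a valid sequence stays in {0, 1}.
    balance = is_start.cumsum() - is_end.cumsum()
    return bool(balance.isin([0, 1]).all())

good = pd.DataFrame({"is_start": [False, True, False, False],
                     "is_end":   [False, False, True, False]})
# An is_end flag appears before any is_start flag.
bad = pd.DataFrame({"is_start": [False, False, True, False],
                    "is_end":   [True, False, False, True]})

print(flags_are_well_formed(good["is_start"], good["is_end"]))  # True
print(flags_are_well_formed(bad["is_start"], bad["is_end"]))    # False
```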