# How to find chained sequences of tasks using Pandas?

Suppose we have these three DataFrames, each holding the ‘Start’ and ‘End’ dates of a certain task.

```python
import pandas as pd

# Lowercase "h" is used for hourly frequencies; the uppercase "H" alias is deprecated in recent pandas
Task1 = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**3, freq="1h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=20))

Task2 = pd.DataFrame({"Start": pd.date_range("10-jan-2021", periods=500, freq="3h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=24))

Task3 = pd.DataFrame({"Start": pd.date_range("5-jan-2021", periods=700, freq="2h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=16))

# Duration of each task in hours (Task1, Task2, Task3)
durations = [20, 24, 16]
```

I’m looking for a way to link each ‘End’ date of task[x] with the ‘Start’ date of task[x+1], and so on. For the time being, I can only find the sequences where `Task_x['End'] == Task_{x+1}['Start']`, where x is the task number. The following code is what I came up with:

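```python
list_dfs = [Task1, Task2, Task3]
# Assumed definitions: `names` and `nbr_of_tasks` are used below but not defined in the original post
names = ["Task1", "Task2", "Task3"]
nbr_of_tasks = len(list_dfs)

# Walk the tasks backwards, starting from the start dates of the last task
rev_list_dfs = list(reversed(list_dfs))
comparaison_dates = list(rev_list_dfs[0]['Start'])
rev_list_dfs = rev_list_dfs[1:]
print(comparaison_dates)

for element in rev_list_dfs:
    # Keep only the rows whose end date coincides with a start date of the following task
    inception = element.loc[element['End'].isin(comparaison_dates)].reset_index(drop=True)
    if inception.empty:
        print('No available slots for the chosen interval')
        break
    else:
        comparaison_dates = inception['Start']

# One empty frame per task, holding that task's start and end columns
miccheck = []
for element in names:
    miccheck.append(pd.DataFrame(columns=[f'{element}_Start', f'{element}_End']))

# Rebuild each task's dates from the first task's start plus the cumulative durations
for i in range(nbr_of_tasks):
    miccheck[i][f'{names[i]}_Start'] = comparaison_dates + pd.to_timedelta(sum(durations[:i]), unit='h')
    miccheck[i][f'{names[i]}_End'] = comparaison_dates + pd.to_timedelta(sum(durations[:i+1]), unit='h')

excell = pd.concat(miccheck, axis=1).reset_index(drop=True)
print('\n       Start & end date of each sequence       \n')
print(excell)
```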

How can I find the succession of tasks where there is a downtime between each one? By downtime I mean the number of waiting hours before the next task can be executed. I’m not specifically looking for a ready-made answer; just pointing me in the right direction would be helpful. Thank you in advance.

EDIT:

Desired output :

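Answer:

The key idea is to use `pd.merge_asof`. For two task dataframes, the following finds, for each row in `Task1`, the row in `Task2` whose `Start_2` is closest to `End_1` while still satisfying `End_1 <= Start_2`. Both dataframes must be sorted by their merge keys, which is already the case for the example data: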

```python
pd.merge_asof(
    Task1.rename(columns=lambda c: c + "_1"),
    Task2.rename(columns=lambda c: c + "_2"),
    left_on="End_1",
    right_on="Start_2",
    direction="forward",
)
```
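To see what the forward match does, here is a minimal sketch on made-up data. The frames `task_a` and `task_b` and their timestamps are hypothetical, chosen only so the result can be checked by hand. It also shows that an exact match (`End_1 == Start_2`) is kept by default, giving a stand-by time of zero.

```python
import pandas as pd

# Hypothetical toy data (not from the question): a few slots per task
task_a = pd.DataFrame({"Start": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 06:00"])})
task_a["End"] = task_a["Start"] + pd.Timedelta(hours=4)  # ends at 04:00 and 10:00

task_b = pd.DataFrame(
    {"Start": pd.to_datetime(["2021-01-01 05:00", "2021-01-01 10:00", "2021-01-01 12:00"])}
)
task_b["End"] = task_b["Start"] + pd.Timedelta(hours=2)

# For each End_1, pick the first Start_2 that is not earlier (direction="forward")
pairs = pd.merge_asof(
    task_a.rename(columns=lambda c: c + "_1"),
    task_b.rename(columns=lambda c: c + "_2"),
    left_on="End_1",
    right_on="Start_2",
    direction="forward",
)

# Waiting time between the end of the first task and the start of the second
pairs["StandBy_1_2"] = pairs["Start_2"] - pairs["End_1"]
print(pairs)
```

Here the slot ending at 04:00 is paired with the 05:00 start (one hour of stand-by), and the slot ending at 10:00 is paired with the 10:00 start (no stand-by).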

The following function applies this idea to an arbitrary number of task dataframes and also computes the stand-by time:

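```python
def merge_task_dfs(dfs):
    # Suffix every column with the 1-based task number: Start_1, End_1, Start_2, ...
    dfs = [df.rename(columns=lambda c: c + f"_{i+1}") for i, df in enumerate(dfs)]

    output = dfs[0]

    for i, df in enumerate(dfs[1:]):
        i += 1
        # Match each End_i with the nearest Start_{i+1} that is not earlier,
        # then store the gap between the two as the stand-by time
        output = pd.merge_asof(
            output,
            df,
            left_on=f"End_{i}",
            right_on=f"Start_{i+1}",
            direction="forward",
        ).assign(**{f"StandBy_{i}_{i+1}": lambda d: d[f"Start_{i+1}"] - d[f"End_{i}"]})

    return output
```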
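One case the function above does not cover is a chain that runs out of follow-up slots: the unmatched row then has a missing `Start_{i+1}` and `End_{i+1}`, and as far as I know `merge_asof` refuses null values in the left key on the next iteration. The sketch below is a hypothetical variant (the name `merge_task_dfs_dropna` and the toy frames are assumptions, not part of the original answer) that drops such incomplete chains and re-sorts before each merge, since `merge_asof` requires sorted keys.

```python
import pandas as pd


def merge_task_dfs_dropna(dfs):
    """Hypothetical variant: chain tasks with merge_asof and drop incomplete chains."""
    # Sort each task by Start and suffix its columns with the 1-based task number
    dfs = [
        df.sort_values("Start").rename(columns=lambda c: f"{c}_{i + 1}")
        for i, df in enumerate(dfs)
    ]
    output = dfs[0].sort_values("End_1")

    for i, df in enumerate(dfs[1:], start=1):
        end_col, start_col = f"End_{i}", f"Start_{i + 1}"
        output = (
            pd.merge_asof(output, df, left_on=end_col, right_on=start_col, direction="forward")
            .dropna(subset=[start_col])  # no later slot available: discard the chain
            .assign(**{f"StandBy_{i}_{i + 1}": lambda d: d[start_col] - d[end_col]})
            .sort_values(f"End_{i + 1}")  # the next merge key must be sorted
        )

    return output.reset_index(drop=True)


# Hypothetical toy data: the last slot of t1 ends after every start in t2
day = "2021-01-01 "
t1 = pd.DataFrame({"Start": pd.to_datetime([day + "00:00", day + "06:00", day + "20:00"])})
t1["End"] = t1["Start"] + pd.Timedelta(hours=4)
t2 = pd.DataFrame({"Start": pd.to_datetime([day + "05:00", day + "12:00"])})
t2["End"] = t2["Start"] + pd.Timedelta(hours=2)
t3 = pd.DataFrame({"Start": pd.to_datetime([day + "08:00", day + "16:00"])})
t3["End"] = t3["Start"] + pd.Timedelta(hours=1)

chains = merge_task_dfs_dropna([t1, t2, t3])
print(chains)
```

With this toy data, only the first two slots of `t1` yield complete chains; the third ends after the last start in `t2` and is dropped. Whether dropping is the right behaviour depends on the use case; keeping the rows and filtering afterwards is an equally reasonable design.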

It can be applied to the provided example data as follows:

```python
# Lowercase "h" is used for hourly frequencies; the uppercase "H" alias is deprecated in recent pandas
Task1 = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**3, freq="1h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=20))

Task2 = pd.DataFrame({"Start": pd.date_range("10-jan-2021", periods=500, freq="3h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=24))

Task3 = pd.DataFrame({"Start": pd.date_range("5-jan-2021", periods=700, freq="2h")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=16))