How to find chained sequences of tasks using Pandas?

Suppose we have these three DataFrames, each holding the ‘Start’ and ‘End’ dates of a task.

import pandas as pd

Task1 = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**3, freq="1H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=20))

Task2 = pd.DataFrame({"Start": pd.date_range("10-jan-2021", periods=500, freq="3H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=24))

Task3 = pd.DataFrame({"Start": pd.date_range("5-jan-2021", periods=700, freq="2H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=16))

durations = [20,24,16]
names = ['Task1', 'Task2', 'Task3']
nbr_of_tasks = 3

I’m looking for a way to link the ‘End’ date of each task[x] with the ‘Start’ date of task[x+1], and so on. So far I can only find the sequences where Task_x[‘End’] == Task_x+1[‘Start’], where x is the task number, i.e. chains with no waiting time between tasks. This is the code I came up with:

list_dfs = [Task1,Task2,Task3]
rev_list_dfs = list(reversed(list_dfs))
comparaison_dates = list(rev_list_dfs[0]['Start'])
rev_list_dfs = rev_list_dfs[1:len(rev_list_dfs)]
print(comparaison_dates)

for element in rev_list_dfs:     
    inception = element.loc[element['End'].isin(comparaison_dates)].reset_index(drop=True)
    if inception.empty :
        print('No available slots for the chosen interval')        
        break
    else :
        comparaison_dates = inception['Start']

miccheck = []
for element in names:
    miccheck.append(pd.DataFrame(columns=[f'{element}_Start', f'{element}_End'])) 

for i in range(0,nbr_of_tasks):
    miccheck[i][f'{names[i]}_Start'] = comparaison_dates + pd.to_timedelta(sum(durations[:i]), unit='h')
    miccheck[i][f'{names[i]}_End'] = comparaison_dates + pd.to_timedelta(sum(durations[:i+1]), unit='h')
    

excell = pd.concat(miccheck, axis=1).reset_index(drop=True)
print('\n       Start & end date of each sequence       \n')
print(excell)

How can I find the successions of tasks where there is downtime between consecutive tasks? By downtime I mean the number of hours we have to wait before the next task can start. I’m not specifically looking for a ready-made answer; a pointer in the right direction would already be helpful. Thank you in advance.

EDIT:

Desired output: [image not included]

Answer

The key idea is to use merge_asof. For two task DataFrames, the following finds, for each row in Task1, the row in Task2 whose Start_2 is closest to End_1 while still satisfying End_1 <= Start_2:

pd.merge_asof(
    Task1.rename(columns=lambda c: c + "_1"),
    Task2.rename(columns=lambda c: c + "_2"), 
    left_on="End_1", 
    right_on="Start_2", 
    direction="forward",
)
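To see what direction="forward" does, here is a minimal sketch on toy data. The frames task1 and task2 and their timestamps are made up for illustration and are not part of the question's data; both key columns are already sorted, which merge_asof requires.

```python
import pandas as pd

# Hypothetical toy data: two Task1 runs and three candidate Task2 slots.
task1 = pd.DataFrame({
    "Start_1": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 05:00"]),
    "End_1": pd.to_datetime(["2021-01-01 04:00", "2021-01-01 09:00"]),
})
task2 = pd.DataFrame({
    "Start_2": pd.to_datetime(["2021-01-01 03:00", "2021-01-01 06:00", "2021-01-01 09:00"]),
    "End_2": pd.to_datetime(["2021-01-01 05:00", "2021-01-01 08:00", "2021-01-01 11:00"]),
})

# For each End_1, pick the first Start_2 that is at or after it.
chained = pd.merge_asof(
    task1,
    task2,
    left_on="End_1",
    right_on="Start_2",
    direction="forward",
)

# Waiting time between the end of Task1 and the start of the matched Task2.
chained["StandBy_1_2"] = chained["Start_2"] - chained["End_1"]
print(chained)
```

The first Task1 run ends at 04:00 and is paired with the 06:00 slot (two hours of stand-by); the second ends at 09:00 and is paired with the 09:00 slot, since exact matches are allowed by default.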

The following function applies this idea to an arbitrary number of task DataFrames and also computes the stand-by time between consecutive tasks:

def merge_task_dfs(dfs):
    dfs = [df.rename(columns=lambda c: c + f"_{i+1}") for i, df in enumerate(dfs)]
    
    output = dfs[0]
    
    for i, df in enumerate(dfs[1:]):
        i += 1
        output = pd.merge_asof(
            output,
            df,
            left_on=f"End_{i}",
            right_on=f"Start_{i+1}",
            direction="forward",
        ).assign(**{f"StandBy_{i}_{i+1}": lambda d: d[f"Start_{i+1}"] - d[f"End_{i}"]})
        
    return output
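One caveat, as far as I can tell: when some End_i has no later Start_{i+1}, merge_asof fills that row with NaT, and a NaT in the left key makes the next merge_asof call raise a ValueError. This does not happen with the example data, but a hedged variant that discards incomplete chains could look like the sketch below. The names merge_task_dfs_dropna and frame, and the toy frames a, b, c, are hypothetical. Like the original function, it assumes each frame is sorted so that both Start and End are increasing.

```python
import pandas as pd


def merge_task_dfs_dropna(dfs):
    """Hypothetical variant of merge_task_dfs that keeps only complete chains."""
    dfs = [df.rename(columns=lambda c, n=i + 1: f"{c}_{n}") for i, df in enumerate(dfs)]
    output = dfs[0]
    for i, df in enumerate(dfs[1:], start=1):
        output = pd.merge_asof(
            # Drop rows whose previous task found no slot, so the left key has no NaT.
            output.dropna(subset=[f"End_{i}"]),
            df,
            left_on=f"End_{i}",
            right_on=f"Start_{i + 1}",
            direction="forward",
        )
        output[f"StandBy_{i}_{i + 1}"] = output[f"Start_{i + 1}"] - output[f"End_{i}"]
    # Rows left unmatched in the final step still hold NaT; drop them too.
    return output.dropna().reset_index(drop=True)


def frame(starts, ends):
    """Build a small task frame from 'HH:MM' strings on 2021-01-01 (toy data)."""
    day = "2021-01-01 "
    return pd.DataFrame({
        "Start": pd.to_datetime([day + s for s in starts]),
        "End": pd.to_datetime([day + e for e in ends]),
    })


a = frame(["00:00", "10:00"], ["02:00", "12:00"])
b = frame(["03:00"], ["05:00"])  # no slot available after 12:00
c = frame(["04:00", "06:00"], ["05:00", "07:00"])

result = merge_task_dfs_dropna([a, b, c])
print(result)
```

Only the chain starting at 00:00 survives here, because the second run of the first task ends after the last available slot of the second task.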

merge_task_dfs can be applied to the provided example data as follows:

Task1 = pd.DataFrame({"Start": pd.date_range("1-jan-2021", periods=10**3, freq="1H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=20))

Task2 = pd.DataFrame({"Start": pd.date_range("10-jan-2021", periods=500, freq="3H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=24))

Task3 = pd.DataFrame({"Start": pd.date_range("5-jan-2021", periods=700, freq="2H")}).assign(
    End=lambda d: d.Start + pd.Timedelta(hours=16))

df = merge_task_dfs([Task1, Task2, Task3])

The first 5 rows of the output are: [image not included]
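Once the StandBy_i_j columns exist, selecting the chains that have downtime between tasks is a matter of filtering on them. The sketch below uses a small hand-made frame standing in for the output of merge_task_dfs; its values, and the names hours, TotalStandByHours, with_downtime and back_to_back, are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the merged output: one row per chain.
df = pd.DataFrame({
    "Start_1": pd.to_datetime(["2021-01-10 00:00", "2021-01-10 01:00", "2021-01-10 02:00"]),
    "StandBy_1_2": pd.to_timedelta([0, 1, 2], unit="h"),
    "StandBy_2_3": pd.to_timedelta([0, 0, 1], unit="h"),
})

# Collect every stand-by column, however many tasks were chained.
standby_cols = [c for c in df.columns if c.startswith("StandBy_")]

# Work in plain hours so sums and comparisons are ordinary float operations.
hours = df[standby_cols].apply(lambda s: s.dt.total_seconds() / 3600)
df["TotalStandByHours"] = hours.sum(axis=1)

# Chains with a wait before every task, and chains with no wait at all.
with_downtime = df[(hours > 0).all(axis=1)]
back_to_back = df[(hours == 0).all(axis=1)]

print(with_downtime)
print(back_to_back)
```

Replacing .all(axis=1) with .any(axis=1) in the first filter would instead keep chains that wait before at least one task.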
