Merge or join two datasets using between two numbers in python

I have got two data sets;

Trying to merge df2 (content data) on df1 (viewed data). Have to use merge. However, the key is not standard but should be a between key.

df1 = pd.DataFrame({"ID": [1, 2],"start":[7200, 1000],"end":[7400, 1100],"duration":[200, 100]})

df2 = pd.DataFrame({"Prog_start":[7100,7300,980,1050],"Prog_end":[7300,7400,1050,1150],"Prog":["Prog_1","Prog_2","Prog_3","Prog_4"]})

desired_output=pd.DataFrame({"ID":[1,1,2,2],"start":[7200,7200,1000,1000],"end":[7400,7400,1100,1100],"duration":[200,200,100,100],"Prog_start":[7100,7300,980,1050],"Prog_end":[7300,7400,1050,1150],"Prog":["Prog_1","Prog_2","Prog_3","Prog_4"],"Dur_Prog":[100,100,20,50]})

enter image description here

I have tried using iloc and between functions which are not working.

Answer

You can take cartesian product of df1 and df2, then filter only the overlapping intervals, and calculate durations:

# cartesian product and interval filtering
z = (df1
         .assign(k=1).merge(df2.assign(k=1), on='k')
         .query('(Prog_start < end) & (Prog_end > start)')
         .drop(columns='k'))

# duration calculation
z['Duration_Prog'] = (np.clip(z['Prog_end'], z['start'], z['end']) -
                      np.clip(z['Prog_start'], z['start'], z['end']))

z

Output:

   ID  start   end  duration  Prog_start  Prog_end    Prog  Duration_Prog
0   1   7200  7400       200        7100      7300  Prog_1            100
1   1   7200  7400       200        7300      7400  Prog_2            100
6   2   1000  1100       100         980      1050  Prog_3             50
7   2   1000  1100       100        1050      1150  Prog_4             50

P.S. Is there a mistake in row 3 of your desired_output? It should be Duration_Prog = 50 if I understand the logic correctly (the overlap is 1000 - 1050)

P.P.S. And with the newer pandas (1.2.0+) you can use merge with how='cross' to do cartesian join, no need for the temporary k column