# Merge or join two datasets using between two numbers in python

I have got two data sets;

Trying to merge df2 (content data) on df1 (viewed data). Have to use merge. However, the key is not standard but should be a between key.

```df1 = pd.DataFrame({"ID": [1, 2],"start":[7200, 1000],"end":[7400, 1100],"duration":[200, 100]})

df2 = pd.DataFrame({"Prog_start":[7100,7300,980,1050],"Prog_end":[7300,7400,1050,1150],"Prog":["Prog_1","Prog_2","Prog_3","Prog_4"]})

desired_output=pd.DataFrame({"ID":[1,1,2,2],"start":[7200,7200,1000,1000],"end":[7400,7400,1100,1100],"duration":[200,200,100,100],"Prog_start":[7100,7300,980,1050],"Prog_end":[7300,7400,1050,1150],"Prog":["Prog_1","Prog_2","Prog_3","Prog_4"],"Dur_Prog":[100,100,20,50]})
```

I have tried using iloc and between functions which are not working.

You can take cartesian product of `df1` and `df2`, then filter only the overlapping intervals, and calculate durations:

```# cartesian product and interval filtering
z = (df1
.assign(k=1).merge(df2.assign(k=1), on='k')
.query('(Prog_start < end) & (Prog_end > start)')
.drop(columns='k'))

# duration calculation
z['Duration_Prog'] = (np.clip(z['Prog_end'], z['start'], z['end']) -
np.clip(z['Prog_start'], z['start'], z['end']))

z
```

Output:

```   ID  start   end  duration  Prog_start  Prog_end    Prog  Duration_Prog
0   1   7200  7400       200        7100      7300  Prog_1            100
1   1   7200  7400       200        7300      7400  Prog_2            100
6   2   1000  1100       100         980      1050  Prog_3             50
7   2   1000  1100       100        1050      1150  Prog_4             50
```

P.S. Is there a mistake in row 3 of your `desired_output`? It should be `Duration_Prog = 50` if I understand the logic correctly (the overlap is `1000 - 1050`)

P.P.S. And with the newer `pandas` (1.2.0+) you can use `merge` with `how='cross'` to do cartesian join, no need for the temporary `k` column