How to apply filter condition during Pandas Merge operation but retain them in the result DF?

I’m trying to do something with pandas equivalent to the following SQL command:

select *
from tableA as A
left join tableB as B
on A.key = B.key and substr(A.value,1,2) not in ('something', 'something else')

As you can see in the sql query above, A.value have a filter condition applied during the join, but since it’s a left join, the on condition not in criteria does not persist to the result dataset, it only applies during the join operation, such that the join operation will not consider any rows fulfilling the A.value not in (‘something’, ‘something else’) criteria.

I tried the following python code, but it results in filtering out the A.value rows from the result dataset, I need all the rows via the left join, but i don’t want the merge operation on the rows as specified by the a.value not in condition, just like the SQL statement logic above.

df_output = df_input1[~df_input1['value'].str[:2].isin(['something', 'something_else'])].merge(df_input2, left_on = 'key', right_on = 'key', how='left')

Thank you all in advance!

Answer

A bit hacky, but the idea is to create a new column for joining in df1 and to only fill in this column for rows that you want to merge with:

import pandas as pd

df_input1 = pd.DataFrame({
    'fruit':['apple','banana','mango'],
    'quantity':[3,4,5],
})

df_input2 = pd.DataFrame({
    'fruit':['apple','banana','mango'],
    'in_stock':[True,False,True],
})

minimum_quantity = 4
ind = df_input1['quantity'].ge(minimum_quantity)
df_input1.loc[ind, 'merge_column'] = df_input1.loc[ind, 'fruit']

merge_filtered_fruits = df_input1.merge(
    df_input2,
    left_on = 'merge_column',
    right_on = 'fruit',
    how = 'left'
).drop(columns=['merge_column'])

print(merge_filtered_fruits)

Output:

  fruit_x  quantity fruit_y in_stock
0   apple         3     NaN      NaN
1  banana         4  banana    False
2   mango         5   mango     True

I’m curious what a better method could be