Add two DataFrames with related elements but different shapes

main_df

ID A B C D E F G
01 1 0 0 0 1 0 0
02 0 0 0 0 0 1 1
03 1 0 1 0 0 0 0
04 1 0 0 1 0 0 0

sub_df

ID B C D E
01 1 0 0 1
02 0 1 1 0
04 1 0 1 0

I want to add sub_df onto main_df and replace every value greater than 1 with 1 (all elements in main_df should contain only 0s and 1s).

The final result should look like this:

ID A B C D E F G
01 1 1 0 0 1 0 0
02 0 0 1 1 0 1 1
03 1 0 1 0 0 0 0
04 1 1 0 1 0 0 0

I've tried append() and merge(), but the result only appends one DataFrame to the other. Otherwise I would have to write a separate Python function that loops through the DataFrame to do the calculation. Is there a better way to do this?

Answer

Use pandas.DataFrame.add with fill_value=0.

Set ID as the index of both frames if it is not already (here df1 is main_df and df2 is sub_df):

df1 = df1.set_index("ID")
df2 = df2.set_index("ID")
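Why the index step matters: DataFrame arithmetic matches rows by index label, not by the ID column. The minimal sketch below rebuilds the question's frames (assuming ID is read as an integer, as in the output further down) and compares adding them with and without set_index.

```python
import pandas as pd

# The question's frames, with ID kept as an ordinary column (integer IDs assumed)
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "A": [1, 0, 1, 1], "B": [0, 0, 0, 0], "C": [0, 0, 1, 0], "D": [0, 0, 0, 1],
    "E": [1, 0, 0, 0], "F": [0, 1, 0, 0], "G": [0, 1, 0, 0],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 4],
    "B": [1, 0, 1], "C": [0, 1, 0], "D": [0, 1, 1], "E": [1, 0, 0],
})

# Without set_index both frames use the default 0..n-1 RangeIndex, so rows are
# paired by position: sub_df's ID 4 row is added to main_df's ID 3 row.
wrong = df1.add(df2, fill_value=0)
print(wrong.loc[2, ["ID", "B", "D"]])  # ID is 3 + 4 = 7 (as a float)

# With ID as the index, rows are paired by label instead.
right = df1.set_index("ID").add(df2.set_index("ID"), fill_value=0)
print(right.loc[3, ["B", "D"]])  # ID 3 has no match in sub_df, so it keeps main_df's values
```

Without the index step even the ID column gets summed, which is a quick way to spot that rows were paired by position.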

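When aligning the two frames on their index and columns, pandas fills any cell that exists in only one of them with fill_value before adding (a cell missing from both would stay NaN).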

new_df = df1.add(df2, fill_value=0)
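To see what fill_value changes, here is a small sketch on the question's data with ID already set as the index (again assuming integer IDs). It contrasts the plain + operator, which leaves NaN wherever a cell is missing from either frame, with add(..., fill_value=0).

```python
import pandas as pd

# The question's frames, with ID already set as the index (integer IDs assumed)
df1 = pd.DataFrame(
    {"A": [1, 0, 1, 1], "B": [0, 0, 0, 0], "C": [0, 0, 1, 0], "D": [0, 0, 0, 1],
     "E": [1, 0, 0, 0], "F": [0, 1, 0, 0], "G": [0, 1, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)
df2 = pd.DataFrame(
    {"B": [1, 0, 1], "C": [0, 1, 0], "D": [0, 1, 1], "E": [1, 0, 0]},
    index=pd.Index([1, 2, 4], name="ID"),
)

# Plain "+" also aligns, but a cell missing from either frame becomes NaN
# (all of columns A, F, G and all of row 3 here)
plain = df1 + df2

# fill_value=0 treats a cell missing from only one frame as 0 before adding
new_df = df1.add(df2, fill_value=0)
print(new_df)  # overlapping 1s sum to 2, e.g. ID 1 / column E
```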

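Then use astype to turn every value into either 0 or 1: casting to bool maps any non-zero value to True, and casting back to int maps True to 1.

Note that this is a bit hacky if the data could contain decimals, since any non-zero value (0.3, say) also becomes 1.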

print(new_df.astype(bool).astype(int))
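A quick illustration of both the mapping and the decimal caveat, using hypothetical Series values rather than the question's frames:

```python
import pandas as pd

# The sums from add() on the question's data only contain 0, 1 and 2,
# so the bool round trip maps them exactly to 0/1
sums = pd.Series([0.0, 1.0, 2.0])
flags = sums.astype(bool).astype(int)
print(flags.tolist())  # [0, 1, 1]

# Hypothetical decimal and negative values: any non-zero value also becomes 1
odd = pd.Series([0.3, -1.0])
odd_flags = odd.astype(bool).astype(int)
print(odd_flags.tolist())  # [1, 1]
```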

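Or use a plain comparison that replaces only the values greater than 1, without converting to int (the result keeps the float dtype produced by add, so values print as 1.0 and 0.0):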

new_df.mask(new_df.gt(1), 1)
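The sketch below runs the mask version end to end on the question's data (integer IDs assumed) and checks it against the expected table from the question. It also shows DataFrame.clip(upper=1), a swapped-in alternative not used in the answer above, which caps values at 1 in a single call and gives the same result here.

```python
import pandas as pd

# The question's frames, with ID already set as the index (integer IDs assumed)
df1 = pd.DataFrame(
    {"A": [1, 0, 1, 1], "B": [0, 0, 0, 0], "C": [0, 0, 1, 0], "D": [0, 0, 0, 1],
     "E": [1, 0, 0, 0], "F": [0, 1, 0, 0], "G": [0, 1, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)
df2 = pd.DataFrame(
    {"B": [1, 0, 1], "C": [0, 1, 0], "D": [0, 1, 1], "E": [1, 0, 0]},
    index=pd.Index([1, 2, 4], name="ID"),
)
new_df = df1.add(df2, fill_value=0)

# mask: wherever new_df > 1, put 1; every other cell is kept as-is
masked = new_df.mask(new_df.gt(1), 1)

# Alternative (not from the answer): cap every value at 1 in one call
clipped = new_df.clip(upper=1)

# Expected result from the question
expected = pd.DataFrame(
    {"A": [1, 0, 1, 1], "B": [1, 0, 0, 1], "C": [0, 1, 1, 0], "D": [0, 1, 0, 1],
     "E": [1, 0, 0, 0], "F": [0, 1, 0, 0], "G": [0, 1, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# eq() aligns on labels and treats 1.0 == 1 as equal
print(masked.eq(expected).all().all())   # True
print(clipped.eq(masked).all().all())    # True
```

mask only touches values above 1, so unlike the bool cast it would leave a decimal such as 0.3 unchanged.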

Output (from the astype version):

    A  B  C  D  E  F  G
ID                     
1   1  1  0  0  1  0  0
2   0  0  1  1  0  1  1
3   1  0  1  0  0  0  0
4   1  1  0  1  0  0  0