Merge specific rows in a pandas df

I’m currently merging all values in a pandas df row that come before any 4-letter string. But I’m hoping to apply this to specific rows instead of all rows. Specifically, I only want to apply it to rows directly underneath an X in Col A. So if a row has X in Col A, apply the function to the row underneath it.

import pandas as pd

d = ({
    'A' : ['X','Foo','No','X','Foo','X','F'],
    'B' : ['','Bar','Merge','','Barr','','oo'],
    'C' : ['','XXXX','XXXX','','','','B'],
    'D' : ['','','','','','','ar'],
    'E' : ['','','','','','','XXXX'],
    })

df = pd.DataFrame(data=d)

This code merges all values that come before any 4-letter string into Col A:

mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
df.A = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')

Output:

         A     B     C D     E
0        X                    
1   FooBar        XXXX        
2  NoMerge        XXXX        
3        X                    
4      Foo  Barr              
5        X                    
6   FooBar                XXXX

As you can see, this applies the merge to every row. I’m trying to apply it only to the rows directly beneath an X in Col A. I think I need something like:

if val in Col.A == 'X':
    ## Do this to the row directly beneath
    mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
    df.A = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)
    df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')

Intended Output:

        A      B     C D     E
0       X                     
1  FooBar         XXXX        
2      No  Merge  XXXX        
3       X                     
4     Foo   Barr              
5       X                     
6  FooBar                 XXXX

Answer

We also need a mask for the row-under-X condition. I built a Series, maskX, for that and then used it to update the mask you already had. The net result is the desired output.

import pandas as pd

d = ({
    'A' : ['X','Foo','No','X','Foo','X','F'],
    'B' : ['','Bar','Merge','','Barr','','oo'],
    'C' : ['','XXXX','XXXX','','','','B'],
    'D' : ['','','','','','','ar'],
    'E' : ['','','','','','','XXXX'],
    })

df = pd.DataFrame(data=d)
print(df)

#Create the mask (as series) to handle the row-under-X condition
maskX = df.iloc[:,0].apply(lambda x: x=='X')

#Shift the mask down by one row so that the row beneath each X is marked True:
#bump every index label by 1, prepend False for row 0, then drop the overflow label
maskX.index += 1

maskX = pd.concat([pd.Series([False]), maskX])
maskX = maskX.drop(len(maskX)-1)

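#True for every cell to the left of the first 4-letter string in its row
mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
#combine the effect of two masks
for i,v in maskX.items():
    mask.iloc[i,:] = mask.iloc[i,:].apply(lambda x: x and v)

#Assign through .loc; the chained form df.A[maskX] = ... may write to a temporary copy
df.loc[maskX, 'A'] = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')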
print(df)
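
For comparison, here is a more compact sketch of the same idea. It is not the code from the answer above: it swaps the index shifting and the Python loop for `Series.shift` and a boolean row assignment. The function name `merge_rows_under_marker` and its `marker` and `width` parameters are hypothetical names chosen for illustration, and it assumes every cell holds a string (no NaN values), as in the sample data.

```python
import pandas as pd

def merge_rows_under_marker(df, marker="X", width=4):
    """Hypothetical helper: merge leading cells into the first column,
    but only for rows directly beneath a row whose first column == marker."""
    out = df.copy()
    key = out.columns[0]    # column that receives the merged text ('A')
    rest = out.columns[1:]  # columns that may be merged into it

    # True for each cell left of the first `width`-character string in its row
    is_stop = out[rest].apply(lambda col: col.str.len() == width)
    before_stop = is_stop.cumsum(axis=1) == 0

    # True for rows whose previous row holds the marker in the key column
    under_marker = out[key].shift(fill_value="").eq(marker)

    # Switch the merge off for every other row
    before_stop.loc[~under_marker] = False

    # Join the selected cells row by row and append them to the key column
    merged = out[rest].where(before_stop, "").apply("".join, axis=1)
    out[key] = out[key] + merged

    # Blank out the cells that were merged
    out[rest] = out[rest].where(~before_stop, "")
    return out

df = pd.DataFrame({
    'A': ['X', 'Foo', 'No', 'X', 'Foo', 'X', 'F'],
    'B': ['', 'Bar', 'Merge', '', 'Barr', '', 'oo'],
    'C': ['', 'XXXX', 'XXXX', '', '', '', 'B'],
    'D': ['', '', '', '', '', '', 'ar'],
    'E': ['', '', '', '', '', '', 'XXXX'],
})

result = merge_rows_under_marker(df)
print(result)
```

Working on a copy leaves the original df untouched, and shifting Col A by one row avoids manual index arithmetic, so the row-under-X test stays a single expression.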
