multiple headers, unable to clean up

| file | attr1  | attr2 |
|:---- |:------:| -----:|
| ---  | addr1  | gen1  |
| ---  | addr2  | gen2  |
| 1    | 1      | 1     |
| 2    | 3      | 5     |

I have a table similar to this one, but it has three header rows. The first header row has file, attr1, attr2; the second has addr1, gen1; the third has addr2, gen2.

I want the final table to have only one header row: file, addr2, gen2. My code doesn't work; can anyone help?

df[df.ne(df.columns).any(1)]

Answer

IIUC, try overwriting df.columns with the level 2 label if it is non-empty, otherwise fall back to the level 0 label:

df.columns = [c[2] or c[0] for c in df.columns]

Before

df = pd.DataFrame(
    data=[[1, 1, 1], [2, 3, 5]],
    columns=pd.MultiIndex.from_tuples([('file', '', ''),
                                       ('attr1', 'addr1', 'addr2'),
                                       ('attr2', 'gen1', 'gen2')])
)

  file attr1 attr2
       addr1  gen1
       addr2  gen2
0    1     1     1
1    2     3     5

After

   file  addr2  gen2
0     1      1     1
1     2      3     5
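
For headers with a different number of levels, the same idea can be written without hard-coding level 2. The following is only a sketch: `flatten_header` is a hypothetical helper name, not a pandas function, and it assumes that missing header cells are empty strings, as in the Before example above. If the blanks are stored as something else (NaN or placeholder labels, for instance), the filter would need adjusting.

```python
import pandas as pd

# Rebuild the three-level header from the Before example.
df = pd.DataFrame(
    data=[[1, 1, 1], [2, 3, 5]],
    columns=pd.MultiIndex.from_tuples([
        ("file", "", ""),
        ("attr1", "addr1", "addr2"),
        ("attr2", "gen1", "gen2"),
    ]),
)


def flatten_header(columns):
    """Hypothetical helper: keep the deepest non-empty label of each column."""
    flat = []
    for labels in columns:  # each item is a tuple such as ('attr1', 'addr1', 'addr2')
        non_empty = [label for label in labels if label != ""]
        # Assumption: blank header cells are empty strings.
        flat.append(non_empty[-1] if non_empty else "")
    return flat


df.columns = flatten_header(df.columns)
print(df.columns.tolist())  # ['file', 'addr2', 'gen2']
```

For the three-level case here this gives the same result as `[c[2] or c[0] for c in df.columns]`, but it also picks a level 1 label when level 2 is blank.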