Pandas: remove a level from a MultiIndex in one line

Here is an example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100).reshape(20, 5))
df.head()

It gives:

          0         1         2         3         4
0  0.424436  0.831037  0.685421  0.170769  0.179134
1  0.984879  0.837583  0.289348  0.403940  0.760511
2  0.216087  0.876270  0.849723  0.020144  0.573268
3  0.558212  0.083600  0.345405  0.492531  0.744830
4  0.708427  0.084600  0.743003  0.459426  0.354911

I apply the following function:

df.groupby([0, 1, 2]).apply(lambda x: pd.DataFrame({
    "3sq": x[3].values**2,
    "4sq": x[4].values**2,
    "3*4": x[3].values*x[4].values,
})).reset_index().head()

It results in:

          0         1         2  level_3       3*4       3sq       4sq
0  0.009899  0.122257  0.159538        0  0.559871  0.501726  0.624755
1  0.105528  0.643789  0.219537        0  0.115762  0.199059  0.067321
2  0.116222  0.196047  0.557748        0  0.773526  0.846430  0.706902
3  0.196865  0.136991  0.457065        0  0.014315  0.064364  0.003184
4  0.216087  0.876270  0.849723        0  0.011548  0.000406  0.328636

How can I remove level_3 in the same chained expression as the apply call?

I tried setting the index before calling apply and then dropping a level of the resulting MultiIndex, but I could not find a way to do it in one line.

Answer

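Call reset_index with level=3 and drop=True to discard the unnamed inner index level (the one that would otherwise become the level_3 column), then call reset_index() again to turn the group keys back into columns: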

df = df.groupby([0, 1, 2]).apply(lambda x: pd.DataFrame({
    "3sq": x[3].values**2,
    "4sq": x[4].values**2,
    "3*4": x[3].values*x[4].values,
})).reset_index(level=3, drop=True).reset_index()
print(df)
          0         1         2       3*4       3sq       4sq
0  0.216087  0.876270  0.849723  0.011548  0.000406  0.328636
1  0.424436  0.831037  0.685421  0.030591  0.029162  0.032089
2  0.558212  0.083600  0.345405  0.366852  0.242587  0.554772
3  0.708427  0.084600  0.743003  0.163055  0.211072  0.125962
4  0.984879  0.837583  0.289348  0.307201  0.163168  0.578377
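
As a possible variation on the same idea, the inner level can also be dropped with DataFrame.droplevel(-1), which refers to the last index level by position rather than by the number 3. The sketch below is self-contained and only illustrative: the seeded generator and the helper name squares are assumptions added here, not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (assumption, not in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 5)))


def squares(group):
    # Hypothetical helper: same computation as the lambda in the question.
    return pd.DataFrame({
        "3sq": group[3].values ** 2,
        "4sq": group[4].values ** 2,
        "3*4": group[3].values * group[4].values,
    })


result = (
    df.groupby([0, 1, 2])
      .apply(squares)
      .droplevel(-1)   # drop the unnamed innermost index level created by apply
      .reset_index()   # turn the group keys 0, 1, 2 back into columns
)
print(result.head())
```

Because the group keys stay in the index until the final reset_index(), only the unnamed positional level is discarded, so no level_3 column appears in the output.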
