How to compress a pandas DataFrame based on its boolean column values?

Given a pandas frame like this:

In:

pokemon   yes     no     ignore
Vulpix    True    False  False
Nidorino  False   True   False
Growlithe False   False  True
Krokorok  False   True   False
Darumaka  False   False  True
Klefki    False   True   False
Croagunk  True    False  False

What is the correct way to get, for each pokemon, the name of the column whose value is True?

Out:

pokemon    Val
Vulpix     yes
Nidorino   no
Growlithe  ignore
Krokorok   no
Darumaka   ignore
Klefki     no
Croagunk   yes

So far I have tried `pd.crosstab`:

pd.crosstab(df, columns=['yes', 'no', 'ignore'])

However, I am getting a value error:

`ValueError: Shape of passed values`

What is the correct way of getting the output shown above?

Answer

If the boolean columns are one-hot encoded, so that each row has exactly one True, call set_index on pokemon and take the dot product of the frame with its column labels.
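
As a rough illustration of why a dot product with strings works: Python treats True as 1 and False as 0, so multiplying a label by a flag either keeps it or yields an empty string, and the "sum" over a row concatenates the pieces. The sketch below does this for one row in plain Python, without pandas; the names labels and row are illustrative only.

```python
labels = ["yes", "no", "ignore"]
row = [False, True, False]  # Nidorino's flags from the question

# bool * str repeats the string zero times or once
parts = [flag * label for flag, label in zip(row, labels)]

# concatenating the parts leaves only the label whose flag is True
value = "".join(parts)
print(parts)
print(value)
```

Applied to every row of the frame at once: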

df = df.set_index('pokemon')
df.dot(df.columns)  # returns a Series; df itself stays a DataFrame

pokemon
Vulpix          yes
Nidorino         no
Growlithe    ignore
Krokorok         no
Darumaka     ignore
Klefki           no
Croagunk        yes
dtype: object

The above is a Series. To get a DataFrame that matches your output:

df = df.dot(df.columns).to_frame('Val').reset_index()
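
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample data in the question, and the names flags and out are my own. Since the approach assumes exactly one True per row, the sketch checks that first; with several True values in a row, the labels would simply be concatenated.

```python
import pandas as pd

# Frame reconstructed from the sample data in the question.
df = pd.DataFrame({
    "pokemon": ["Vulpix", "Nidorino", "Growlithe", "Krokorok",
                "Darumaka", "Klefki", "Croagunk"],
    "yes":    [True, False, False, False, False, False, True],
    "no":     [False, True, False, True, False, True, False],
    "ignore": [False, False, True, False, True, False, False],
})

flags = df.set_index("pokemon")

# The dot approach assumes exactly one True per row; verify before relying on it.
assert flags.sum(axis=1).eq(1).all()

# Dot the boolean frame with its column labels, then turn the Series into a frame.
out = flags.dot(flags.columns).to_frame("Val").reset_index()
print(out)
```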
