The question is published on by Tutorial Guruji team.
I have a table that looks like this:
ACCOUNT_ID | OPTION 1 A 2 A 2 B 2 B 2 C
I want to count the groups for each so I ran df.groupby(['ACCOUNT_ID'])['OPTION'].value_counts()
and the result looks like this:
ACCOUNT_ID | OPTION 1 A 1 2 A 1 2 B 2 2 C 1
This works well but I want every possible option to be shown (so A, B, C counts for each account_id) like:
ACCOUNT_ID | OPTION 1 A 1 1 B 0 1 C 0 2 A 1 2 B 2 2 C 1
I found this response, using .sort_index().reindex(uniques, fill_value = 0)
, which looks great, but doesn’t work since I am using a MultiIndex.
Any tips would be amazing!!
Answer
One solution is to unstack
the inner level of the MultiIndex into columns. This gives you a DataFrame whose columns have float64
dtype, with NaN
values for missing combinations of ACCOUNT_ID
and OPTION
. Fill NaNs with 0, convert back to integer dtype with astype
, and stack
the columns back into the index to recreate the MultiIndex:
df.unstack().fillna(0).astype(int).stack() ACCOUNT_ID OPTION 1 A 1 B 0 C 0 2 A 1 B 2 C 1 dtype: int64