Count how many times each user received a question in Python with pandas

I have a database that I work with in Python. Each user has a unique ID number, and a user can receive the same question multiple times. I would like to calculate, for each user and question, how many times the user received that question and the sum of their correct answers to it.

Here is an example from my database:

Question                                                user.id  correct answer
What weird food combinations do you really enjoy?       1        0
What social stigma does society need to get over?       2        1
What’s something you really resent paying for?          1        1
What would a world populated by clones of you be like?  1        0
Do you think that aliens exist?                         1        1
What weird food combinations do you really enjoy?       1        1
Do you think that aliens exist?                         1        0
What social stigma does society need to get over?       1        0
What’s something you really resent paying for?          2        1
What would a world populated by clones of you be like?  3        1
What weird food combinations do you really enjoy?       2        1
What would a world populated by clones of you be like?  2        0
What weird food combinations do you really enjoy?       2        0
Do you think that aliens exist?                         3        1
What’s something you really resent paying for?          3        1
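To experiment with the sample above, the rows can be loaded into a DataFrame directly. This is only a sketch: it assumes the columns are named exactly Question, user.id and correct answer as in the header, and that correct answer is 1 for a correct answer and 0 otherwise. In practice the rows would come from the database, for example via pandas.read_sql.

```python
import pandas as pd

# Sample rows from the question: (question, user id, 1 if answered correctly else 0)
rows = [
    ("What weird food combinations do you really enjoy?", 1, 0),
    ("What social stigma does society need to get over?", 2, 1),
    ("What’s something you really resent paying for?", 1, 1),
    ("What would a world populated by clones of you be like?", 1, 0),
    ("Do you think that aliens exist?", 1, 1),
    ("What weird food combinations do you really enjoy?", 1, 1),
    ("Do you think that aliens exist?", 1, 0),
    ("What social stigma does society need to get over?", 1, 0),
    ("What’s something you really resent paying for?", 2, 1),
    ("What would a world populated by clones of you be like?", 3, 1),
    ("What weird food combinations do you really enjoy?", 2, 1),
    ("What would a world populated by clones of you be like?", 2, 0),
    ("What weird food combinations do you really enjoy?", 2, 0),
    ("Do you think that aliens exist?", 3, 1),
    ("What’s something you really resent paying for?", 3, 1),
]

# Column names assumed to match the header of the sample table
df = pd.DataFrame(rows, columns=["Question", "user.id", "correct answer"])
print(df.head())
```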

Here is the result I want:

Question                                                user.id  correct answer  redundancy
What weird food combinations do you really enjoy?       1        1               2
What social stigma does society need to get over?       1        1               2
What’s something you really resent paying for?          1        1
What would a world populated by clones of you be like?  1        0
Do you think that aliens exist?                         1        1
What weird food combinations do you really enjoy?       2        1               2
What social stigma does society need to get over?       2        0               0
What’s something you really resent paying for?          2        1
What would a world populated by clones of you be like?  2        0
Do you think that aliens exist?                         2        1
What weird food combinations do you really enjoy?       3        0               0
What social stigma does society need to get over?       3        0               0
What’s something you really resent paying for?          3        1
What would a world populated by clones of you be like?  3        0
Do you think that aliens exist?                         3        1

Answer

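If I understand correctly, this is what you want: group by question and user, then aggregate "correct answer" with both sum (number of correct answers) and count (number of times the question was received):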

>>> df.groupby(["Question","user.id"]).agg(["sum","count"]).reset_index()
                                             Question user.id correct answer      
                                                                         sum count
0                     Do you think that aliens exist?       1              1     2
1                     Do you think that aliens exist?       3              1     1
2   What social stigma does society need to get over?       1              0     1
3   What social stigma does society need to get over?       2              1     1
4   What weird food combinations do you really enjoy?       1              1     2
5   What weird food combinations do you really enjoy?       2              1     2
6   What would a world populated by clones of you ...       1              0     1
7   What would a world populated by clones of you ...       2              0     1
8   What would a world populated by clones of you ...       3              1     1
9      What’s something you really resent paying for?       1              1     1
10     What’s something you really resent paying for?       2              1     1
11     What’s something you really resent paying for?       3              1     1
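The agg(["sum","count"]) call above produces two-level column labels ("correct answer" / "sum" and "count"). If flat columns closer to the desired output are preferred, pandas named aggregation (available since pandas 0.25) can name them in one step. The sketch below uses a small made-up subset of rows in the question's format; the output names correct_answer and redundancy are chosen here for illustration. Because the desired table lists every question for every user, it also shows an optional reindex that adds the missing question/user pairs with zeros; treating an unseen pair as "received 0 times, 0 correct" is an assumption.

```python
import pandas as pd

# A few rows in the question's format (illustrative subset)
df = pd.DataFrame(
    {
        "Question": [
            "Do you think that aliens exist?",
            "Do you think that aliens exist?",
            "Do you think that aliens exist?",
            "What weird food combinations do you really enjoy?",
            "What weird food combinations do you really enjoy?",
        ],
        "user.id": [1, 1, 3, 2, 2],
        "correct answer": [1, 0, 1, 1, 0],
    }
)

# Named aggregation: each keyword becomes one flat output column
summary = (
    df.groupby(["Question", "user.id"])
    .agg(
        correct_answer=("correct answer", "sum"),  # number of correct answers
        redundancy=("correct answer", "count"),  # times the question was received
    )
    .reset_index()
)
print(summary)

# Optional: one row for every (question, user) combination,
# filling pairs absent from the data with 0 (assumed meaning)
full_index = pd.MultiIndex.from_product(
    [df["Question"].unique(), df["user.id"].unique()],
    names=["Question", "user.id"],
)
full = (
    summary.set_index(["Question", "user.id"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(full)
```

Named aggregation avoids having to flatten a MultiIndex of columns afterwards; the reindex step is only needed if pairs that never occur in the data should still appear in the result.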
