grouping columns in python

hello i am working with a dataset in .csv with python and am having errors grouping the columns. the code i’m working with is:

import pandas as pd 
df=pd.read_excel('filepath')
df['Items'].str.split(',', expand=True)
df=df.groupby(['Items0', 'Items1','Items2', 'Items3', 'Items4', 'Items5' ]).size()
print(df)

when i run the print(df) i get values such as Items0-1, Items1-1, Items2-1 and so on

this is the sample data i am working with and how i am trying to organize it is below.

can someone direct me to how to solve this?

sample data:

Name Date Items
johnny smith 09/1/2021 bread, oranges, peanut butter, apples, celery, peanuts
granny smith 08/31/2021 oranges, peanut butter, apples, bread
jane doe 09/01/2021 oranges, apples, celery, peanut butter
jack frost 08/01/2021 bread, oranges, apples
cinderella 08/16/2021 apples, peanuts, bread

what i am attempting to achieve:

Name Date Items0 Items1 Items2 Items3 Items4 Items5
johnny smith 09/1/2021 bread oranges peanut butter apples celery peanuts
granny smith 08/31/2021 bread oranges peanut butter apples
jane doe 09/01/2021 oranges peanut butter apples
jack frost 08/01/2021 bread oranges apples
cinderella 08/16/2021 bread apples peanuts

Answer

One approach would be to derive categorical values from the string in Items then reformat the DataFrame and join back together:

x = df.pop('Items').str.get_dummies(', ')
df = df.join(
    x.mul(x.columns).set_axis(
        range(len(x.columns)), axis=1
    ).add_prefix('Item')
)
           Name        Date   Item0  Item1   Item2    Item3          Item4    Item5
0  johnny smith   09/1/2021  apples  bread  celery  oranges  peanut butter  peanuts
1  granny smith  08/31/2021  apples  bread          oranges  peanut butter         
2      jane doe  09/01/2021  apples         celery  oranges  peanut butter         
3    jack frost  08/01/2021  apples  bread          oranges                        
4    cinderella  08/16/2021  apples  bread                                  peanuts

Explanations: use str.get_dummies to convert the Item column to categorical values:

x = df.pop('Items').str.get_dummies(', ')
   apples  bread  celery  oranges  peanut butter  peanuts
0       1      1       1        1              1        1
1       1      1       0        1              1        0
2       1      0       1        1              1        0
3       1      1       0        1              0        0
4       1      1       0        0              0        1

This can be converted into the desired format by mul by the column names (replacing the 1 values with the column name), set_axis to enumerate columns (0-number of items) and add_prefix of ‘Item’ to the newly numbered columns:

x.mul(x.columns).set_axis(
    range(len(x.columns)), axis=1
).add_prefix('Item')

x:

    Item0  Item1   Item2    Item3          Item4    Item5
0  apples  bread  celery  oranges  peanut butter  peanuts
1  apples  bread          oranges  peanut butter         
2  apples         celery  oranges  peanut butter         
3  apples  bread          oranges                        
4  apples  bread                                  peanuts

Then join back to the remaining columns in df .