How to split comma-separated strings in a column into different columns when they’re not all the same length, using Python or pandas in a Jupyter notebook

I am learning Python and working on a sample Kaggle dataset. I’m trying to split the comma-separated values in a column into different columns using Python or pandas in a Jupyter notebook.

For instance:

column_A

Garbage: Tissues, Organics: Milk, Recycle: Cardboards

Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces

Garbage: cups, Recycle: Plastic bottles

I want to split these at the commas into different columns, like this:

Garbage       Organics  Recycle          Junk
Tissues       Milk      Cardboards       Null
Paper Towels  Eggs      Glass            Feces
Cups          Null      Plastic bottles  Null

I’ve tried using lambda functions, but it only works when every row has the same number of comma-separated strings. For rows of unequal length it raises an IndexError: “list index out of range”. The code I’ve used is below:

list_of_dicts = [{x1.split(':')[0].strip():x1.split(':')[1].strip() for x1 in x.split(',')} for x in Df1['column_name']]
Df2=pd.DataFrame.from_dict(list_of_dicts)

Any help is greatly appreciated. Thanks

Answer
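About the IndexError first: the comprehension in the question does not actually care how many pairs a row has. It fails only when some comma-separated fragment contains no colon, because x1.split(':')[1] then has nothing at index 1. My guess (an assumption, since the real data isn’t shown) is a trailing comma or an empty fragment somewhere in the column. Here is a minimal sketch that keeps the question’s approach but uses str.partition, which never raises, and skips fragments without a colon; the sample rows and the helper name to_dict are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the last row ends with a trailing comma,
# which is the kind of fragment that makes x1.split(':')[1] fail.
Df1 = pd.DataFrame({'column_name': [
    'Garbage: Tissues, Organics: Milk, Recycle: Cardboards',
    'Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces',
    'Garbage: cups, Recycle: Plastic bottles,',
]})


def to_dict(text):
    """Turn 'key: value, key: value' into a dict, skipping colon-less fragments."""
    record = {}
    for fragment in text.split(','):
        # partition always returns a 3-tuple; sep is '' when there is no colon
        key, sep, value = fragment.partition(':')
        if sep:
            record[key.strip()] = value.strip()
    return record


list_of_dicts = [to_dict(x) for x in Df1['column_name']]
# Keys missing from a row become NaN in that row
Df2 = pd.DataFrame(list_of_dicts)
print(Df2)
```

Rows with fewer pairs simply get NaN in the columns they lack, so unequal lengths need no special handling.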

We can use a regular expression to find all matching key-value pairs in each row of column_A, map each row’s list of pairs to a dictionary to create records, and then construct a DataFrame from these records:

pd.DataFrame(map(dict, df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')))

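The pattern skips leading whitespace, then captures the key (the text before the colon) and the value (the text up to the next comma). The result: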

        Garbage Organics          Recycle   Junk
0       Tissues     Milk       Cardboards    NaN
1  Paper Towels     Eggs            Glass  Feces
2          cups      NaN  Plastic bottles    NaN
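For reference, here is a self-contained version of the one-liner that you can paste into a notebook cell. The sample rows are typed in from the question, so treat them as an assumption about the real Kaggle column:

```python
import pandas as pd

# Sample rows taken from the question (the real dataset may differ)
df = pd.DataFrame({'column_A': [
    'Garbage: Tissues, Organics: Milk, Recycle: Cardboards',
    'Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces',
    'Garbage: cups, Recycle: Plastic bottles',
]})

# Each row becomes a list of (key, value) tuples;
# \s* consumes the space that follows a comma or a colon
pairs = df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')

# dict() turns each list of tuples into one record; absent keys become NaN.
# Passing index=df.index keeps the result aligned with the original rows.
result = pd.DataFrame(map(dict, pairs), index=df.index)
print(result)
```

If column_A can contain missing values, call .fillna('') before .str.findall, because findall returns NaN for those rows and dict() cannot consume a NaN.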

Here is an alternative approach in case you don’t want to use regular expressions:

(df['column_A'].str.split(', ').explode()
               .str.split(': ', expand=True)
               .set_index(0, append=True)[1].unstack())
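The same chain, written out step by step with the sample rows from the question (again an assumption about the real data). The outer parentheses let the chain span several lines, and rename_axis(None, axis=1) only removes the leftover “0” label that set_index leaves on the column axis:

```python
import pandas as pd

df = pd.DataFrame({'column_A': [
    'Garbage: Tissues, Organics: Milk, Recycle: Cardboards',
    'Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces',
    'Garbage: cups, Recycle: Plastic bottles',
]})

result = (
    df['column_A']
    .str.split(', ')                # list of 'key: value' strings per row
    .explode()                      # one row per pair; the original index repeats
    .str.split(': ', expand=True)   # column 0 = key, column 1 = value
    .set_index(0, append=True)[1]   # Series indexed by (original row, key)
    .unstack()                      # keys become columns; absent keys become NaN
    .rename_axis(None, axis=1)      # drop the "0" name left on the column axis
)
print(result)
```

Unlike the regex version, unstack() may return the columns in alphabetical order (Garbage, Junk, Organics, Recycle) rather than first-seen order; select them explicitly, e.g. result[['Garbage', 'Organics', 'Recycle', 'Junk']], if the order matters. This version also assumes the separators are exactly ', ' and ': ', and Series.explode requires pandas 0.25 or later.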