How do you split a row into multiple rows based on a delimiter and have them tie back to another column as key:value pairs?

Stack community!

I have a CSV file where one column contains messy data coming from another data source. Example:

Packet number Actions
216 [{“ActionNeeded”:”Update things”,”ResponsibleName”:”John Smith”},{“ActionNeeded”:”Design stuff”,”ResponsibleName”:”Jane Smith”}]
217 [{“ActionNeeded”:”Update stuff”,”ResponsibleName”:”Fred Freddington”},{“ActionNeeded”:”Design stuff”,”ResponsibleName”:”Lisa Leslie”}]

Is there a way to split the Actions column into rows, using } as the delimiter, and separate each piece into an action column and a name column, while also tying each split row back to its Packet number as key:value pairs?

This is what I’d like my CSV file to look like:

Packet Number,Actions,Responsible Name
216,Update things,John Smith
216,Design stuff,Jane Smith
217,Update stuff,Fred Freddington
217,Design stuff,Lisa Leslie

I’ve tried using str.split and str.findall to isolate the actions and responsible names, but I’m finding it hard to tie the split rows back to the Packet number column.

Thanks much!

Answer

Your sample data appears to contain some formatting errors (for example, curly quotes “ ” where JSON needs straight double quotes), but assuming these are just formatting errors, the following should work.
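If the Actions column is read straight from the CSV, each cell arrives as a plain string rather than a list of dicts, so it has to be parsed first. Below is a minimal sketch, assuming the only damage is the curly quotes and that the file wraps the Actions field in straight double quotes (it contains commas). The inline CSV text stands in for your file, and the helper name parse_actions is hypothetical; with a real file you would pass its path to pd.read_csv instead of io.StringIO.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the real file: the Actions field is wrapped in straight
# double quotes because it contains commas, and it still has the curly quotes.
raw_csv = (
    'Packet number,Actions\n'
    '216,"[{“ActionNeeded”:”Update things”,”ResponsibleName”:”John Smith”},'
    '{“ActionNeeded”:”Design stuff”,”ResponsibleName”:”Jane Smith”}]"\n'
    '217,"[{“ActionNeeded”:”Update stuff”,”ResponsibleName”:”Fred Freddington”},'
    '{“ActionNeeded”:”Design stuff”,”ResponsibleName”:”Lisa Leslie”}]"\n'
)


def parse_actions(text: str) -> list:
    """Hypothetical helper: swap curly quotes for straight ones, then parse the JSON."""
    cleaned = text.replace('\u201c', '"').replace('\u201d', '"')
    return json.loads(cleaned)


df = pd.read_csv(io.StringIO(raw_csv))
# Each Actions cell becomes a real list of dicts that .explode() can work with
df['Actions'] = df['Actions'].apply(parse_actions)
print(df.loc[0, 'Actions'][1]['ResponsibleName'])  # Jane Smith
```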

First, .explode() the DataFrame on the 'data' column (your Actions column), which produces a separate row for each action while repeating the packet number alongside it.
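To see what .explode() does on its own, here is a small sketch with trimmed-down hypothetical dicts (only the ActionNeeded key is kept). Note that the original index labels are repeated along with the packet number.

```python
import pandas as pd

# Hypothetical two-packet frame whose 'data' cells are lists of (trimmed) dicts
df = pd.DataFrame({
    'packet number': [216, 217],
    'data': [
        [{'ActionNeeded': 'Update things'}, {'ActionNeeded': 'Design stuff'}],
        [{'ActionNeeded': 'Update stuff'}, {'ActionNeeded': 'Design stuff'}],
    ],
})

exploded = df.explode('data')
# Each list element gets its own row and the packet number is repeated alongside it
print(exploded['packet number'].tolist())  # [216, 216, 217, 217]
# The original index labels are repeated too, which matters when concatenating later
print(exploded.index.tolist())  # [0, 0, 1, 1]
```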

Each of these rows now contains a dict holding one action and one responsible name. I’m not sure it is the computationally optimal solution, but you can convert a Series of dicts into a new set of columns with .apply(pd.Series), which creates a new DataFrame with one column per dictionary key (ActionNeeded and ResponsibleName) and the same index as the original column.
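On the performance question: a commonly used alternative to .apply(pd.Series) is to pass the list of dicts straight to the DataFrame constructor, which builds all the columns in one call instead of creating a Series per row. It is usually faster on large frames, though it has not been benchmarked on your data here. A sketch, assuming the column holds exactly one dict per row, as it does after the explode step:

```python
import pandas as pd

# Hypothetical exploded column: one dict per row, index repeated as after explode()
s = pd.Series(
    [{'ActionNeeded': 'Update things', 'ResponsibleName': 'John Smith'},
     {'ActionNeeded': 'Design stuff', 'ResponsibleName': 'Jane Smith'}],
    index=[0, 0],
)

via_apply = s.apply(pd.Series)
# Build the frame in a single constructor call, keeping the original index for alignment
via_ctor = pd.DataFrame(s.tolist(), index=s.index)

# Both approaches yield the same columns and values
print(via_apply.to_dict('list') == via_ctor.to_dict('list'))  # True
```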

Finally, concatenate both DataFrames side by side with pd.concat(..., axis=1). I’ve used separate names (df, df_msg and df_full) below to keep each step easy to follow.
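Because pd.concat(..., axis=1) lines rows up by index label rather than by position, the two frames need matching indexes. .apply(pd.Series) keeps the exploded frame's index, so the pieces align. The sketch below uses hypothetical frames to show what happens when the labels do not match:

```python
import pandas as pd

# Hypothetical frames: packet numbers on one side, actions on the other
left = pd.DataFrame({'packet number': [216, 216, 217, 217]})  # index 0..3
right = pd.DataFrame(
    {'ActionNeeded': ['Update things', 'Design stuff', 'Update stuff', 'Design stuff']},
    index=[10, 11, 12, 13],  # deliberately mismatched labels
)

# Rows are matched by label, so mismatched indexes give 8 rows padded with NaN
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (8, 2)

# Resetting the index makes the labels match, so the rows pair up as intended
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(aligned.shape)  # (4, 2)
```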

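import pandas as pd

# Dummy data with the formatting fixed: straight quotes, so each cell is a list of dicts
df = pd.DataFrame(data={
    'packet number': [216, 217],
    'data': [
        [{"ActionNeeded": "Update things", "ResponsibleName": "John Smith"},
         {"ActionNeeded": "Design stuff", "ResponsibleName": "Jane Smith"}],
        [{"ActionNeeded": "Update stuff", "ResponsibleName": "Fred Freddington"},
         {"ActionNeeded": "Design stuff", "ResponsibleName": "Lisa Leslie"}],
    ],
})

# One row per action dict; reset the index so every row gets a unique label
df = df.explode('data').reset_index(drop=True)
# Expand each dict into its own columns (ActionNeeded, ResponsibleName)
df_msg = df['data'].apply(pd.Series)
# Place the packet number next to the expanded columns and drop the raw dicts
df_full = pd.concat([df.drop(columns='data'), df_msg], axis=1)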
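As an alternative (not the approach above), pandas.json_normalize can do the explode-and-expand in one call: record_path points at the list of actions and meta copies the packet number onto every resulting row. A sketch, assuming the data has already been parsed into plain Python records (for example via json.loads); it then renames the columns to the layout from the question and produces CSV text:

```python
import pandas as pd

# Same dummy data as above, arranged as a list of records
records = [
    {'packet number': 216,
     'data': [{'ActionNeeded': 'Update things', 'ResponsibleName': 'John Smith'},
              {'ActionNeeded': 'Design stuff', 'ResponsibleName': 'Jane Smith'}]},
    {'packet number': 217,
     'data': [{'ActionNeeded': 'Update stuff', 'ResponsibleName': 'Fred Freddington'},
              {'ActionNeeded': 'Design stuff', 'ResponsibleName': 'Lisa Leslie'}]},
]

# record_path flattens each inner list; meta copies the packet number onto every row
flat = pd.json_normalize(records, record_path='data', meta=['packet number'])

# Reorder and rename to the layout from the question
out = flat[['packet number', 'ActionNeeded', 'ResponsibleName']].rename(columns={
    'packet number': 'Packet Number',
    'ActionNeeded': 'Actions',
    'ResponsibleName': 'Responsible Name',
})
print(out.to_csv(index=False))
```

To write a file instead of printing, pass a path of your choosing to to_csv.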