I have a dataframe like that,
values = [ [ 1, 1, 'Standard Room', 'adult', 'John Doe' ], [ 1, 1, 'Standard Room', 'adult', 'John Noakes' ], [ 1, 2, 'Standard Room', 'adult', 'Richard Miles' ], [ 2, 1, 'Standard Room', 'adult', 'John Smith' ], [ 2, 1, 'Standard Room', 'adult', 'Joe Public' ] ] columns = [ 'booking_id', 'room_id', 'room_name', 'guest_type', 'guest_name' ] df = pd.DataFrame.from_records(values, columns=columns) print(df) booking_id room_id room_name guest_type guest_name 0 1 1 Standard Room adult John Doe 1 1 1 Standard Room adult John Noakes 2 1 2 Standard Room adult Richard Miles 3 2 1 Standard Room adult John Smith 4 2 1 Standard Room adult Joe Public
I just want to create a json for each booking_id
. And that json should contains room_id
, guest_type
and guest_name
fields. I’ve been try something like that,
df.groupby(['booking_id', 'room_id', 'room_name']).apply(lambda x: x.to_dict(orient='records'))
But in this way, there are some duplicate values. For instance there are two same room_id
in dictionaries,
[{'booking_id': 1, 'room_id': 1, 'room_name': 'Standard Room', 'guest_type': 'adult', 'guest_name': 'John Doe'}, {'booking_id': 1, 'room_id': 1, 'room_name': 'Standard Room', 'guest_type': 'adult', 'guest_name': 'John Noakes'}]
But, as I said before, I want to create hierarchical json for each booking_id
. Here is the expected output,
output_values = [ [1, { 'rooms': [ { 'room_id': 1, 'room_name': 'Standard Room', 'guests': [ { 'guest_type': 'adult', 'guest_name': 'John Doe' }, { 'guest_type': 'adult', 'guest_name': 'John Noakes' } ] }, { 'room_id': 2, 'room_name': 'Standard Room', 'guests': [ { 'guest_type': 'adult', 'guest_name': 'Richard Miles' } ] } ] } ], [2, { 'rooms': [ { 'room_id': 1, 'room_name': 'Standard Room', 'guests': [ { 'guest_type': 'adult', 'guest_name': 'John Smith' }, { 'guest_type': 'adult', 'guest_name': 'Joe Public' } ] } ] } ] ] print(pd.DataFrame.from_records(output_values, columns=['booking_id', 'rooms'])) booking_id rooms 0 1 {'rooms': [{'room_id': 1, 'room_name': 'Standa... 1 2 {'rooms': [{'room_id': 1, 'room_name': 'Standa...
Any idea? Thanks in advance.
Answer
A lot of reshaping are required including multiple groupby
, apply
and zip
:
s = (df.groupby(["booking_id", "room_id", "room_name"]) .apply(lambda d: pd.Series({"guests": [{k: v for k,v in zip(["guest_type", "guest_name"], i)} for i in d[["guest_type", "guest_name"]].values]})) .reset_index(["room_id", "room_name"]) ) res = [[g, {'rooms': i.to_dict("records")}] for g, i in s.groupby(level=0)] import pprint pprint.pprint(res) [[1, {'rooms': [{'guests': [{'guest_name': 'John Doe', 'guest_type': 'adult'}, {'guest_name': 'John Noakes', 'guest_type': 'adult'}], 'room_id': 1, 'room_name': 'Standard Room'}, {'guests': [{'guest_name': 'Richard Miles', 'guest_type': 'adult'}], 'room_id': 2, 'room_name': 'Standard Room'}]}], [2, {'rooms': [{'guests': [{'guest_name': 'John Smith', 'guest_type': 'adult'}, {'guest_name': 'Joe Public', 'guest_type': 'adult'}], 'room_id': 1, 'room_name': 'Standard Room'}]}]]