Python flatten multilevel/nested JSON

I am trying to convert JSON to CSV file, that I can use for further analysis. Issue with my structure is that I have quite some nested dict/lists when I convert my JSON file.

I tried to use pandas json_normalize(), but it only flattens first level.

import json
import pandas as pd
from pandas.io.json import json_normalize
from cs import CloudStack

api_key = xxxx
secret = xxxx
endpoint = xxxx

cs = CloudStack(endpoint=endpoint,
                key=api_key,
                secret=secret)

virtual_machines = cs.virtMach()

test = json_normalize(virtual_machines["virtualmachine"])

test.to_csv("test.csv", sep="|", index=False)

Any idea how to flatter whole JSON file, so I can create single line input to CSV file for single (in this case virtual machine) entry? I have tried couple of solutions posted here, but my result was always only first level was flattened.

This is sample JSON (in this case, I still get “securitygroup” and “nic” output as JSON format:

{
    "count": 13,
    "virtualmachine": [
        {
            "id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b",
            "name": "test-2",
            "displayname": "test-2",
            "securitygroup": [
                {
                    "id": "9e649fbc-3e64-4395-9629-5e1215b34e58",
                    "name": "test",
                    "tags": []
                }
            ],
            "nic": [
                {
                    "id": "79568b14-b377-4d4f-b024-87dc22492b8e",
                    "networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471"
                },
                {
                    "id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad",
                    "networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd"
                    "traffictype": "Guest"
                }
            ],
            "hypervisor": "KVM",
            "affinitygroup": [],
            "isdynamicallyscalable": false
        }
    ]
}

Thank you and best regards, Bostjan

Answer

Thanks to gyx-hh, this has been resolved:

I used following function (details can be found here):

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

This unfortunately completely flattens whole JSON, meaning that if you have multi-level JSON (many nested dictionaries), it might flatten everything into single line with tons of columns.

What I used in the end was json_normalize() and specified structure that I required. Nice example of how to do it that way can be found here.

Hopefully this hepls someone and again thank to gyx-hh for solution.

Best regards

Leave a Reply

Your email address will not be published. Required fields are marked *