Print specific keys and values from nested .json data recursively in Python

I have been going back and forth trying to get specific keys and values out of a complex nested .json file.

I found out that a good approach is to use a recursive function.

I understand recursion at a basic level, but I can't manage to write one for my actual file.

import json

# open the JSON file in the current folder
with open('demk-bkp-001.json') as file:
  data = json.load(file)

Here is the FULL .json data for reference: https://pastebin.com/hmjv81nS

My goal is to get every value in “name”, “size” and “mountpoint”.

Tips and examples on multiple platforms don't seem to help me, since I can't find one that uses a similarly nested JSON file. Please help 🙂

Answer

Iterative approach: storing items to be processed in a data structure

A recursive function is one possibility. Another is to keep a data structure holding all the dictionaries you’ve encountered so far but haven’t yet processed. As long as at least one unprocessed dictionary remains, take it out and process it; if it has children, add them to the structure. Here I use a simple Python list:

names, sizes, mountpoints = [], [], []
# copy the top-level list so that .pop() does not empty data['blockdevices']
to_be_processed = list(data['blockdevices'])
while to_be_processed:
  d = to_be_processed.pop()
  names.append(d['name'])
  sizes.append(d['size'])
  mountpoints.append(d['mountpoint'])
  if 'children' in d:
    to_be_processed.extend(d['children'])
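
To see the order in which this loop visits the devices, here is a minimal, self-contained sketch run on a small made-up sample. The sample dictionary and the function name collect_lifo are assumptions for illustration; only the key names (blockdevices, name, size, mountpoint, children) come from the question.

```python
# Hypothetical sample shaped like the question's data (not the real file).
sample = {
    "blockdevices": [
        {"name": "sda", "size": "100G", "mountpoint": None,
         "children": [
             {"name": "sda1", "size": "1G", "mountpoint": "/boot"},
             {"name": "sda2", "size": "99G", "mountpoint": "/"},
         ]},
        {"name": "sdb", "size": "50G", "mountpoint": None},
    ]
}


def collect_lifo(data):
    """Collect names, sizes and mountpoints using a list as a stack."""
    names, sizes, mountpoints = [], [], []
    # Copy the top-level list so popping does not modify `data`.
    to_be_processed = list(data["blockdevices"])
    while to_be_processed:
        d = to_be_processed.pop()  # takes the most recently added item
        names.append(d["name"])
        sizes.append(d["size"])
        mountpoints.append(d["mountpoint"])
        if "children" in d:
            to_be_processed.extend(d["children"])
    return names, sizes, mountpoints


names, sizes, mountpoints = collect_lifo(sample)
print(names)  # ['sdb', 'sda', 'sda2', 'sda1']
```

Every device is found, but the last top-level device comes out first and the children of each device come out in reverse.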

Preserving order: using a FIFO instead of a LIFO data structure

Note that the iterative code provided above uses a Python list with its methods .extend() and .pop(). Effectively, this uses the list as a LIFO (Last-In-First-Out) data structure, so items come out in the reverse of the order in which they were added. If you want to preserve the order of your data, use a FIFO (First-In-First-Out) data structure instead; the devices are then visited level by level, each level in its original order. You could replace .pop() with .pop(0) to remove the first element instead of the last, but list.pop(0) is not an efficient operation in Python: it has to shift every remaining element of the list one position to the left. Instead, we can use a collections.deque object, with its .extend() and .popleft() methods:

import collections

names, sizes, mountpoints = [],[],[]
to_be_processed = collections.deque(data['blockdevices'])
while to_be_processed:
  d = to_be_processed.popleft()
  names.append(d['name'])
  sizes.append(d['size'])
  mountpoints.append(d['mountpoint'])
  if 'children' in d:
    to_be_processed.extend(d['children'])
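
As a rough check of the visiting order with a deque, the same loop can be wrapped in a function and run on a small made-up sample. The sample dictionary and the name collect_fifo are assumptions for illustration, not part of the original data.

```python
import collections

# Hypothetical sample shaped like the question's data (not the real file).
sample = {
    "blockdevices": [
        {"name": "sda", "size": "100G", "mountpoint": None,
         "children": [
             {"name": "sda1", "size": "1G", "mountpoint": "/boot"},
             {"name": "sda2", "size": "99G", "mountpoint": "/"},
         ]},
        {"name": "sdb", "size": "50G", "mountpoint": None},
    ]
}


def collect_fifo(data):
    """Collect names, sizes and mountpoints using a deque as a queue."""
    names, sizes, mountpoints = [], [], []
    # deque(...) builds a new container, so `data` itself is left unchanged.
    to_be_processed = collections.deque(data["blockdevices"])
    while to_be_processed:
        d = to_be_processed.popleft()  # takes the oldest item in the queue
        names.append(d["name"])
        sizes.append(d["size"])
        mountpoints.append(d["mountpoint"])
        if "children" in d:
            to_be_processed.extend(d["children"])
    return names, sizes, mountpoints


names, sizes, mountpoints = collect_fifo(sample)
print(names)  # ['sda', 'sdb', 'sda1', 'sda2']
```

The top-level devices come first, in file order, followed by their children; a child is therefore not listed directly after its parent.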

Recursive approach

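A recursive approach is possible too. Call the function on data['blockdevices'], and have it call itself on the children if there are any. Each device's children are visited right after the device itself, so the results follow the order of the file.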

def process_data(d_list, names, sizes, mountpoints):
  for d in d_list:
    names.append(d['name'])
    sizes.append(d['size'])
    mountpoints.append(d['mountpoint'])
    if 'children' in d:
      process_data(d['children'], names, sizes, mountpoints)
  return names, sizes, mountpoints

names, sizes, mountpoints = process_data(data['blockdevices'], [], [], [])
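
A possible variation on the recursive idea, sketched here under a few assumptions, is a generator that yields one (name, size, mountpoint) tuple per device, which makes printing straightforward. The name walk_devices and the sample dictionary are made up for illustration, and using .get() so that a missing key yields None instead of raising KeyError is a choice of this sketch, not something the original data is known to need.

```python
def walk_devices(devices):
    """Yield (name, size, mountpoint) per device, parents before children."""
    for d in devices:
        # .get() returns None when a key is absent instead of raising KeyError.
        yield d.get("name"), d.get("size"), d.get("mountpoint")
        # Recurse into the children; `or []` covers a missing or null entry.
        yield from walk_devices(d.get("children") or [])


# Hypothetical sample shaped like the question's data (not the real file).
sample = {
    "blockdevices": [
        {"name": "sda", "size": "100G", "mountpoint": None,
         "children": [
             {"name": "sda1", "size": "1G", "mountpoint": "/boot"},
             {"name": "sda2", "size": "99G", "mountpoint": "/"},
         ]},
        {"name": "sdb", "size": "50G", "mountpoint": None},
    ]
}

rows = list(walk_devices(sample["blockdevices"]))
for name, size, mountpoint in rows:
    print(name, size, mountpoint)
# sda 100G None
# sda1 1G /boot
# sda2 99G /
# sdb 50G None
```

With the real file, the same call would be walk_devices(data['blockdevices']).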