Convert nested MongoDB documents into a pandas DataFrame

I have a MongoDB collection with documents like this one:

doc = {
  "_id": {
    "$oid": "516622c9ce21150200000d87"
  },
  "SubmissionDate": {
    "$date": "2013-04-11T02:41:13.162Z"
  },
  "isComplete": True,

  "Rounds": [
    {
      "Photo": [
        
      ],
      "A": {
        "Complexity": 55,
        "Colour": 85,
        "Deep": 51,
        "Effervescence": 44
      },
      "B": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 43,
        "Qualities": [
          
        ]
      },
      "C": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 60,
        "UHS": 46,
        "Colour": 33,
        "Qualities": [
          
        ]
      },
      "D": {
        "Complexity": 73,
        "Duration": 68,
        "Quality": 65
      }
    }
  ],
  "Item": {
    "_id": {
      "$oid": "51e6d678c06918db21156f92"
    },
    "Country": "Australia",
    "Name": "King",
    "PeopleId": {
      "$oid": "51dddb69a9d9350200000"
    },
    "Style": "Apple",
    "Type": "Flat",
    "UserSubmitted": False
  }
}

I need to convert this collection into a pandas DataFrame.

The solution suggested in "How to import data from mongodb to pandas?" does most of the job, but I am still left with a Rounds column that holds a list of nested dictionaries.

I wrote a loop to access the sub-dictionaries of Rounds:

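import pandas as pd

df = pd.json_normalize(doc)

# Collect the 'A' sub-dictionary of each row's first round, one row at a time
A_data = pd.DataFrame(columns=df.Rounds[0][0]['A'].keys())
for i in range(len(df.Rounds)):
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    A_data = pd.concat([A_data, pd.json_normalize(df.Rounds[i][0]['A'])], ignore_index=True)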

Finally, I concatenate A_data with my main DataFrame.

Is there a faster way to do this? Right now the loop takes too much time. Thank you!

Answer

  • Each level of the dict can be specified with the meta parameter, using 'Rounds' as the record_path.
import pandas as pd

meta = [['_id', '$oid'],
        ['Item', 'Country'],
        ['Item', 'Name'],
        ['Item', 'Style'],
        ['Item', 'Type'],
        ['Item', 'UserSubmitted'],
        ['Item', '_id', '$oid'],
        ['Item', 'PeopleId', '$oid'],
        ['SubmissionDate', '$date'],
        'isComplete']

df = pd.json_normalize(doc, record_path='Rounds', meta=meta)

# display(df)
  Photo  A.Complexity  A.Colour  A.Deep  A.Effervescence B.QualityPIDs B.QualityScales  B.Complexity B.Qualities C.QualityPIDs C.QualityScales  C.Complexity  C.UHS  C.Colour C.Qualities  D.Complexity  D.Duration  D.Quality                  _id.$oid Item.Country Item.Name Item.Style Item.Type Item.UserSubmitted             Item._id.$oid     Item.PeopleId.$oid      SubmissionDate.$date isComplete
0    []            55        85      51               44            []              []            43          []            []              []            60     46        33          []            73          68         65  516622c9ce21150200000d87    Australia      King      Apple      Flat              False  51e6d678c06918db21156f92  51dddb69a9d9350200000  2013-04-11T02:41:13.162Z       True
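
  • The same call should also work on the whole collection, since pd.json_normalize accepts a list of dicts. The sketch below is only an illustration under assumptions: docs is a hypothetical stand-in for the documents read from the collection, trimmed to a few fields, and the second document (including its id and values) is invented for the example. Each entry in 'Rounds' becomes one row, with the meta values repeated for every round of the same document.

```python
import pandas as pd

# Hypothetical stand-in for the documents loaded from the collection.
# The first is a trimmed copy of the example doc; the second is made up.
docs = [
    {
        "_id": {"$oid": "516622c9ce21150200000d87"},
        "SubmissionDate": {"$date": "2013-04-11T02:41:13.162Z"},
        "isComplete": True,
        "Rounds": [{"A": {"Complexity": 55, "Colour": 85}, "D": {"Quality": 65}}],
        "Item": {"Country": "Australia", "Name": "King"},
    },
    {
        "_id": {"$oid": "hypothetical-second-id"},
        "SubmissionDate": {"$date": "2013-05-02T10:15:00.000Z"},
        "isComplete": False,
        "Rounds": [
            {"A": {"Complexity": 40, "Colour": 70}, "D": {"Quality": 50}},
            {"A": {"Complexity": 42, "Colour": 71}, "D": {"Quality": 52}},
        ],
        "Item": {"Country": "Australia", "Name": "Queen"},
    },
]

meta = [["_id", "$oid"],
        ["Item", "Country"],
        ["Item", "Name"],
        ["SubmissionDate", "$date"],
        "isComplete"]

# One row per round across all documents; no Python-level loop over rows
df_all = pd.json_normalize(docs, record_path="Rounds", meta=meta)

# Optional: parse the ISO date strings into timezone-aware timestamps
df_all["SubmissionDate.$date"] = pd.to_datetime(df_all["SubmissionDate.$date"])
```

Here df_all has three rows: one for the first document and two for the second, which has two rounds.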