Python Pandas json_normalize with multiple lists of dicts

Question:

I’m trying to flatten a JSON file that was originally converted from XML using xmltodict(). There are multiple fields that may have a list of dictionaries. I’ve tried using record_path with meta data to no avail, but I have not been able to get it to work when there are multiple fields that may have other nested fields. It’s expected that some fields will be empty for any given record

I have tried searching for another topic and couldn’t find my specific problem with multiple nested fields. Can anyone point me in the right direction?

Thanks for any help that can be provided!

Sample base Python (without the record path)

import pandas as pd
import json

with open('./example.json', encoding="UTF-8") as json_file:
    json_dict = json.load(json_file)

df = pd.json_normalize(json_dict['WIDGET'])
print(df)

df.to_csv('./test.csv', index=False)

Sample JSON

{
    "WIDGET": [
        {
            "ID": "6",
            "PROBLEM": "Electrical",
            "SEVERITY_LEVEL": "1",
            "TITLE": "Battery's Missing",
            "CATEGORY": "User Error",
            "LAST_SERVICE": "2020-01-04T17:39:37Z",
            "NOTICE_DATE": "2022-01-01T08:00:00Z",
            "FIXABLE": "1",
            "COMPONENTS": {
                "WHATNOTS": {
                    "WHATNOT1": "Battery Compartment",
                    "WHATNOT2": "Whirlygig"
                }
            },
            "DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
            "STATUS": "0",
            "CONTACT_TYPE": {
                "CALL": "1"
            }
        },
        {
            "ID": "1004",
            "PROBLEM": "Electrical",
            "SEVERITY_LEVEL": "4",
            "TITLE": "Flames emit from unit",
            "CATEGORY": "Dangerous",
            "LAST_SERVICE": "2015-06-04T21:40:12Z",
            "NOTICE_DATE": "2022-01-01T08:00:00Z",
            "FIXABLE": "0",
            "DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
            "CONSEQUENCE": "Could burn things",
            "SOLUTION": "Call an exorcist",
            "KNOWN_PROBLEMS": {
                "PROBLEM": [
                    {
                        "TYPE": "RECALL",
                        "NAME": "Bad Servo",
                        "DESCRIPTION": "Bad servo's shipped in initial product"
                    },
                    {
                        "TYPE": "FAILURE",
                        "NAME": "Operating outside normal conditions",
                        "DESCRIPTION": "Device failed when customer threw into wood chipper"
                    }
                ]
            },
            "STATUS": "1",
            "REPAIR_BULLETINS": {
                "BULLETIN": [
                    {
                        "@id": "4",
                        "#text": "Known target of the occult"
                    },
                    {
                        "@id": "5",
                        "#text": "Not meant to be thrown into wood chippers"
                    }
                ]
            },
            "CONTACT_TYPE": {
                "CALL": "1"
            }
        }        
    ]
}

Sample CSV

ID PROBLEM SEVERITY_LEVEL TITLE CATEGORY LAST_SERVICE NOTICE_DATE FIXABLE DIAGNOSIS STATUS COMPONENTS.WHATNOTS.WHATNOT1 COMPONENTS.WHATNOTS.WHATNOT2 CONTACT_TYPE.CALL CONSEQUENCE SOLUTION KNOWN_PROBLEMS.PROBLEM REPAIR_BULLETINS.BULLETIN
6 Electrical 1 Battery’s Missing User Error 2020-01-04T17:39:37Z 2022-01-01T08:00:00Z 1 Customer needs to put batteries in the battery compartment 0 Battery Compartment Whirlygig 1
1004 Electrical 4 Flames emit from unit Dangerous 2015-06-04T21:40:12Z 2022-01-01T08:00:00Z 0 A demon seems to have possessed the unit and his expelling flames from it 1 1 Could burn things Call an exorcist [{‘TYPE’: ‘RECALL’, ‘NAME’: ‘Bad Servo’, ‘DESCRIPTION’: "Bad servo’s shipped in initial product"}, {‘TYPE’: ‘FAILURE’, ‘NAME’: ‘Operating outside normal conditions’, ‘DESCRIPTION’: ‘Device failed when customer threw into wood chipper’}] [{‘@id’: ‘4’, ‘#text’: ‘Known target of the occult’}, {‘@id’: ‘5’, ‘#text’: ‘Not meant to be thrown into wood chippers’}]
Asked By: snyms

||

Answers:

I have attempted to extract the data and turned it into nested dictionary (instead of nested with list), so that pd.json_normalize() can work

for row in range(len(json_dict['WIDGET'])):
    try:
        lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        del   json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item
            
        lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        del   json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
    except KeyError:
        continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)

If you have to manually add the varying keys from the larger dataset, here’s a way to extract them automatically by identifying them as type list (and provided they are nested by 2 levels only)

linkage = []
for item in json_dict['WIDGET']:
    for k1 in item.keys():    #get keys from first level
        if isinstance(item[k1], str):
            continue
        #print(item[k1])
        for k2 in item[k1].keys():    #get keys from second level
            if isinstance(item[k1][k2], str):
                continue
            #print(item[k1][k2])
            if isinstance(item[k1][k2], list):
                linkage.append((k1, k2))

print(linkage)
# [('KNOWN_PROBLEMS', 'PROBLEM'), ('REPAIR_BULLETINS', 'BULLETIN')]

for row in range(len(json_dict['WIDGET'])):
    for link in linkage:
        try:
            lis = json_dict['WIDGET'][row][link[0]][link[1]]
            del   json_dict['WIDGET'][row][link[0]][link[1]]    #delete original dict value (which is a list)
            for i, item in enumerate(lis):
                json_dict['WIDGET'][row][link[0]][str(i)] = item    #replace list with dict value (which is a dict)
        except KeyError:
            continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)

Output:

                                                          0                              1
ID                                                        6                           1004
PROBLEM                                          Electrical                     Electrical
SEVERITY_LEVEL                                            1                              4
TITLE                                     Battery's Missing          Flames emit from unit
CATEGORY                                         User Error                      Dangerous
LAST_SERVICE                           2020-01-04T17:39:37Z           2015-06-04T21:40:12Z
NOTICE_DATE                            2022-01-01T08:00:00Z           2022-01-01T08:00:00Z
FIXABLE                                                   1                              0
DIAGNOSIS                     Customer needs to put batt...  A demon seems to have poss...
STATUS                                                    0                              1
COMPONENTS.WHATNOTS.WHATNOT1            Battery Compartment                            NaN
COMPONENTS.WHATNOTS.WHATNOT2                      Whirlygig                            NaN
CONTACT_TYPE.CALL                                         1                              1
CONSEQUENCE                                             NaN              Could burn things
SOLUTION                                                NaN               Call an exorcist
KNOWN_PROBLEMS.0.TYPE                                   NaN                         RECALL
KNOWN_PROBLEMS.0.NAME                                   NaN                      Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION                            NaN  Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE                                   NaN                        FAILURE
KNOWN_PROBLEMS.1.NAME                                   NaN  Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION                            NaN  Device failed when custome...
REPAIR_BULLETINS.0.@id                                  NaN                              4
REPAIR_BULLETINS.0.#text                                NaN     Known target of the occult
REPAIR_BULLETINS.1.@id                                  NaN                              5
REPAIR_BULLETINS.1.#text                                NaN  Not meant to be thrown int...
Answered By: perpetualstudent