Python – How can I move a nested json dictionary up to its own index?

Question:

I have a json dataset where each item/index can contain 2 nested dictionaries. The problem is that one of these nested dictionaries contains all of the exact key:value pairs as its parent dictionary. To put it in other words, I have a parent "Account" and any time there are "Sub-Accounts" it places the Sub-Accounts in the nested dictionary, and they are never seen as their own standalone item/index.

Here is the sample json of one item/index. Essentially, I need the sub_accounts object to extracted and become its own index. As you can see, it contains all of the same key:value objects as the parent containins the sub_accounts.

    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    },

I have been able to use json_normalize or even variations of .pop() to accomplish a flattening of data and I have tried to explore other flattening options, but with no luck on the specific task I am trying to accomplish. Those solutions usually just result with the subaccounts still be associated to the original index.

Asked By: Srichard90

||

Answers:

You could use a recursive function to traverse the hierarchy while progressively popping out the "sub_accounts" keys:

def extractAccounts(accounts):
    return [s for a in accounts 
              for s in (a,*extractAccounts(a.pop("sub_accounts",[])))]

From a list of account objects:

data =  [{
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    }]

Output:

accounts = extractAccounts(data)
for i,account in enumerate(accounts):
    print("Account #",i)
    print(account)


Account # 0
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '235', 'kind': 'Real', 'name': 'Checking'}
Account # 1
{'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '236', 'kind': 'Real', 'name': 'Cash Reserve'}

If your top level is a single account (i.e. not a list), just place it in a list when calling the function: extractAccount([data])

Answered By: Alain T.

I don’t have a generic answer, but this seems to do what you need:

raw_data = """                                                                                                                                                                                 
[                                                                                                                                                                                              
    {                                                                                                                                                                                          
        "classification": [                                                                                                                                                                    
            {                                                                                                                                                                                  
                "classificationId": "Cash",                                                                                                                                                    
                "taxonomyId": "accounting.gp"                                                                                                                                                  
            }                                                                                                                                                                                  
        ],                                                                                                                                                                                     
        "id": "235",                                                                                                                                                                           
        "kind": "Real",                                                                                                                                                                        
        "name": "Checking",                                                                                                                                                                    
        "sub_accounts": [                                                                                                                                                                      
            {                                                                                                                                                                                  
                "classification": [                                                                                                                                                            
                    {                                                                                                                                                                          
                        "classificationId": "Cash",                                                                                                                                            
                        "taxonomyId": "accounting.gp"                                                                                                                                          
                    }                                                                                                                                                                          
                ],                                                                                                                                                                             
                "id": "236",                                                                                                                                                                   
                "kind": "Real",                                                                                                                                                                
                "name": "Cash Reserve",                                                                                                                                                        
                "sub_accounts": []                                                                                                                                                             
            }                                                                                                                                                                                  
        ]                                                                                                                                                                                      
    }                                                                                                                                                                                          
]                                                                                                                                                                                              
"""                                                                                                                                                                                            
                                                                                                                                                                                               
import json                                                                                                                                                                                    
jdict = json.loads(raw_data)                                                                                                                                                                   
                                                                                                                                                                                               
empty_list = list()                                                                                                                                                                            
result     = list()                                                                                                                                                                            
for elem in jdict:                                                                                                                                                                             
    sub_elem_list = elem['sub_accounts']                                                                                                                                                       
    elem['sub_accounts'] = empty_list                                                                                                                                                          
    result.append(elem)                                                                                                                                                                        
    for sub_elem in sub_elem_list:                                                                                                                                                             
        result.append(sub_elem)                                                                                                                                                                
                                                                                                                                                                                               
print(json.dumps(result, indent=4)) 

output = """
[
    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": []
    },
    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "236",
        "kind": "Real",
        "name": "Cash Reserve",
        "sub_accounts": []
    }
]

"""

When you have nested structures you need to nest your loops. The other answer has recursion, which can cause problems if you’re nesting over a thousand recursive calls (so probably not this case). I also assumed that you care about order, preferring the parent’s id to be first. Also, if you’re trying to get rid of the sub_accounts from the json, then you’d want to pop it from the records, but I again assume that the structure should be maintained.

Answered By: VoNWooDSoN