Reshaping a large dictionary

Question:

I am working on XBRL document parsing. I got to a point where I have a large dictionary structured like this:

[Image: sample of the dictionary I’m working on]

Since it’s a bit challenging to describe the pattern I’m trying to achieve, here is an example of what I’d like the result to look like:

[Image: sample of what I’m trying to achieve]

Since I’m fairly new to programming, I’ve been struggling with this for days, trying different approaches with loops and list and dict comprehensions, starting from this:


for k in storage_gaap:
    if 'context_ref' in storage_gaap[k]:
        for _k in storage_gaap[k]['context_ref']:
            storage_gaap[k]['context_ref'] = {_k}

Here storage_gaap is the master dictionary. Sorry for attaching pictures, but the dictionary is much easier to read that way.
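To see why the loop above does not reshape anything, here is a minimal sketch that runs the same loop on hypothetical toy data (the key 'fact' and the values 'c1'..'c3' are made up). Each pass of the inner loop reassigns the same key, so only the last context_ref survives, wrapped in a one-element set, and no other field is touched.

```python
# Hypothetical toy data shaped like one entry of storage_gaap
storage_gaap = {
    'fact': {'context_ref': ['c1', 'c2', 'c3'], 'value': ['1', '2', '3']},
}

# Same loop as in the question
for k in storage_gaap:
    if 'context_ref' in storage_gaap[k]:
        for _k in storage_gaap[k]['context_ref']:
            # Every pass overwrites the key, so only the last assignment survives
            storage_gaap[k]['context_ref'] = {_k}

print(storage_gaap['fact']['context_ref'])  # {'c3'}
```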

I’d really appreciate any help.

Asked By: Boyan K


Answers:

Here’s a solution using zip and a dictionary comprehension, demonstrated on toy data with a structure similar to yours.

import itertools
import pprint

# Sample data similar to provided screenshots
data = {
    'a': {
        'id': 'a',
        'vals': ['a1', 'a2', 'a3'],
        'val_num': [1, 2, 3]
    },
    'b': {
        'id': 'b',
        'vals': ['b1', 'b2', 'b3'],
        'val_num': [4, 5, 6]
    }
}

# Takes a tuple of keys and an iterable of value tuples, and transforms them into a list of dicts,
# i.e. ('id', 'val'), [('a', 1), ('b', 2)] => [{'id': 'a', 'val': 1}, {'id': 'b', 'val': 2}]
def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict

def process_dict(key, values):
    # Transform the dict with lists of values into a list of dicts
    list_of_dicts = get_list_of_dict(
        ('id', 'val', 'val_num'),
        zip(itertools.repeat(key, len(values['vals'])), values['vals'], values['val_num']))
    # Dictionary comprehension that keys each dict by its 'val' property and drops 'val' from the inner dict
    return {d['val']: {k: v for k, v in d.items() if k != 'val'} for d in list_of_dicts}

# Reorganize to put dict under a 'context_values' key
processed = {k: {'context_values': process_dict(k, v)} for k,v in data.items()}

# {'a': {'context_values': {'a1': {'id': 'a', 'val_num': 1},
#                           'a2': {'id': 'a', 'val_num': 2},
#                           'a3': {'id': 'a', 'val_num': 3}}},
#  'b': {'context_values': {'b1': {'id': 'b', 'val_num': 4},
#                           'b2': {'id': 'b', 'val_num': 5},
#                           'b3': {'id': 'b', 'val_num': 6}}}}
pprint.pprint(processed)
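This works because every list in the toy data has the same length. The minimal sketch below, using hypothetical values, shows what zip() does when one input is shorter (it stops at the shortest one and silently drops the rest) and how wrapping the short list in itertools.cycle lets the longer list set the length.

```python
import itertools

# Hypothetical values: three facts but only one shared id
context_refs = ['c1', 'c2', 'c3']
ids = ['fact-id']

# zip() stops as soon as its shortest input runs out, so two facts are lost
truncated = list(zip(context_refs, ids))
print(truncated)  # [('c1', 'fact-id')]

# itertools.cycle() repeats ids endlessly, so context_refs decides the length
paired = list(zip(context_refs, itertools.cycle(ids)))
print(paired)  # [('c1', 'fact-id'), ('c2', 'fact-id'), ('c3', 'fact-id')]
```

On Python 3.10 and later, zip(..., strict=True) raises ValueError instead of truncating when plain lists differ in length. Don't combine it with cycle(), since an infinite iterator never runs out together with a finite one, so strict mode would always raise.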
Answered By: rcbevans

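OK, here is the updated solution for my case. The catch for me was the zip function, since it stops as soon as the shortest iterable passed to it is exhausted. The fix was itertools.cycle, which repeats the single-item 'id' and 'master_id' lists so they line up with the longer ones. Here is the code: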

import itertools
import pprint

data = {
    'us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding': {
        'context_ref': ['D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731',
                        'D20210801-20220731', 'D20200801-20210731', 'D20190801-20200731'],
        'decimals': ['-5', '-5', '-5', '-5', '-5', '-5'],
        'id': ['us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding'],
        'master_id': ['us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding'],
        'unit_ref': ['shares', 'shares', 'shares', 'shares', 'shares', 'shares'],
        'value': ['98500000', '96400000', '96900000',
                  '98500000', '96400000', '96900000'],
    }
}


def get_list_of_dict(keys, list_of_tuples):
    # Pair each tuple of values with the field names to build one dict per fact
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict


def process_dict(k, values):
    # zip() stops at the shortest input, so the single-item 'id' and 'master_id'
    # lists are repeated with itertools.cycle to match the longer lists
    list_of_dicts = get_list_of_dict(
        ('context_ref', 'decimals', 'id', 'master_id', 'unit_ref', 'value'),
        zip(values['context_ref'], values['decimals'], itertools.cycle(values['id']),
            itertools.cycle(values['master_id']), values['unit_ref'], values['value']))
    # Key each fact by its context_ref; a repeated context_ref keeps only its last entry
    return {d['context_ref']: {key: v for key, v in d.items() if key != 'context_ref'}
            for d in list_of_dicts}


processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}

pprint.pprint(processed)
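As a hedged generalization (not part of the answer above), the hypothetical helper reshape_fact below avoids hard-coding which fields to cycle. It assumes that any list with exactly one element applies to every fact and that every other list has the same length as context_ref; it raises ValueError otherwise instead of silently truncating. It uses itertools.repeat with an explicit count rather than cycle. As in the code above, a duplicated context_ref keeps only its last record.

```python
import itertools
import pprint


def reshape_fact(fields, key_field='context_ref'):
    """Turn parallel lists into {key value: {other field: value}}.

    Hypothetical helper. Assumes a one-element list (like 'id' or
    'master_id') applies to every fact; every other list must have
    the same length as fields[key_field].
    """
    n = len(fields[key_field])
    columns = {}
    for name, vals in fields.items():
        if len(vals) == n:
            columns[name] = vals
        elif len(vals) == 1:
            # Repeat the single value once per fact
            columns[name] = itertools.repeat(vals[0], n)
        else:
            raise ValueError(f'{name!r} has {len(vals)} items, expected 1 or {n}')

    names = list(columns)
    result = {}
    for row in zip(*columns.values()):
        record = dict(zip(names, row))
        key = record.pop(key_field)
        result[key] = record  # a duplicated key keeps only the last record
    return result


# Hypothetical toy input in the same shape as one entry of data
sample = {
    'context_ref': ['D1', 'D2'],
    'id': ['us-gaap:Example'],
    'value': ['10', '20'],
}
pprint.pprint(reshape_fact(sample))
# {'D1': {'id': 'us-gaap:Example', 'value': '10'},
#  'D2': {'id': 'us-gaap:Example', 'value': '20'}}
```

With the real data, the equivalent of the last step above would be {k: {'context_values': reshape_fact(v)} for k, v in data.items()}.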
Answered By: Boyan K