Combine multiple identical nested dictionaries of a list by merging the value

Question:

I want to combine multiple identically structured nested dictionaries held in a list by merging their values, so that each key ends up with a list of the values from every dictionary.

Suppose I have a dictionary like this:

ex = {'tran': {'precision': 0.6666666666666666,
               'recall': 0.6486486486486487,
               'f1_score': 0.6575342465753425},
      'act': {'coy': {'precision': 0.7142857142857143,
                      'recall': 0.7142857142857143,
                      'f1_score': 0.7142857142857143},
              'fam': {'precision': 0.8518518518518519,
                      'recall': 0.9583333333333334,
                      'f1_score': 0.9019607843137256},
              'fri': {'precision': 0.7142857142857143,
                      'recall': 0.625,
                      'f1_score': 0.6666666666666666}},
      'pla': {'acc': {'precision': 0.42105263157894735,
                      'recall': 0.4444444444444444,
                      'f1_score': 0.43243243243243246},
              'pen': {'precision': 0.42105263157894735,
                      'recall': 0.8888888888888888,
                      'f1_score': 0.5714285714285714},
              'loc': {'precision': 0.2608695652173913,
                      'recall': 0.8571428571428571,
                      'f1_score': 0.4}},
      'j': {'precision': 0.44,
            'recall': 0.4074074074074074,
            'f1_score': 0.4230769230769231},
      'rea': {'precision': 0.5,
              'recall': 0.5555555555555556,
              'f1_score': 0.5263157894736842}}

I have a list that contains that dictionary multiple times (here I assume only two copies, but it could be three, four, …).

dicts = [ex, ex]

What I have tried:

merge_dict = {}
for k in dicts[0]:
    merge_dict[k] = [d[k] for d in dicts]

But I got this:

{'tran': [{'precision': 0.6666666666666666,
   'recall': 0.6486486486486487,
   'f1_score': 0.6575342465753425},
  {'precision': 0.6666666666666666,
   'recall': 0.6486486486486487,
   'f1_score': 0.6575342465753425}],
 'act': [{'coy': {'precision': 0.7142857142857143,
    'recall': 0.7142857142857143,
    'f1_score': 0.7142857142857143},
   'fam': {'precision': 0.8518518518518519,
    'recall': 0.9583333333333334,
    'f1_score': 0.9019607843137256},
   'fri': {'precision': 0.7142857142857143,
    'recall': 0.625,
    'f1_score': 0.6666666666666666}},
  {'coy': {'precision': 0.7142857142857143,
    'recall': 0.7142857142857143,
    'f1_score': 0.7142857142857143},
   'fam': {'precision': 0.8518518518518519,
    'recall': 0.9583333333333334,
    'f1_score': 0.9019607843137256},
   'fri': {'precision': 0.7142857142857143,
    'recall': 0.625,
    'f1_score': 0.6666666666666666}}],
 'pla': [{'acc': {'precision': 0.42105263157894735,
    'recall': 0.4444444444444444,
    'f1_score': 0.43243243243243246},
   'pen': {'precision': 0.42105263157894735,
    'recall': 0.8888888888888888,
    'f1_score': 0.5714285714285714},
   'loc': {'precision': 0.2608695652173913,
    'recall': 0.8571428571428571,
    'f1_score': 0.4}},
  {'acc': {'precision': 0.42105263157894735,
    'recall': 0.4444444444444444,
    'f1_score': 0.43243243243243246},
   'pen': {'precision': 0.42105263157894735,
    'recall': 0.8888888888888888,
    'f1_score': 0.5714285714285714},
   'loc': {'precision': 0.2608695652173913,
    'recall': 0.8571428571428571,
    'f1_score': 0.4}}],
 'j': [{'precision': 0.44,
   'recall': 0.4074074074074074,
   'f1_score': 0.4230769230769231},
  {'precision': 0.44,
   'recall': 0.4074074074074074,
   'f1_score': 0.4230769230769231}],
 'rea': [{'precision': 0.5,
   'recall': 0.5555555555555556,
   'f1_score': 0.5263157894736842},
  {'precision': 0.5,
   'recall': 0.5555555555555556,
   'f1_score': 0.5263157894736842}]}

This is not what I want. It seems I need to go deeper into the nested values so that each leaf value is stored in a list.

My desired output should look like this:

{'tran': {'precision': [0.6666666666666666, 0.6666666666666666],
          'recall': [0.6486486486486487, 0.6486486486486487],
          'f1_score': [0.6575342465753425, 0.6575342465753425]},
 'act': {'coy': {'precision': [0.7142857142857143, 0.7142857142857143],
                 'recall': [0.7142857142857143, 0.7142857142857143],
                 'f1_score': [0.7142857142857143, 0.7142857142857143]},
         'fam': {'precision': [0.8518518518518519, 0.8518518518518519],
                 'recall': [0.9583333333333334, 0.9583333333333334],
                 'f1_score': [0.9019607843137256, 0.9019607843137256]},
         'fri': {'precision': [0.7142857142857143, 0.7142857142857143],
                 'recall': [0.625, 0.625],
                 'f1_score': [0.6666666666666666, 0.6666666666666666]}},
 'pla': {'acc': {'precision': [0.42105263157894735, 0.42105263157894735],
                 'recall': [0.4444444444444444, 0.4444444444444444],
                 'f1_score': [0.43243243243243246, 0.43243243243243246]},
         'pen': {'precision': [0.42105263157894735, 0.42105263157894735],
                 'recall': [0.8888888888888888, 0.8888888888888888],
                 'f1_score': [0.5714285714285714, 0.5714285714285714]},
         'loc': {'precision': [0.2608695652173913, 0.2608695652173913],
                 'recall': [0.8571428571428571, 0.8571428571428571],
                 'f1_score': [0.4, 0.4]}},
 'j': {'precision': [0.44, 0.44],
       'recall': [0.4074074074074074, 0.4074074074074074],
       'f1_score': [0.4230769230769231, 0.4230769230769231]},
 'rea': {'precision': [0.5, 0.5],
         'recall': [0.5555555555555556, 0.5555555555555556],
         'f1_score': [0.5263157894736842, 0.5263157894736842]}}

How can I get this desired output?

In addition, I also want the mean of the list stored under each key.

For example, for a single key-value pair:
'precision': [0.6666666666666666, 0.6666666666666666] -> 'precision': 0.6666666666666666

where 0.6666666666666666 is the mean of [0.6666666666666666, 0.6666666666666666]

Asked By: Erwin

Answers:

You can use this function to recursively merge your dictionaries and take the mean of numeric values:

from statistics import mean

def rec_merge(dicts):
    result = dict()
    for k in dicts[0]:
        data = [d[k] for d in dicts]
        if isinstance(data[0], dict):
            result[k] = rec_merge(data)
        else:
            result[k] = mean(data)
    return result

For your sample input of multiple copies of the same dict, this returns the original dict.
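
The function above goes straight to the mean. If the intermediate list form from the question is also needed, one possible variant is sketched below. The names collect and average, and the small sample and other dictionaries, are hypothetical and not part of the original answer; the sketch assumes every dictionary has exactly the same keys and numeric leaf values.

```python
from statistics import mean

def collect(dicts):
    """Merge identically structured nested dicts, gathering leaf values into lists."""
    result = {}
    for key in dicts[0]:
        values = [d[key] for d in dicts]
        if isinstance(values[0], dict):
            # Nested dict: recurse one level deeper.
            result[key] = collect(values)
        else:
            # Leaf: keep the values from all dicts as a list.
            result[key] = values
    return result

def average(merged):
    """Replace every leaf list in a collected dict with its mean."""
    return {
        key: average(value) if isinstance(value, dict) else mean(value)
        for key, value in merged.items()
    }

# Hypothetical, shortened data in the same shape as the question's dict.
sample = {'tran': {'precision': 0.5, 'recall': 0.25},
          'act': {'coy': {'precision': 1.0, 'recall': 0.75}}}
other = {'tran': {'precision': 1.5, 'recall': 0.75},
         'act': {'coy': {'precision': 2.0, 'recall': 0.25}}}

merged = collect([sample, other])
print(merged)
print(average(merged))
```

Keeping the two steps separate means the lists stay available for other statistics, such as a standard deviation, without merging again.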

Answered By: Nick

This is basically a tree traversal problem. The easiest way to explore a tree is with a recursive algorithm, for example:

from statistics import mean

def merge(*args):
    if isinstance(args[0], dict):
        return {
            key: merge(*[dct[key] for dct in args])
            for key in args[0]
        }
    return mean(args)

The merge function can take in either numeric values or dictionaries that have the same structure. If dictionaries are passed, the function creates a new dictionary by recursively merging the values for each key. If numeric values are passed, the function calculates the mean of the values, for example:

a = {'a': 1, 'b': {'x': 2, 'y': 3, 'z': {'i': 4, 'j': 5}}, 'c': 6}
b = {'a': 2, 'b': {'x': 3, 'y': 4, 'z': {'i': 5, 'j': 6}}, 'c': 7}
c = {'a': 3, 'b': {'x': 4, 'y': 5, 'z': {'i': 6, 'j': 7}}, 'c': 8}
d = {'a': 4, 'b': {'x': 5, 'y': 6, 'z': {'i': 7, 'j': 8}}, 'c': 9}

merge(a, b, c, d)

It will return this dictionary:

{'a': 2.5, 'b': {'x': 3.5, 'y': 4.5, 'z': {'i': 5.5, 'j': 6.5}}, 'c': 7.5}
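
Since the question holds its dictionaries in a list, the list can be unpacked into the call with the * operator. The sketch below also adds a structure check, because the recursion relies on all inputs having the same keys. The name merge_checked and the choice of ValueError are assumptions made for illustration, not part of the original answer.

```python
from statistics import mean

def merge_checked(*args):
    """Average identically structured nested dicts; raise ValueError on a mismatch."""
    if isinstance(args[0], dict):
        keys = set(args[0])
        for other in args[1:]:
            # Every input at this level must be a dict with the same keys.
            if not isinstance(other, dict) or set(other) != keys:
                raise ValueError("inputs do not share the same structure")
        return {
            key: merge_checked(*[dct[key] for dct in args])
            for key in args[0]
        }
    if any(isinstance(arg, dict) for arg in args):
        # The first value is a leaf but a later one is a dict.
        raise ValueError("inputs do not share the same structure")
    return mean(args)

# Hypothetical list of dicts, unpacked into positional arguments.
dicts = [{'a': 1, 'b': {'x': 2}}, {'a': 3, 'b': {'x': 4}}]
result = merge_checked(*dicts)
print(result)
```

Without such a check, a key missing from one dictionary surfaces as a KeyError from deep inside the recursion, and an extra key is silently ignored.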

Answered By: Jonathan Quispe