Merge two lists of dictionaries based on a condition

Question:

I have two lists of dictionaries, and I need to merge records whenever their values for USA and GOOG are the same.

list1 = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Right': {'Upfront': 15}}]

list2 = [{'USA': 'Western', 
  'GOOG': '2019', 
  'Down': {'Downback': 35}, 
  'Right': {'Downback': 25}}, 

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]

Since USA and GOOG have the same values in the 2nd element of list1 and the 1st element of list2, those two elements should be merged. The expected result is:

Result = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Down': {'Downback': 35}, 
  'Right': {'Upfront': 15, 'Downback': 25}},

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]

How can I write generic code for this? I tried using defaultdict, but I did not know how to concatenate the remaining dictionaries when there can be an arbitrary number of them.

My attempt:

from collections import defaultdict

dd = defaultdict(list)
dics = list1 + list2

# collects all values per key, but loses which record each value came from
for dic in dics:
    for key, val in dic.items():
        dd[key].append(val)
Asked By: cph_sto
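For reference, the defaultdict idea from the attempt can produce the expected result if the records are grouped by their (USA, GOOG) pair rather than by each individual key. A minimal sketch follows; merge_records is a hypothetical helper name, and it assumes that every record has both key fields and that any other key holds either a sub-dictionary (merged with dict.update) or a plain value (the later record wins).

```python
from collections import defaultdict
from copy import deepcopy


def merge_records(*lists, keys=("USA", "GOOG")):
    """Merge records from any number of lists that share the same key fields.

    Hypothetical helper: assumes every record has all of `keys`; other keys
    hold either a sub-dict (merged) or a plain value (later records win).
    """
    # Group records by their (USA, GOOG) values; insertion order is preserved.
    groups = defaultdict(list)
    for records in lists:
        for record in records:
            groups[tuple(record[k] for k in keys)].append(record)

    merged = []
    for group in groups.values():
        result = deepcopy(group[0])  # copy so the input records stay untouched
        for record in group[1:]:
            for key, value in record.items():
                if isinstance(value, dict) and isinstance(result.get(key), dict):
                    result[key].update(value)  # e.g. 'Right': merge the sub-dicts
                else:
                    result[key] = deepcopy(value)
        merged.append(result)
    return merged
```

With the question's list1 and list2, merge_records(list1, list2) returns the three records of the expected result in the same order, and leaves both input lists unchanged. Because it takes *lists, the same call works for three or more lists.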


Answers:

Here is my attempt. Not sure if this is the best way, but it’s a start.

Steps:

  • combine lists of dictionaries
  • create a sorted collection of (relevant values, index in the combined list) pairs
  • group by the relevant values
  • iterate over the keys and groups: if a combination of values appears only once, add that dictionary as is; if it appears more than once, update one dictionary with the others

Code:

import operator as op
import itertools as it
from functools import reduce
from pprint import pprint

dictionaries = reduce(op.add, (list1, list2,))
groups = it.groupby(sorted([(op.itemgetter('USA', 'GOOG')(d), i)
                            for i, d in enumerate(dictionaries)]),
                    key=op.itemgetter(0))
results = []
for key, group in groups:
    _, indices = zip(*group)
    if len(indices) == 1:
        i, = indices
        results.append(dictionaries[i])
    else:
        merge = dictionaries[indices[0]]
        for i in indices[1:]:
            merge.update(dictionaries[i])
        results.append(merge)
pprint(results, indent=4)

OUTPUT:

[   {   'Down': {'Downback': 15},
        'GOOG': '2018',
        'Right': {'Downback': 55},
        'USA': 'Eastern'},
    {   'GOOG': '2019',
        'Right': {'Upfront': 12},
        'USA': 'Eastern',
        'Up': {'Upfront': 45}},
    {   'Down': {'Downback': 35},
        'GOOG': '2019',
        'Right': {'Downback': 25},
        'USA': 'Western',
        'Up': {'Upfront': 10}}]
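In this output the Western/2019 record has 'Right': {'Downback': 25}, whereas the question expects {'Upfront': 15, 'Downback': 25}. The reason is that dict.update replaces the whole 'Right' sub-dictionary instead of merging it; also, since merge is the original dictionary from list1, the update modifies list1 in place. A minimal sketch of the difference, using the two Western/2019 records from the question and assuming every shared key holds either a sub-dictionary or a plain value:

```python
from copy import deepcopy

a = {'USA': 'Western', 'GOOG': '2019', 'Up': {'Upfront': 10}, 'Right': {'Upfront': 15}}
b = {'USA': 'Western', 'GOOG': '2019', 'Down': {'Downback': 35}, 'Right': {'Downback': 25}}

# What merge.update(...) does: the second 'Right' replaces the first one.
plain = dict(a)
plain.update(b)
print(plain['Right'])  # {'Downback': 25}

# Nested merge on a deep copy: sub-dicts are combined key by key.
merge = deepcopy(a)
for key, value in b.items():
    if isinstance(value, dict) and isinstance(merge.get(key), dict):
        merge[key].update(value)
    else:
        merge[key] = deepcopy(value)
print(merge['Right'])  # {'Upfront': 15, 'Downback': 25}
```

In the answer's loop, the same inner loop could replace merge.update(dictionaries[i]), starting from merge = deepcopy(dictionaries[indices[0]]) so that list1 is left untouched.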

Answered By: dmmfll

There are two algorithmic tasks in what you need: finding the records that have the same values for USA and GOOG, and then joining them in such a way that, if the same key exists in both records, their values are merged.

The naive approach to the first task is a for loop that iterates over the values of list1 and, for each value, iterates over all values of list2. Two separate loops won't cut it; you need two nested for loops:

for element in list1:
    for other_element in list2:
        if ...:
            ...

While this approach works and is fine for small lists (fewer than about 1,000 records, for example), the time it takes grows with the product of the list sizes, which is the square of the size when both lists are about the same length. For lists of about 1,000 items each, that is 1 million comparisons. If the lists themselves have 1,000,000 items each, the computation would take 10^12 comparisons, which is impractical on today's computers.
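To make the nested-loop idea concrete, here is a minimal sketch that fills in the placeholders above and also counts comparisons. match_nested and the generated data are hypothetical, used only to illustrate the quadratic growth.

```python
def match_nested(list1, list2, keys=("USA", "GOOG")):
    """Naive matching with two nested loops: len(list1) * len(list2) comparisons.

    Returns the matching (i, j) index pairs and the number of comparisons made.
    """
    pairs = []
    comparisons = 0
    for i, element in enumerate(list1):
        for j, other_element in enumerate(list2):
            comparisons += 1
            if all(element[k] == other_element[k] for k in keys):
                pairs.append((i, j))
    return pairs, comparisons


# Hypothetical data: 1,000 records per list, none of which match.
left = [{"USA": "Eastern", "GOOG": str(n)} for n in range(1000)]
right = [{"USA": "Western", "GOOG": str(n)} for n in range(1000)]
pairs, comparisons = match_nested(left, right)
print(len(pairs), comparisons)  # 0 1000000
```

Every record of left is compared with every record of right, whether or not anything matches, so doubling both lists quadruples the work.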

A better solution is to rebuild one of the lists so that the comparison key is used as a hash: copy the list into a dictionary whose keys are the values you want to compare, then iterate over the second list just once. Since dictionary lookups take constant time on average, the total work becomes proportional to the sum of the list sizes instead of their product.
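A minimal sketch of the dictionary-index idea follows. match_indexed and the generated data are hypothetical, and the sketch assumes each (USA, GOOG) pair occurs at most once in list1 (a later duplicate would overwrite an earlier one in the index).

```python
def match_indexed(list1, list2, keys=("USA", "GOOG")):
    """Match records through a dict index: about len(list1) + len(list2) steps.

    Assumes each key combination appears at most once in list1.
    """
    # Build the index once: comparison key -> position in list1.
    index = {tuple(record[k] for k in keys): i for i, record in enumerate(list1)}
    pairs = []
    lookups = 0
    for j, record in enumerate(list2):
        lookups += 1
        i = index.get(tuple(record[k] for k in keys))  # average constant-time lookup
        if i is not None:
            pairs.append((i, j))
    return pairs, lookups


# Hypothetical data: right holds the even GOOG values 0..1998.
left = [{"USA": "Eastern", "GOOG": str(n)} for n in range(1000)]
right = [{"USA": "Eastern", "GOOG": str(n)} for n in range(0, 2000, 2)]
pairs, lookups = match_indexed(left, right)
print(len(pairs), lookups)  # 500 1000
```

Only one pass over each list is needed: one to build the index and one to look records up, instead of comparing every pair.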

The second part of the task is to copy one record to the result and update the keys on that copy so that any duplicate keys are merged. To avoid modifying the original records while doing this, it is safer to use Python's copy.deepcopy, which ensures that the sub-dictionaries in the copy are new objects, distinct from the ones in the original record, so the originals stay isolated.
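The difference between a shallow and a deep copy can be seen in a few lines; the record below is one of the question's records, trimmed for brevity:

```python
import copy

record = {"USA": "Western", "GOOG": "2019", "Right": {"Downback": 25}}

shallow = copy.copy(record)   # new outer dict, but the sub-dicts are shared
deep = copy.deepcopy(record)  # new outer dict and new sub-dicts

shallow["Right"].update({"Upfront": 15})  # this also changes record["Right"]
print(record["Right"])  # {'Downback': 25, 'Upfront': 15}

deep["Right"].update({"Extra": 1})  # record is not affected
print(record["Right"])  # {'Downback': 25, 'Upfront': 15}
print(deep["Right"])    # {'Downback': 25, 'Extra': 1}
```

This is why the code below calls deepcopy before updating the sub-dictionaries of a record.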

from copy import deepcopy

def merge_lists(list1, list2):
    # create dictionary from list1:
    dict1 = {(record["GOOG"], record["USA"]): record for record in list1}

    # compare elements in list2 with those in list1:

    result = {}
    for record in list2:
        ckey = record["GOOG"], record["USA"]
        new_record = deepcopy(record)
        if ckey in dict1:
            for key, value in dict1[ckey].items():
                if key in ("GOOG", "USA"):
                    # Do not merge these keys
                    continue
                # Dict's "setdefault" finds a key/value, and if it is missing
                # creates a new one with the second parameter as value
                new_record.setdefault(key, {}).update(value)

        result[ckey] = new_record

    # Add values from list1 that were not matched in list2:
    for key, value in dict1.items():
        if key not in result:
            result[key] = deepcopy(value)

    return list(result.values())
Answered By: jsbueno

Here is my attempt at a solution. It manages to reproduce the results you requested.
Please ignore how badly named my variables are. I found this problem quite interesting.

def joinListByDictionary(list1, list2):
    """Join lists on USA and GOOG having the same value"""
    list1 = list1 + list2  # new list, so the caller's list1 is not extended
    matchIndx = []
    matches = []

    # collect every record that shares USA and GOOG with another record
    for dicts in range(len(list1)):
        for dicts2 in range(len(list1)):
            if dicts == dicts2:
                continue
            if list1[dicts]["GOOG"] == list1[dicts2]["GOOG"] and list1[dicts]["USA"] == list1[dicts2]["USA"]:
                matches.append(list1[dicts])
                matchIndx.append(dicts)
                break  # record each matching dict only once

    # merge the matched records into a new dict
    # (assumes all matches share a single USA/GOOG pair, as in the example)
    merged = {}
    for dictz in matches:
        for key, value in dictz.items():
            if isinstance(value, dict):
                merged.setdefault(key, {}).update(value)
            else:
                merged[key] = value

    newList = [list1[ele] for ele in range(len(list1)) if ele not in matchIndx]
    if merged:
        newList.append(merged)
    print(newList)
    return newList

joinListByDictionary(list1, list2)
Answered By: prismo
list1 = [{'USA': 'Eastern', 
  'GOOG': '2019', 
  'Up': {'Upfront': 45}, 
  'Right': {'Upfront': 12}}, 

 {'USA': 'Western', 
  'GOOG': '2019', 
  'Up': {'Upfront': 10}, 
  'Right': {'Upfront': 15}}]

list2=[{'USA': 'Western', 
  'GOOG': '2019', 
  'Down': {'Downback': 35}, 
  'Right': {'Downback': 25}}, 

 {'USA': 'Eastern', 
  'GOOG': '2018', 
  'Down': {'Downback': 15}, 
  'Right': {'Downback': 55}}]



def mergeDicts(d1, d2):
    """Recursively merge d2 into d1 (modifies d1 in place)."""
    for k, v in d2.items():
        if k in d1 and isinstance(v, dict):
            mergeDicts(d1[k], v)  # both records have this sub-dict: merge it
        else:
            d1[k] = v

def merge_lists(list1, list2):
    merged_list = []
    # records of list1, merged with their list2 match when there is one
    for d1 in list1:
        for d2 in list2:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                mergeDicts(d1, d2)
                merged_list.append(d1)
                break
        else:  # no match found in list2
            merged_list.append(d1)
    # records of list2 that have no match in list1
    for d2 in list2:
        for d1 in list1:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                break
        else:
            merged_list.append(d2)
    return merged_list

res1 = merge_lists(list1, list2)
print(res1)
               
"""
[{'USA': 'Eastern', 'GOOG': '2019', 'Up': {'Upfront': 45}, 'Right': {'Upfront': 12}}, 
{'USA': 'Western', 'GOOG': '2019', 'Up': {'Upfront': 10}, 
'Right': {'Upfront': 15, 'Downback': 25},
 'Down': {'Downback': 35}}, 
 {'USA': 'Eastern', 'GOOG': '2018', 'Down': {'Downback': 15}, 'Right': {'Downback': 55}}]
"""                
                
Answered By: Soudipta Dutta