Join two lists of dictionaries on a single key

Question:

Given n lists with m dictionaries as their elements, I would like to produce a new list, with a joined set of dictionaries. Each dictionary is guaranteed to have a key called “index”, but could have an arbitrary set of keys beyond that. The non-index keys will never overlap across lists. For example, imagine the following two lists:

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]

("b" would never appear in l2, since it appeared in l1, and similarly, "c" would never appear in l1, since it appeared in l2)

I would like to produce a joined list:

l3 = [{"index":1, "b":2, "c":4}, 
      {"index":2, "b":3, "c":5}, 
      {"index":3, "green":"eggs"}]

What is the most efficient way to do this in Python?

Asked By: Bacon

Answers:

from collections import defaultdict

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]

d = defaultdict(dict)
for l in (l1, l2):
    for elem in l:
        d[elem['index']].update(elem)
l3 = list(d.values())  # in Python 3, .values() returns a view, so copy it into a list

# l3 is now:

[{'b': 2, 'c': 4, 'index': 1},
 {'b': 3, 'c': 5, 'index': 2},
 {'green': 'eggs', 'index': 3}]

EDIT: Since l3 is not guaranteed to be sorted by index (before Python 3.7, .values() returns items in no specific order; from 3.7 on it follows insertion order, which need not be index order), you can do as @user560833 suggests:

from operator import itemgetter

...

l3 = sorted(d.values(), key=itemgetter("index"))
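
Because the question mentions n lists, the same idea extends to any number of inputs. The following is a minimal sketch: the helper name join_on_key and its parameters are hypothetical, and it assumes every dictionary contains the key and that the key values can be compared with each other for sorting.

```python
from collections import defaultdict
from operator import itemgetter


def join_on_key(*lists, key="index"):
    """Merge dictionaries that share the same value for `key` (hypothetical helper)."""
    merged = defaultdict(dict)
    for lst in lists:
        for elem in lst:
            # Assumes every dictionary has `key`; a missing key raises KeyError.
            merged[elem[key]].update(elem)
    # Sort by key value so the result order does not depend on input order.
    return sorted(merged.values(), key=itemgetter(key))


l1 = [{"index": 1, "b": 2}, {"index": 2, "b": 3}, {"index": 3, "green": "eggs"}]
l2 = [{"index": 1, "c": 4}, {"index": 2, "c": 5}]
l3 = join_on_key(l1, l2)
print(l3)
```

Each dictionary is visited once, so the merge itself is linear in the total number of dictionaries; the final sort adds the usual sorting cost on the number of distinct keys.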
Answered By: eumiro

Here’s a one-liner that does this:

[dict(sum([list(z.items()) for z in z2],[])) for z2 in [[x3 for x3 in l1+l2 if x3['index']==key] for key in set([x1['index'] for x1 in l1]+[x2['index'] for x2 in l2])]]

Not quite as elegant as a list-comprehension. The result is not guaranteed to be sorted the way you want either.

Expanding the one-liner:

[
    dict(sum([list(z.items()) for z in z2],[]))
    for z2 in [
        [
            x3 for x3 in l1+l2 if x3['index']==key
        ] for key in set(
            [x1['index'] for x1 in l1]+[x2['index'] for x2 in l2]
        )
    ]
]

The set expression starting on the 6th line collects all the unique index values from both lists. The list comprehension around it (lines 3-9) creates a list of lists, where each inner list holds all the dictionaries that share a particular index value. The outermost list comprehension flattens each inner list into a single list of (key, value) tuple pairs and converts it back into a dictionary, so the final result is a list of dictionaries.
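
The same steps can be unrolled into named intermediate values, which may be easier to follow. This is only a sketch for Python 3; the variable names (keys, groups, joined) are chosen for illustration, and an explicit sort is added at the end.

```python
l1 = [{"index": 1, "b": 2}, {"index": 2, "b": 3}, {"index": 3, "green": "eggs"}]
l2 = [{"index": 1, "c": 4}, {"index": 2, "c": 5}]

# Step 1: collect the unique index values from both lists.
keys = {d["index"] for d in l1 + l2}

# Step 2: for each index value, gather every dictionary that carries it.
groups = [[d for d in l1 + l2 if d["index"] == key] for key in keys]

# Step 3: flatten each group into (key, value) pairs and rebuild one dictionary.
joined = [
    dict(pair for d in group for pair in d.items())
    for group in groups
]

# Set iteration order is not guaranteed, so sort explicitly if order matters.
joined.sort(key=lambda d: d["index"])
print(joined)
```

Note that step 2 rescans l1 + l2 once per key, so this approach does more work than a single pass that accumulates into a dictionary.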

Answered By: Mark

In Python 3.5 or higher, you can merge dictionaries in a single statement.
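
As a small illustration of that merge syntax (dictionary unpacking, introduced by PEP 448), using two made-up dictionaries a and b:

```python
a = {"index": 1, "b": 2}
b = {"index": 1, "c": 4}

# Unpack both dictionaries into a new one; on duplicate keys the right-most value wins.
merged = {**a, **b}
print(merged)

# The inputs are left unchanged.
print(a, b)
```

From Python 3.9 the same merge can also be written as a | b.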

So for Python 3.5 or higher, a quick solution would be:

from itertools import zip_longest

l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]

print(l3)
#[
#    {'index': 1, 'b': 2, 'c': 4}, 
#    {'index': 2, 'b': 3, 'c': 5}, 
#    {'index': 3, 'green': 'eggs'}
#]

However, if the two lists are the same size, you can simply use zip:

l3 = [{**u, **v} for u, v in zip(l1, l2)]

Note: This assumes that both lists are ordered the same way by index, so that the dictionaries at each position share the same index value. The OP states that this is not the case in general.

To generalize for that case, one way is to create a custom zip_longest-style function that sorts both lists by the key and yields a pair of values from the two lists only if they match on that key.

For instance:

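def sortedZipLongest(l1, l2, key, fillvalue=None):
    """Yield (u, v) pairs from l1 and l2 whose `key` values match.

    Both lists are sorted by `key` first; an element with no match in the
    other list is paired with `fillvalue`.
    """
    if fillvalue is None:
        fillvalue = {}  # avoid a mutable default argument
    l1 = iter(sorted(l1, key=lambda x: x[key]))
    l2 = iter(sorted(l2, key=lambda x: x[key]))
    u = next(l1, None)
    v = next(l2, None)

    while (u is not None) or (v is not None):
        if u is None:
            # l1 is exhausted: emit the rest of l2
            yield fillvalue, v
            v = next(l2, None)
        elif v is None:
            # l2 is exhausted: emit the rest of l1
            yield u, fillvalue
            u = next(l1, None)
        elif u.get(key) == v.get(key):
            # keys match: emit the pair and advance both
            yield u, v
            u = next(l1, None)
            v = next(l2, None)
        elif u.get(key) < v.get(key):
            # u has no partner in l2
            yield u, fillvalue
            u = next(l1, None)
        else:
            # v has no partner in l1
            yield fillvalue, v
            v = next(l2, None)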

Now if you had the following out-of-order lists:

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}, 
      {"index":4, "b": 4}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}, {"index":0, "green": "ham"}, 
      {"index":4, "green": "ham"}]

Using the sortedZipLongest function instead of itertools.zip_longest:

l3 = [{**u, **v} for u, v in sortedZipLongest(l1, l2, key="index", fillvalue={})]
print(l3)
#[{'index': 0, 'green': 'ham'},
# {'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 3, 'green': 'eggs'},
# {'index': 4, 'b': 4, 'green': 'ham'}]

Whereas the original method would produce an incorrect answer:

l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]
print(l3)
#[{'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 0, 'green': 'ham'},
# {'index': 4, 'b': 4, 'green': 'ham'}]
Answered By: pault