How to write a python3 function that merges two lists of sets so that the output list has sets of elements that were present in both input sets?

Question:

I have two lists of sets, let’s say:
[{1, 2, 3}, {4, 5}, {6, 7}] and [{1, 2}, {3, 4}, {5, 6, 7}]

No set in the list has the same element and the sum of all sets in both lists is the same.
The function should check if the sets in both lists had the same elements. If there were some differences, put them in another set.

So above example should return:
[{1, 2}, {3}, {4}, {5}, {6, 7}]

I work on large sets so I would need this function to be as effective as possible.

Here is the example code and how I would like it to work:

def mergeSets(x, y):
    out = set()
    for i in x:
        out = out.union(i)
        # this allows me to get the set of all elements but here where my mind stops working
        # the problem sounds simple but thinking hours I can not think of good algorythm for this       issue :(
        # I found set.intersection() function but it works on single sets only, not lists of sets
    return out


x = mergeSets([{1, 2, 3}, {4, 5}, {6, 7}], [{1, 2}, {3, 4}, {5, 6, 7}])
print(x)
# [{1, 2}, {3}, {4}, {5}, {6, 7}]
x = mergeSets([{1, 2}, {3, 4, 5, 6, 7}, {8}], [{1}, {2, 3, 4}, {5, 6, 7, 8}])
print(x)
# [{1}, {2}, {3, 4}, {5, 6, 7}, {8}]

EDIT: the data doesn’t have to be sorted and may be even of different types than integer

EDIT2: the input lists don’t have to be sorted so sets may appear in random order

Asked By: Amae Saeki

||

Answers:

.intersection() works if you iterate over each list, like so:

def merge_sets(lst1, lst2):
        
    final = []

    for i in lst1:
        for j in lst2:
            if len(i.intersection(j)) > 0:
                final.append(i.intersection(j))
                
    return final

Using the same input lists you should see the following outputs:

>>> x = merge_sets([{1, 2, 3}, {4, 5}, {6, 7}], [{1, 2}, {3, 4}, {5, 6, 7}])
>>> print(x)
[{1, 2}, {3}, {4}, {5}, {6, 7}]

>>> y = merge_sets([{1, 2}, {3, 4, 5, 6, 7}, {8}], [{1}, {2, 3, 4}, {5, 6, 7, 8}])
>>> print(y)
[{1}, {2}, {3, 4}, {5, 6, 7}, {8}]

This runs in O(n2) time.

You could re-write this using a list comprehension and get the same results:

def merge_sets(lst1, lst2):
        
    final = [i.intersection(j) for i in lst1 for j in lst2 if len(i.intersection(j)) > 0]
                
    return final

To improve the time complexity you could try an approach that does away with the nested for loop. But for now, this works.

Answered By: David Smith

Given that each value occurs in exactly two sets (one per input list), you could collect the index pairs for each value, where an index pair gives an indication in which two sets (at which indexes in both lists) the value occurs.

Unique pairs indicate unique sets in the output, so a dictionary of such pairs can serve to populate the result:

from collections import defaultdict

def merge_sets(lista, listb):
    index_in_a = {
        val: idx
        for idx, elem in enumerate(lista) for val in elem
    }
    set_by_key = defaultdict(set)
    for idx, elem in enumerate(listb):
        for val in elem:
            set_by_key[(index_in_a[val], idx)].add(val)
    return list(set_by_key.values())

This looks O(n) to me.

NB: since the order of iteration over a set is not defined, the order of the output can look a bit mingled, but I assume the order of where sets appear in the output is not significant.

Answered By: trincot