Getting Unique Values and their Following Pairs from a List of Tuples

Question:

I have a list of tuples like this:

[
    ('a', 'AA'), # pair 1

    ('d', 'AA'), # pair 2
    ('d', 'a'),  # pair 3
    ('d', 'EE'), # pair 4

    ('b', 'BB'), # pair 5
    ('b', 'CC'), # pair 6
    ('b', 'DD'), # pair 7

    ('c', 'BB'), # pair 8
    ('c', 'CC'), # pair 9
    ('c', 'DD'), # pair 10

    ('c', 'b'),  # pair 11

    ('d', 'FF'), # pair 12

]

Each tuple in the list above shows a similar pair of items (or duplicate items). I need to create a dictionary in which keys will be one of the unique item from the tuples and values will be lists filled with all the other items that the key occurred in conjunction with. For example, ‘a’ is similar to ‘AA'(pair 1), which in turn is similar to ‘d'(pair 2) and ‘d’ is similar to ‘EE’ and ‘FF’ (pairs 4 and 12). same is the case with other items.

My expected output is:

{'a':['AA', 'd', 'EE', 'FF'], 'b':['BB', 'CC', 'DD', 'c']}

According to the tuples, ['a', 'AA', 'd', 'EE', 'FF'] are similar; so, any one of them can be the key, while the remaining items will be it’s values. so, output can also be: {'AA':['a', 'd', 'EE', 'FF'], 'c':['BB', 'CC', 'DD', 'b']}. So, key of the output dict can be anything from the duplicate pairs.

How do I do this for a list with thousands of such tuples in a list?

Answers:

You can use 3 dictionaries, one for the output (out), one to track the seen values (seen), and one to map the equal keys (mapper):

out = {}
seen = {}
mapper = {}

for a, b in l:
    if b in seen:
        out.setdefault(seen[b], []).append(a)
        mapper[a] = seen[b]
    else:
        out.setdefault(mapper.setdefault(a, a), []).append(b)
        seen[b] = a

# remove duplicates
out = {k: list(dict.fromkeys(x for x in v if x!=k))
       for k, v in out.items()}

Output:

{'a': ['AA', 'd', 'EE', 'FF'],
 'b': ['BB', 'CC', 'DD', 'c']}

graph approach

Or you might want to approach this using a graph:

import networkx as nx

G = nx.from_edgelist(l)

out = {(x:=list(g))[0]: x[1:] for g in nx.connected_components(G)}

Output:

{'EE': ['d', 'a', 'FF', 'AA'],
 'CC': ['b', 'c', 'BB', 'DD']}

Graph:

enter image description here

Answered By: mozway

As said in the comment, you seem to want to find the equivalence classes on the given relationship.

ecs = []
for a, b in data:
    a_ec = next((ec for ec in ecs if a in ec), None)
    b_ec = next((ec for ec in ecs if b in ec), None)
    if a_ec:
        if b_ec:
            # Found equivalence classes for both elements, everything is okay
            if a_ec is not b_ec:
                # We only need one of them though
                ecs.remove(b_ec)
                a_ec.update(b_ec)
        else:
            # Add the new element to the found equivalence class       
            a_ec.add(b)
    else:              
        if b_ec:
            # Add the new element to the found equivalence class
            b_ec.add(a)
        else:                                                   
            # First time we see either of these: make a new equivalence class 
            ecs.append({a, b})

# Extract a representative element and construct a dictionary
out = {
    ec.pop(): ec
    for ec in ecs
}
Answered By: Amadan
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.