Return a dict of the most common values from a list of tuples in an edge network, allowing for ties

Question:

I need to transform a list of thousands of tuples that looks essentially like this:

[('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]

into a dict that returns the most common entry along with its frequency count. The function also needs to allow for ties: when more than one value shares the maximum count, it should report all of them.

So in the above scenario, I would get back:

{'2975': 2, '530': 2}

My current implementation:

from collections import Counter
from itertools import takewhile

nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]

highest_nodes = {}

# converting nodes
data = Counter(list(sum(nodes, ()))).most_common()
val = data[0][1]  # count of the most common item
for a, b in list(takewhile(lambda x: x[1] >= val, data)):
    highest_nodes.setdefault(a, []).append(b)
# This is returning values as a list containing the item, need to extract the int from it
return highest_nodes

Any ideas how I can get this done?

Thank you!

Asked By: Mars

Answers:

from collections import Counter

nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]

# Count first and second tuple elements separately; most_common(1) keeps one entry per position
node_1 = Counter([i[0] for i in nodes]).most_common(1)[0]
node_2 = Counter([i[1] for i in nodes]).most_common(1)[0]

return {node_1[0]: node_1[1], node_2[0]: node_2[1]}
Answered By: walker

Convert the Counter result to a dict with dict(), then iterate over it and keep every entry whose count equals the maximum:

from collections import Counter

nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]

highest_nodes = {}

# converting nodes
data = dict(Counter(list(sum(nodes, ()))).most_common())

top_count = max(data.values())  # compute the maximum once, not per iteration

for a, b in data.items():
    if b == top_count:
        highest_nodes[a] = b

print(highest_nodes)  # {'2975': 2, '530': 2}
Answered By: Arifa Chan

The problem you face is that most_common either returns the top n entries or all of them. On its own, it cannot return every top entry when several share the same number of occurrences.
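To illustrate that limitation, here is a small sketch. The sample list is made up for the demonstration and is not the asker's data; it only shows what most_common returns when two values tie for first place.

```python
from collections import Counter

# Hypothetical sample data: 'a' and 'b' tie for the highest count.
counts = Counter(['a', 'b', 'a', 'b', 'c'])

top_one = counts.most_common(1)    # only one of the two tied entries
everything = counts.most_common()  # every entry, highest count first

print(top_one)     # [('a', 2)]
print(everything)  # [('a', 2), ('b', 2), ('c', 1)]
```

Asking for one entry drops 'b' even though it has the same count as 'a', so the tie has to be detected separately.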

Note: in my example I've used the variable "d" instead of "nodes".

The method below flattens the list of tuples into a single list,

[item for sublist in d for item in sublist]

converts it to a Counter to count each occurrence, and then to a list ordered by number of occurrences using most_common.

Counter([item for sublist in d for item in sublist]).most_common()

Finally, it uses a slice to keep the leading entries of that list that all have the top count, and converts the result back to a dict. In Python 3, the values view has no count method, so it is wrapped in list() first.

[:list(counter.values()).count(max(counter.values()))]

top_hits = dict(top_hits)

The complete code:

from collections import Counter

d = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
# Flatten the tuples and count how often each node appears
counter = Counter([item for sublist in d for item in sublist])
# Slice off as many leading entries as there are values tied for the top count
top_hits = counter.most_common()[:list(counter.values()).count(max(counter.values()))]
top_hits = dict(top_hits)
print(top_hits)
# {'2975': 2, '530': 2}
Answered By: tomgalpin
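
For comparison, here is a minimal sketch that keeps the takewhile idea from the question but stores each count directly instead of appending it to a list. The function name most_common_with_ties is hypothetical, and itertools.chain.from_iterable is swapped in for sum(nodes, ()) to flatten the tuples; treat it as one possible approach rather than the definitive one.

```python
from collections import Counter
from itertools import chain, takewhile


def most_common_with_ties(edges):
    """Return {node: count} for every node that shares the highest count."""
    # Flatten the edge tuples lazily and count each node.
    ranked = Counter(chain.from_iterable(edges)).most_common()
    if not ranked:
        return {}
    top_count = ranked[0][1]  # most_common() puts the highest count first
    # Keep entries only while their count still equals the top count.
    return dict(takewhile(lambda pair: pair[1] == top_count, ranked))


nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
print(most_common_with_ties(nodes))  # {'2975': 2, '530': 2}
```

Because most_common() sorts by count in descending order, all tied leaders sit at the front of the list, so takewhile can stop at the first entry with a lower count.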