Return dict of most common values from a list of tuples of an edge network, allowing for ties
Question:
I need to transform a list of thousands of tuples that looks essentially like this:
[('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
Into a dict where the most common entry is returned along with a count for its frequency. The function also needs to allow for ties when more than one value is the maximum and report both.
So in the above scenario, I would get back:
{'2975': 2, '530': 2}
My current implementation
from collections import Counter
from itertools import takewhile
nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
highest_nodes = {}
# converting nodes
data = Counter(list(sum(nodes, ()))).most_common()
val = data[0][1]  # highest count (most_common() sorts in descending order)
for a, b in list(takewhile(lambda x: x[1] >= val, data)):
    highest_nodes.setdefault(a, []).append(b)
# This is returning values as a list containing the item, need to extract the int from it
return highest_nodes
Any ideas how I can get this done?
Thank you!
Answers:
from collections import Counter
nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
# most common first element and most common second element, counted separately
node_1 = Counter([i[0] for i in nodes]).most_common(1)[0]
node_2 = Counter([i[1] for i in nodes]).most_common(1)[0]
print({node_1[0]: node_1[1], node_2[0]: node_2[1]})
# {'2975': 2, '530': 2}
Use dict(), then iterate over it to find the entries with the maximum value in the data dict:
from collections import Counter
nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
highest_nodes = {}
# flatten the tuples and count every value
data = dict(Counter(list(sum(nodes, ()))).most_common())
max_count = max(data.values())  # compute the maximum once, not on every iteration
for a, b in data.items():
    if b == max_count:
        highest_nodes.setdefault(a, b)
print(highest_nodes)
# {'2975': 2, '530': 2}
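Since the question mentions thousands of tuples, it may be worth noting that sum(nodes, ()) builds a new tuple at every step, so it slows down quadratically as the list grows. A minimal sketch of the same idea, assuming itertools.chain.from_iterable for the flattening and a dict comprehension in place of the loop (the function name top_nodes is only illustrative, it is not from the answer above):

```python
from collections import Counter
from itertools import chain


def top_nodes(pairs):
    # Flatten lazily instead of sum(pairs, ()), then count every value.
    counts = Counter(chain.from_iterable(pairs))
    if not counts:
        return {}  # no input, nothing to report
    max_count = max(counts.values())
    # Keep every value whose count equals the maximum, so ties are included.
    return {node: n for node, n in counts.items() if n == max_count}


nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
print(top_nodes(nodes))
```

This version does not sort anything, it only needs one pass to count and one pass to filter.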
The problem you face is that most_common returns either the top n entries or all of them. Using that function alone, there is no way to get exactly the entries that are tied for the highest number of occurrences.
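A small illustration of that limitation on the sample data from the question (the variable names are just for this example): most_common(1) hands back a single pair even though two values share the top count, while most_common() with no argument hands back every value.

```python
from collections import Counter

nodes = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
counts = Counter(item for pair in nodes for item in pair)

top_one = counts.most_common(1)   # exactly one (value, count) pair
all_sorted = counts.most_common() # every distinct value, highest count first

print(len(top_one), len(all_sorted))
# How many values actually share the highest count:
tied = sum(1 for n in counts.values() if n == top_one[0][1])
print(tied)
```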
Note: in my example I've used the variable "d" instead of "nodes".
The method below flattens the list of tuples into a single list,
[item for sublist in d for item in sublist]
converts it to a Counter to count each occurrence, and then to a list ordered by number of occurrences using most_common.
Counter([item for sublist in d for item in sublist]).most_common()
Finally it uses a slice to keep only the entries that share the top count of occurrences and converts the result back to a dict.
[:list(counter.values()).count(max(counter.values()))]
top_hits = dict(top_hits)
The complete code:
from collections import Counter
d = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
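# flatten the tuples and count every value
counter = Counter([item for sublist in d for item in sublist])
# number of values tied for the highest count (list() is needed on Python 3)
n_top = list(counter.values()).count(max(counter.values()))
top_hits = counter.most_common()[:n_top]
top_hits = dict(top_hits)
print(top_hits)
{'2975': 2, '530': 2}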
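As an alternative to counting how many entries share the maximum, the takewhile idea from the question can probably be made to work as well. A rough sketch, assuming the ranked list from most_common() is consumed until the count drops below the top one, and the resulting pairs are passed straight to dict() instead of going through setdefault (top_hits_with_ties is a hypothetical name):

```python
from collections import Counter
from itertools import takewhile


def top_hits_with_ties(pairs):
    # Flatten the tuples and count every value.
    counter = Counter(item for pair in pairs for item in pair)
    ranked = counter.most_common()  # sorted by count, highest first
    if not ranked:
        return {}
    top_count = ranked[0][1]
    # Take pairs from the front while they still have the top count,
    # then build the dict directly from those (value, count) pairs.
    return dict(takewhile(lambda kv: kv[1] == top_count, ranked))


d = [('2975', '6384'), ('2975', '530'), ('7443', '1107983'), ('3534', '530')]
print(top_hits_with_ties(d))
```

Building the dict directly from the (value, count) pairs keeps the counts as plain ints, which is what the original setdefault(a, []).append(b) line was wrapping in lists.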