Count dictionary keys and argmax in a list of lists
Question:
Given:
import numpy as np
list_of_list = [
['ccc', 'cccc', 'b', 'c', 'b'],
['ab', 'b', 'b', 'aa'],
['c', 'b', 'c', 'c', 'b', 'c'],
['bb', 'd', 'c'],
]
my_dict = {key: None for key in 'abcde'}
list_of_list is simplified in this test example; in practice it is a list of vocabularies (word lists):
list_of_list = [
['word1', 'word2', ... , 'wordN'],
['word1', 'word2', ... , 'wordM'],
['word1', 'word2', ... , 'wordK'],
...
]
Goal:
I’d like to update the dictionary so that each entry follows the pattern "key": [index_of_max_occurrence, max_occurrence], based on the contents of list_of_list.
My inefficient solution:
The following snippet, which uses nested for loops, works fine for a small dictionary and a small list of lists. For larger inputs, however, it becomes very slow:
for k in my_dict:
    counters = list()
    for lst in list_of_list:
        counters.append( lst.count(k) )
    if any(counters):
        my_dict[k] = [ np.argmax(counters) , max(counters) ]
print(my_dict) # {'a': None, 'b': [0, 2], 'c': [2, 4], 'd': [3, 1], 'e': None}
Is there a more robust solution that would speed up my program?
Answers:
You can transform list_of_list into list_of_dicts (one Counter per sublist), so each sublist is counted only once, which reduces the complexity. Also, np.argmax isn’t necessary; use enumerate with max:
from collections import Counter
list_of_dicts = [
    Counter(l)
    for l in list_of_list
]
for k in my_dict:
    i, m = max(enumerate(list_of_dicts), key=lambda d: d[1].get(k, 0))
    if k in m:
        my_dict[k] = [i, list_of_dicts[i][k]]
print(my_dict)
Prints:
{'a': None, 'b': [0, 2], 'c': [2, 4], 'd': [3, 1], 'e': None}
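If the number of keys is large as well, a single-pass variant may be worth trying. This is only a sketch of a different technique than the one above, not a benchmarked replacement: it walks each sublist's Counter once and updates the best entry per key, so the work grows with the total number of words rather than with keys times sublists. The function name max_occurrences is hypothetical, chosen here for illustration.

```python
from collections import Counter


def max_occurrences(list_of_list, keys):
    """Map each key to [index_of_max_occurrence, max_occurrence], or None if absent.

    Hypothetical helper, not part of the original answer.
    """
    result = {key: None for key in keys}
    for index, words in enumerate(list_of_list):
        # Count every word of this sublist once.
        for word, count in Counter(words).items():
            if word not in result:
                continue  # not a key we are tracking
            best = result[word]
            # Strict '>' keeps the earliest sublist on ties, like np.argmax.
            if best is None or count > best[1]:
                result[word] = [index, count]
    return result


list_of_list = [
    ['ccc', 'cccc', 'b', 'c', 'b'],
    ['ab', 'b', 'b', 'aa'],
    ['c', 'b', 'c', 'c', 'b', 'c'],
    ['bb', 'd', 'c'],
]
my_dict = max_occurrences(list_of_list, 'abcde')
print(my_dict)
```

On the sample data this should produce the same dictionary as the loop above. Whether it is actually faster depends on how many keys you track relative to the vocabulary size, so it is worth timing both on your real data.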