Count dictionary keys and argmax in a list of lists

Question:

Given:

import numpy as np
list_of_list = [
    ['ccc', 'cccc', 'b', 'c', 'b'],
    ['ab', 'b', 'b', 'aa'], 
    ['c', 'b', 'c', 'c', 'b', 'c'],
    ['bb', 'd', 'c'],
]
my_dict = {key: None for key in 'abcde'} 

list_of_list is simplified in this test example; in practice it is a list of word lists (vocabularies):

list_of_list = [ 
    ['word1', 'word2', ... , 'wordN'],
    ['word1', 'word2', ... , 'wordM'], 
    ['word1', 'word2', ... , 'wordK'], 
    ...
]

Goal:

I’d like to update the dictionary so that each entry follows the pattern "key": [index_of_max_occurrence, max_occurrence], computed from the lists in list_of_list.

My inefficient solution:

The following snippet, which uses for loops, works fine for a fairly small dictionary and list of lists. For larger inputs, however, it becomes very slow:

for k in my_dict:
    counters = list()
    for lst in list_of_list:
        counters.append( lst.count(k) )
    if any(counters):
        my_dict[k] = [ np.argmax(counters) , max(counters) ]
print(my_dict) # {'a': None, 'b': [0, 2], 'c': [2, 4], 'd': [3, 1], 'e': None}

Is there a better, more robust solution that would speed up my program?

Asked By: Farid Alijani


Answers:

You can transform list_of_list into a list of Counter objects, so that each list is counted only once instead of once per key. Also, np.argmax isn’t necessary; enumerate with max does the job:

from collections import Counter

list_of_dicts = [
    Counter(l)
    for l in list_of_list
]

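for k in my_dict:
    # Pick the (index, Counter) pair whose Counter holds the most occurrences of k
    i, m = max(enumerate(list_of_dicts), key=lambda d: d[1].get(k, 0))
    # If k never occurs, the winning Counter does not contain it and the entry stays None
    if k in m:
        my_dict[k] = [i, m[k]]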

print(my_dict)

Prints:

{'a': None, 'b': [0, 2], 'c': [2, 4], 'd': [3, 1], 'e': None}
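
If the dictionary has many keys, the loop above still scans every Counter once per key. A possible variation is to walk through the lists a single time and update the best entry for each word as it is seen. The following is only a sketch under that assumption; the helper name best_occurrences is made up for illustration, and the strict comparison is chosen so that ties keep the earliest index, as np.argmax does:

```python
from collections import Counter

list_of_list = [
    ['ccc', 'cccc', 'b', 'c', 'b'],
    ['ab', 'b', 'b', 'aa'],
    ['c', 'b', 'c', 'c', 'b', 'c'],
    ['bb', 'd', 'c'],
]
my_dict = {key: None for key in 'abcde'}


def best_occurrences(keys, lists):
    """Hypothetical helper: map each key to [index_of_max_occurrence, max_occurrence]."""
    result = {key: None for key in keys}
    for index, words in enumerate(lists):
        # Count every word of this list once
        for word, count in Counter(words).items():
            if word not in result:
                continue  # ignore words that are not dictionary keys
            best = result[word]
            # Strict ">" keeps the first list on ties
            if best is None or count > best[1]:
                result[word] = [index, count]
    return result


result = best_occurrences(my_dict, list_of_list)
print(result)  # {'a': None, 'b': [0, 2], 'c': [2, 4], 'd': [3, 1], 'e': None}
```

This touches each word of each list once, so the running time depends on the total number of words rather than on the number of keys multiplied by the number of lists.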
Answered By: Andrej Kesely