Turn a list into a dictionary where each key stores the indexes at which that element occurs


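I want to turn a list with repeated strings like

["ask", "a", "public", "question", "ask", "a", "public", "question"]

into a dictionary. The output dictionary should have each element of the list as a key and the indexes of its occurrences as values:

{'ask': [0, 4], 'a': [1, 5], 'public': [2, 6], 'question': [3, 7]}
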
Any hints?
I am actually computing the bigram perplexity of a corpus. I already have the total occurrences of each bigram, i.e., count(B|A), but now I need count(A), which should be the total number of occurrences of all two-word combinations that start with A. I took the keys of the bigram dictionary as a list, such as

[['You', 'will'], ['will', 'face'], ['face', 'many'], ['many', 'defeats']]

and reduced it to a list containing only the first words:

['You', 'will', 'face', 'many']

So I need to count the occurrences of each of these words, one by one, in that bigram dictionary. I tried several data structures such as list, dict, and defaultdict, but they all took too long. I just want to find another data structure that can do this quickly.

Asked By: Qqqq



You can iterate over the list using enumerate and, for each value, add or update the list of indexes stored under that value in the dictionary.

lst = ["ask","a","public","question","ask","a","public","question"]
out = {}
for i,value in enumerate(lst):
    out[value] = out.get(value, []) + [i]
# out
{'ask': [0, 4], 'a': [1, 5], 'public': [2, 6], 'question': [3, 7]}
Answered By: ThePyGuy

There are various ways to do this.

This one uses defaultdict.

from collections import defaultdict
result = defaultdict(list)

mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]

for index, item in enumerate(mylst):
    result[item].append(index)

Another way is to use the dict setdefault method.

result = {}
mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]

for index, item in enumerate(mylst):
    result.setdefault(item, []).append(index)


Another one would be to use try-except with a dictionary.
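
A minimal sketch of what the try-except variant might look like, assuming the same mylst as in the examples above. It tries to append to the existing list and, if the key is not there yet (KeyError), creates the list with the current index:

```python
result = {}
mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]

for index, item in enumerate(mylst):
    try:
        # Key already seen: append the new index to its list.
        result[item].append(index)
    except KeyError:
        # First occurrence of this item: start a new list.
        result[item] = [index]

print(result)
```

This should give the same mapping as the defaultdict and setdefault versions.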

Answered By: Vishal Singh