Turn a list into a dictionary where each key stores the indexes of that repeated element
Question:
I want to turn a list with repeated strings, like
["ask","a","public","question","ask","a","public","question"]
into a dictionary that has each element of the list as a key and the indexes where it occurs as the value:
{"ask":[0,4],"a":[1,5],"public":[2,6],"question":[3,7]}
Any hints?
I am actually dealing with the bigram perplexity of a corpus. I have already got the total occurrences of the bigrams, i.e., count(B|A), but now I need the total occurrences count(A), where count(A) should be all occurrences of any two-word combination starting with A. I took the keys of the bigram dictionary as a list and changed it to contain only the first words, for example from
[['You', 'will'], ['will', 'face'], ['face', 'many'], ['many', 'defeats']]
to
['You', 'will', 'face', 'many']
So I need to count all occurrences of each word, one by one, in that bigram dictionary. I tried several data structures, like list, dict, and defaultdict, but they all took too long. I just want to find another data structure that can do this quickly.
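For the count(A) part specifically, you may not need index lists at all. The sketch below assumes (this is not stated in the question) that the bigram counts live in a dict keyed by `(A, B)` tuples (lists cannot be dict keys), with made-up example counts. It sums the counts by first word in a single pass using `collections.Counter`, which treats missing keys as 0.

```python
from collections import Counter

# Hypothetical bigram counts: keys are (first_word, second_word) tuples,
# values are how many times that bigram occurred in the corpus.
bigram_counts = {
    ("You", "will"): 1,
    ("will", "face"): 1,
    ("face", "many"): 1,
    ("many", "defeats"): 1,
    ("You", "can"): 2,
}

# count(A): total occurrences of all bigrams whose first word is A.
# One pass over the dictionary, so the cost grows with the number of bigrams.
unigram_counts = Counter()
for (first, _second), n in bigram_counts.items():
    unigram_counts[first] += n

print(unigram_counts["You"])   # 3
print(unigram_counts["will"])  # 1
```

Each bigram is visited once, so this avoids rescanning the key list for every word.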
Answers:
You can iterate over the list using enumerate,
then, for each value in the list, add or update its index list in the dictionary.
lst = ["ask","a","public","question","ask","a","public","question"]
out = {}
for i, value in enumerate(lst):
    # get() returns the existing index list (or [] for a new key);
    # + [i] builds a new list with the current index appended
    out[value] = out.get(value, []) + [i]
print(out)
# {'ask': [0, 4], 'a': [1, 5], 'public': [2, 6], 'question': [3, 7]}
There are various ways to do this.
This one uses defaultdict:
from collections import defaultdict
result = defaultdict(list)
mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]
for index, item in enumerate(mylst):
    # a missing key is created with an empty list, then appended to
    result[item].append(index)
print(dict(result))
Another way is to use the dict setdefault method:
result = {}
mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]
for index, item in enumerate(mylst):
    # setdefault inserts [] for a new key and returns the stored list either way
    result.setdefault(item, []).append(index)
print(result)
Another one would be to use try-except with a dictionary.
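A minimal sketch of that try-except approach, using the same example list: append to the existing list, and create the list only when the key is missing and a `KeyError` is raised.

```python
mylst = ["ask", "a", "public", "question", "ask", "a", "public", "question"]

result = {}
for index, item in enumerate(mylst):
    try:
        # Key already seen: extend its existing index list in place.
        result[item].append(index)
    except KeyError:
        # First occurrence: start a new list for this key.
        result[item] = [index]

print(result)
# {'ask': [0, 4], 'a': [1, 5], 'public': [2, 6], 'question': [3, 7]}
```

The exception branch runs only once per distinct key, so this style tends to work well when most elements are repeats.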