Counting the number of distinct keys in a dictionary in Python
Question:
I have a dictionary mapping keywords to the number of times each keyword is repeated, but I only want a list of the distinct words, so I wanted to count the number of keywords. Is there a way to count the number of keywords, or is there another way I should look for distinct words?
Answers:
The number of distinct words (i.e. the count of entries in the dictionary) can be found using the len() function.
> a = {'foo':42, 'bar':69}
> len(a)
2
To get all the distinct words (i.e. the keys), use the .keys() method.
> list(a.keys())
['foo', 'bar']
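As a small illustration of the two calls above, here is a sketch using a hypothetical word-count dictionary (the name `word_counts` and its contents are made up for the example). In Python 3.7 and later, dictionaries keep insertion order, so the key list comes back in the order the words were added.

```python
# Hypothetical data: each keyword mapped to how often it was seen.
word_counts = {"foo": 42, "bar": 69, "baz": 1}

# Keys are unique by construction, so len() is the number of distinct words.
distinct_count = len(word_counts)

# Iterating a dict yields its keys, so list(d) equals list(d.keys()).
distinct_words = list(word_counts)

print(distinct_count)   # 3
print(distinct_words)   # ['foo', 'bar', 'baz']
```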
len(yourdict.keys())
or just
len(yourdict)
If you want to count the unique words in a file, you can use set, like this:
len(set(open(yourdictfile).read().split()))
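The one-liner above never closes the file explicitly. A slightly longer sketch that does is shown here; `count_unique_words` and the temporary file are illustrative names, not part of the original answer, and because the text is split on whitespace, `Word` and `word,` count as different words.

```python
import os
import tempfile


def count_unique_words(path):
    """Return the number of distinct whitespace-separated words in a file."""
    with open(path, encoding="utf-8") as handle:
        return len(set(handle.read().split()))


# Demo on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("a b a\nc b a\n")
    sample_path = tmp.name

unique_total = count_unique_words(sample_path)
os.remove(sample_path)
print(unique_total)  # 3
```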
If the question is about counting how many times each keyword occurs, I would recommend something like:
def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError:
        store[value] = 1
    return
In the main function, loop through the data and pass each value to the countoccurrences function:
if __name__ == "__main__":
    store = {}
    data = ('a', 'a', 'b', 'c', 'c')
    for item in data:
        countoccurrences(store, item)
    # Python 3: items() replaces iteritems(), and print is a function
    for k, v in store.items():
        print("Key " + k + " has occurred " + str(v) + " times")
The code outputs (keys appear in insertion order on Python 3.7+)
Key a has occurred 2 times
Key b has occurred 1 times
Key c has occurred 2 times
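For the same tally, the standard library's `collections.Counter` does the try/except bookkeeping internally. This is an alternative sketch rather than the answer's own method, and the variable names are chosen for the example.

```python
from collections import Counter

data = ("a", "a", "b", "c", "c")

# Counter is a dict subclass: keys are the distinct items, values the tallies.
store = Counter(data)

for key, count in store.items():
    print("Key " + key + " has occurred " + str(count) + " times")

# The number of distinct items is still just len().
distinct = len(store)
```

Because a `Counter` is still a dictionary, everything said elsewhere in this thread about `len()` and `.keys()` applies to it unchanged.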
Calling len() directly on your dictionary works, and is faster than creating a view with d.keys() and calling len() on it, but the cost of either will be negligible in comparison to whatever else your program is doing.
d = {x: x**2 for x in range(1000)}
len(d)
# 1000
len(d.keys())
# 1000
%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
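In Python 3, `d.keys()` returns a lightweight view object rather than a copy of the keys, which is why taking its length is cheap and the gap between the two timings is only an extra method call. A short sketch of that behaviour (variable names are illustrative):

```python
d = {x: x**2 for x in range(1000)}

keys_view = d.keys()          # a dict_keys view; no keys are copied here
size_before = len(keys_view)  # same value as len(d)

d[1000] = 1000**2             # the view reflects later changes to the dict
size_after = len(keys_view)

print(type(keys_view).__name__, size_before, size_after)
```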
I made some modifications to the answer posted by UnderWaterKremlin to make it work on Python 3. The result, shown below, was surprising.
System specs:
- python = 3.7.4
- conda = 4.8.0
- 3.6 GHz, 8 cores, 16 GB
import timeit
d = {x: x**2 for x in range(1000)}
#print (d)
print (len(d))
# 1000
print (len(d.keys()))
# 1000
print (timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000)) # 1
print (timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2
Result:
1) = 37.0100378
2) = 37.002148899999995
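So in this run len(d.keys()) came out marginally faster than len(d), but the gap (about 0.008 s out of 37 s) is well within measurement noise, so these numbers do not show a real difference between the two.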
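Both timed statements above rebuild the 1000-entry dictionary on every run, so the totals mostly measure dictionary construction rather than `len()`. A sketch that builds the dictionary once, via the `setup` argument of `timeit.timeit`, isolates the two calls. The numbers will vary by machine and Python version, so none are quoted here.

```python
import timeit

# Build the dictionary once, outside the timed statement.
setup = "d = {x: x**2 for x in range(1000)}"

t_len = timeit.timeit("len(d)", setup=setup, number=100000)
t_keys = timeit.timeit("len(d.keys())", setup=setup, number=100000)

print("len(d):        ", t_len)
print("len(d.keys()): ", t_keys)
```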
To check whether a given keyword is present in a dictionary (this tests membership rather than counting the keys):
def dict_finder(dict_finders):
    x = input("Enter the thing you want to find: ")
    if x in dict_finders:
        print("Element found")
    else:
        print("Nothing found:")