Pythonic way to access arbitrary element from dictionary
Question:
I have a dictionary, full of items. I want to peek at a single, arbitrary item:
```python
print("Amongst our dictionary's items are such diverse elements as: %s" % arb(dictionary))
```
I don’t care which item. It doesn’t need to be random.
I can think of many ways of implementing this, but they all seem wasteful. I am wondering if any are preferred idioms in Python, or (even better) if I am missing one.
```python
def arb(dictionary):
    # Creates an entire list in memory. Could take a while.
    return list(dictionary.values())[0]

def arb(dictionary):
    # Creates an entire iterator. An improvement.
    for item in dictionary.values():
        return item

def arb(dictionary):
    # No iterator, but writes to the dictionary! Twice!
    key, value = dictionary.popitem()
    dictionary[key] = value
    return value
```
I’m in a position where the performance isn’t critical enough that this matters (yet), so I can be accused of premature optimization, but I am trying to improve my Python coding style, so if there is an easily understood variant, it would be good to adopt it.
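For reference, here is a minimal sketch of the third approach. It assumes Python 3.7 or later, where `dict.popitem()` is guaranteed to remove the most recently inserted pair. Under that assumption, the approach returns the *last* item, and because the pair is re-inserted at the end, the dictionary's contents and order end up unchanged. On an empty dictionary it raises `KeyError`. The name `arb_popitem` is hypothetical.

```python
def arb_popitem(dictionary):
    # Remove the last-inserted (key, value) pair (LIFO order since Python 3.7)...
    key, value = dictionary.popitem()
    # ...and put it straight back; it lands at the end again, so order is preserved.
    dictionary[key] = value
    return value


d = {"a": 1, "b": 2, "c": 3}
before = list(d.items())
print(arb_popitem(d))             # → 3
print(list(d.items()) == before)  # → True

try:
    arb_popitem({})
except KeyError:
    print("popitem() raises KeyError on an empty dict")
```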
Answers:
Similar to your second solution, but slightly more obvious, in my opinion:
```python
return next(iter(dictionary.values()))
```
This works in Python 2 as well as in Python 3, but in Python 2 it's more efficient to do it like this:
```python
return next(dictionary.itervalues())
```
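A slightly more defensive variant follows as a sketch. `next()` accepts an optional default as its second argument, and passing one avoids a `StopIteration` on an empty dictionary. The `default=None` parameter is an illustrative choice and is not part of the original answer.

```python
def arb(dictionary, default=None):
    # iter() over the values view is cheap; next() pulls just the first value.
    # The second argument to next() is returned instead of raising StopIteration
    # when the dictionary is empty.
    return next(iter(dictionary.values()), default)


print(arb({"spam": 1, "eggs": 2}))  # → 1 (first inserted value, Python 3.7+)
print(arb({}))                      # → None
```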
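Why not use `random`?
```python
import random

def arb(dictionary):
    # list() is required in Python 3, where values() returns a non-indexable view.
    return random.choice(list(dictionary.values()))
```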
This makes it very clear that the result is meant to be purely arbitrary and not an implementation side-effect. Until performance becomes an actual issue, always go with clarity over speed.
It's a shame that `dict_values` objects don't support indexing; it would be nice to be able to pass in the values view directly.
Update: since everyone is so obsessed with performance, the above function takes <120ms to return a random value from a dict of 1 million items. Relying on clear code is not the amazing performance hit it’s being made out to be.
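The indexing point can be made concrete. In Python 3, `dict.values()` returns a view that `random.choice` cannot index, so the view has to be copied into a list first. Below is a minimal sketch. The name `arb_random` is hypothetical.

```python
import random


def arb_random(dictionary):
    # random.choice needs a sequence it can index; a dict_values view is not one,
    # so copy it into a list first (O(n) time and memory).
    return random.choice(list(dictionary.values()))


d = {"a": 1, "b": 2, "c": 3}

try:
    random.choice(d.values())
except TypeError as exc:
    # Python 3 reports that a 'dict_values' object is not subscriptable.
    print("TypeError:", exc)

print(arb_random(d) in d.values())  # → True
```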
Avoiding the whole `values`/`itervalues`/`viewvalues` mess, this works equally well in Python 2 or Python 3:
```python
dictionary[next(iter(dictionary))]
```
Alternatively, if you prefer generator expressions:
```python
next(dictionary[x] for x in dictionary)
```
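Here is a quick check of the two key-based forms, as a sketch. Both look up the first key in iteration order, which is insertion order since Python 3.7, so they return the value of the first inserted key. `next(iter(dictionary.items()))` is an extra form that does not come from the answer. It yields the key and the value together.

```python
dictionary = {"first": 10, "second": 20}

# Look up the value of the first key produced by iterating the dict.
via_key = dictionary[next(iter(dictionary))]
# Same idea as a generator expression; next() stops after the first element.
via_genexp = next(dictionary[x] for x in dictionary)
# Extra form (not from the answer): get the key and the value in one step.
key, value = next(iter(dictionary.items()))

print(via_key, via_genexp)  # → 10 10
print(key, value)           # → first 10
```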
I believe the question has been thoroughly answered, but hopefully this comparison will shed some light on the clean-code vs. speed trade-off:
```python
# Python 2 code: iterkeys() and a list-returning keys() do not exist in Python 3.
from timeit import timeit
from random import choice

A = {x: [y for y in range(100)] for x in range(1000)}

def test_pop():
    k, v = A.popitem()
    A[k] = v

def test_iter():
    k = next(A.iterkeys())

def test_list():
    k = choice(A.keys())

def test_insert():
    A[0] = 0

if __name__ == '__main__':
    print('pop', timeit("test_pop()", setup="from __main__ import test_pop", number=10000))
    print('iter', timeit("test_iter()", setup="from __main__ import test_iter", number=10000))
    print('list', timeit("test_list()", setup="from __main__ import test_list", number=10000))
    print('insert', timeit("test_insert()", setup="from __main__ import test_insert", number=10000))
```
Here are the results:
```
('pop', 0.0021750926971435547)
('iter', 0.002003908157348633)
('list', 0.047267913818359375)
('insert', 0.0010859966278076172)
```
It seems that using `iterkeys` is only marginally faster than popping an item and re-inserting it, but more than 20 times faster than creating the list and choosing a random element from it (about 0.002 s vs. 0.047 s for 10,000 calls).
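For readers on Python 3, here is a sketch of a port of the benchmark above. It assumes the goal is to time the same four operations. `iterkeys()` no longer exists and `keys()` returns a view, so the port uses `next(iter(A))` and `choice(list(A))` instead. It also passes `globals=globals()` (available since Python 3.5) to `timeit` rather than importing each function from `__main__`. No timings are shown because they depend on the machine and interpreter, and they will not match the Python 2 numbers above.

```python
from random import choice
from timeit import timeit

# Same test data as the original: 1000 keys, each mapped to a 100-element list.
A = {x: list(range(100)) for x in range(1000)}


def test_pop():
    # Remove the last pair and reinsert it.
    k, v = A.popitem()
    A[k] = v


def test_iter():
    # Python 3 replacement for next(A.iterkeys()).
    return next(iter(A))


def test_list():
    # keys() is a view in Python 3, so build a list before choice().
    return choice(list(A))


def test_insert():
    # Baseline: a single dictionary assignment.
    A[0] = 0


if __name__ == "__main__":
    for name in ("pop", "iter", "list", "insert"):
        seconds = timeit(f"test_{name}()", globals=globals(), number=10000)
        print(name, seconds)
```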