Accessing dictionary items by position in Python 3.6+ efficiently
Question:
I understand dictionaries are insertion ordered in Python 3.6+, as an implementation detail in 3.6 and official in 3.7+.
Given they are ordered, it seems strange that no methods exist to retrieve the ith item of a dictionary by insertion order. The only solutions available appear to have O(n) complexity, either:
- Convert to a list via an O(n) process and then use list.__getitem__.
- enumerate dictionary items in a loop and return the value when the desired index is reached. Again, with O(n) time complexity.
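Both O(n) workarounds can be sketched as follows (the dictionary and index here are arbitrary placeholders):

```python
import itertools

d = {"a": 1, "b": 2, "c": 3}
i = 1

# Approach 1: materialize the keys into a list, then index -- O(n) time and space
key_at_i = list(d)[i]

# Approach 2: advance an iterator i steps without building a list -- O(n) time, O(1) space
item_at_i = next(itertools.islice(d.items(), i, None))

print(key_at_i)   # 'b'
print(item_at_i)  # ('b', 2)
```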
Since getting an item from a list has O(1) complexity, is there a way to achieve the same complexity with dictionaries? Either a regular dict or collections.OrderedDict would work.
If it’s not possible, is there a structural reason preventing such a method, or is this just a feature which has not yet been considered / implemented?
Answers:
For an OrderedDict, positional access is inherently O(n) because the ordering is recorded in a linked list.
For the builtin dict, there’s a vector (a contiguous array) rather than a linked list, but it amounts to much the same thing in the end: the vector contains a few kinds of “dummies”, special internal values that mean “no key has been stored here yet” or “a key used to be stored here but no longer”. That makes, e.g., deleting a key extremely cheap (just overwrite the key with a dummy value).
But without adding auxiliary data structures on top of that, there’s no way to skip over the dummies without marching over them one at a time. Because Python uses a form of open addressing for collision resolution, and keeps the load factor under 2/3, at least a third of the vector’s entries are dummies. the_vector[i] can be accessed in O(1) time, but i really has no predictable relation to the i’th non-dummy entry.
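To make the “auxiliary data structures” point concrete, here is a minimal sketch of what such a structure could look like: a dict paired with a parallel list of keys. Positional lookup becomes O(1), but the trade-off described above reappears, as deletion now costs O(n). The class name and methods are illustrative, not part of any library:

```python
class IndexedDict:
    """Sketch: a dict plus a parallel list of keys in insertion order."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            self._keys.append(key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        # O(n): the key list must be searched and shifted
        del self._data[key]
        self._keys.remove(key)

    def item_at(self, i):
        # O(1) positional access via the auxiliary list
        key = self._keys[i]
        return key, self._data[key]


d = IndexedDict()
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(d.item_at(1))  # ('b', 2)
```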
As per @TimPeters’ answer, there are structural reasons why you cannot access dictionary items by position in O(1) time.
It’s worth considering the alternatives if you are looking for O(1) lookup by key or position. Third-party libraries such as NumPy and Pandas offer this functionality; they are especially efficient for numeric data, where values can be stored contiguously rather than as pointers to objects.
With Pandas, you can construct a “dictionary-like” series with unique labels offering O(1) lookup by “label” or position. What you sacrifice is performance when deleting a label, which incurs O(n) cost, much like list.
import pandas as pd

n, i = 4, 2
s = pd.Series(range(n))

# O(n) item deletion (each line shows an alternative)
del s[i]           # in-place removal by label
s = s.drop(i + 1)  # drop returns a new Series unless inplace=True
s.pop(i - 1)       # removes by label and returns the value

# O(1) lookup by label
s.loc[0]
s.at[0]
s.get(0)
s[0]

# O(1) lookup by position
s.iloc[0]
s.iat[0]
pd.Series is by no means a drop-in replacement for dict. For example, duplicate keys are not prevented and will cause issues if the series is used primarily as a mapping. However, where data is stored in a contiguous memory block, as in the example above, you may see significant performance improvements.
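As a concrete illustration of the two lookup styles (a minimal sketch, assuming pandas is installed; the labels and values are arbitrary):

```python
import pandas as pd

# String labels stand in for dictionary keys
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.loc["b"])  # lookup by label, O(1)
print(s.iloc[1])   # lookup by position, O(1)
```

Both lines retrieve the same element, 20, one addressed by its label and one by its insertion position, which is exactly the operation a plain dict cannot provide in O(1).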