Python set iteration time complexity
Question:
What is the time complexity of the following Python code?
- create set x
- add n items to x
- remove n items from x
- add 1 item to x
- iterate over x m times
import time

def test(n, m):
    answer = 0
    start = time.time()
    x = set([i for i in range(n)])
    for i in list(x):
        x.remove(i)
    x.add(1234)
    for i in range(m):
        for j in x:
            answer += 1
    return time.time() - start

print(test(100000, 100000))
I thought the time complexity was O(m), but it takes a very long time. (It took more than 10 seconds in my local environment.)
Is the time complexity O(nm)?
Answers:
Set iteration takes time proportional to the size of the underlying hash table. For common usage patterns, the size of the underlying hash table is usually within a small constant factor of the number of elements, but when you start deleting entries, that may no longer be the case.

CPython doesn't resize sets on remove. It can resize on add, but as of the current CPython 3.11.2 implementation, set.add only resizes when an added element lands in a never-previously-used table cell. If an element lands in a cell another element has previously been deleted from, there is no resize check.
Your x.add(1234) lands in the cell the old 1234 element was previously deleted from, so it doesn't trigger a set resize. The removal loop doesn't trigger a resize either, since set.remove doesn't trigger resizes. Thus, your set's hash table is still sized for the original 100000 elements rather than the 1 remaining element when you start iterating over the set.
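A sketch of the practical consequence and a workaround: rebuilding the set with set(x) (or x.copy()) allocates a fresh table sized for the current contents, which restores the expected O(m) iteration cost. The timing helper below is illustrative, not part of the original question:

```python
import time

def iterate(s, m):
    # Iterate the set m times; the cost scales with the size of the
    # underlying hash table, not just len(s).
    answer = 0
    start = time.time()
    for _ in range(m):
        for _ in s:
            answer += 1
    return time.time() - start

x = set(range(100000))
for i in list(x):
    x.remove(i)
x.add(1234)

slow = iterate(x, 1000)  # still scans the table sized for 100000 entries
x = set(x)               # rebuild: fresh table sized for 1 element
fast = iterate(x, 1000)
print(f"before rebuild: {slow:.3f}s, after rebuild: {fast:.3f}s")
```
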