Python set iteration time complexity

Question:

What is the time complexity of the following python code?

  1. create set x
  2. add n items in x
  3. remove n items in x
  4. add 1 item in x
  5. iterate x m times
import time

def test(n, m):
    answer = 0
    start = time.time()
    x = set([i for i in range(n)])
    for i in list(x):
        x.remove(i)
    x.add(1234)
    for i in range(m):
        for j in x:
            answer+=1
    return time.time() - start

print(test(100000,100000))

I thought time complexity is O(m) but it takes very long time. (It took more than 10 seconds in my local environment.)
is time comlexity O(nm)?

Asked By: 서태영

||

Answers:

Set iteration takes time proportional to the size of the underlying hash table. For common usage patterns, the size of the underlying hash table is usually within a small constant factor of the number of elements, but when you start deleting entries, that may no longer be the case.

CPython doesn’t resize sets on remove. It can resize on add, but as of the current CPython 3.11.2 implementation, set.add only resizes when an added element lands in a never-previously-used table cell. If an element lands in a cell another element has previously been deleted from, there is no resize check.

Your x.add(1234) lands in the cell the old 1234 element was previously deleted from, so it doesn’t trigger a set resize. The removal loop doesn’t trigger a resize either, since set.remove doesn’t trigger resizes. Thus, your set’s hash table is still sized for the original 100000 elements rather than the 1 remaining element when you start iterating over the set.

Answered By: user2357112
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.