Is set.remove() causing a big slowdown in my code?

Question:

This is a solution to a LeetCode problem. The problem is solved, but I am struggling to understand some weird behavior that is probably Python-specific. The issue is with the two lines that have a comment.

graph is a hashmap that maps integers to sets. graph[arr[index]] is a set containing integers, which I am removing from one by one: graph[arr[index]].remove(j). I could remove the integers one by one or just do graph[arr[index]] = set() after I am done processing all of them. Initially I removed the integers from the set one by one. This caused my solution to be too slow. This was confusing since removing from a set should be O(1), but maybe the constant factor was too large. I fixed it by doing graph[arr[index]] = set() after the loop. The solution was 50x faster and accepted.

However, even if I leave in this line: graph[arr[index]].remove(j), as long as I have the graph[arr[index]] = set() after the loop, the solution is still fast. What could be the cause of this? My only guess is an interpreter optimization. Also, I tested the slow code with different-sized inputs. The time taken seems to scale linearly with the size of the input.

from collections import defaultdict, deque
from typing import List

def minJumps(arr: List[int]) -> int:
    graph = defaultdict(set)
    for i, n in enumerate(arr):
        graph[n].add(i)
    queue = deque()
    queue.append(0)
    visited = set([0])
    steps = 0
    while queue:
        l = len(queue)
        for i in range(l):
            index = queue.popleft()
            if index == len(arr)-1:
                return steps
            if index and index-1 not in visited:
                queue.append(index-1)
                visited.add(index-1)
            if index+1 not in visited:
                queue.append(index+1)
                visited.add(index+1)
                
            for j in list(graph[arr[index]]):
                graph[arr[index]].remove(j) #removing from a set should be O(1), but maybe big constant factor?
                if j not in visited:
                    visited.add(j)
                    queue.append(j)
            #graph[arr[index]] = set() #If I uncomment this line then it runs much faster, even if I leave in the previous line
                    
        steps += 1
Asked By: Mustafa

||

Answers:

Removing is fast. The problem is that set.remove never triggers a resize.
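
One way to see this for yourself is to look at the memory footprint of a set before and after emptying it. The following is a minimal sketch that assumes CPython, where sys.getsizeof on a set includes its hash table; the size n and the variable names are only illustrative.

```python
import sys

n = 200_000  # illustrative size, not taken from the question

# A full set, to record how large the hash table gets.
full = set(range(n))
full_size = sys.getsizeof(full)

# The same set, emptied by removing every element one at a time.
emptied = set(range(n))
for x in range(n):
    emptied.remove(x)
emptied_size = sys.getsizeof(emptied)

# The same set, emptied with clear(), which resets the table.
cleared = set(range(n))
cleared.clear()
cleared_size = sys.getsizeof(cleared)

fresh_size = sys.getsizeof(set())

print("full set:          ", full_size, "bytes")
print("emptied by remove: ", emptied_size, "bytes")
print("emptied by clear:  ", cleared_size, "bytes")
print("brand new set():   ", fresh_size, "bytes")
```

On CPython the set emptied by remove should report the same size as the full set, while the cleared set should match a brand new one.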

If you empty a set by remove-ing all elements one by one, the underlying hash table is still sized for the number of elements it had before you started removing elements. When you then loop over that set again later, the set’s iterator ends up wasting a lot of time traversing the giant, empty hash table entry by entry just to find no elements.
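
To get a feel for the cost of iterating, here is a rough timing sketch, again assuming CPython. The helper names drain_with_remove and best_time are made up for this example, and the absolute numbers will vary by machine; only the gap between the two matters.

```python
import timeit

def drain_with_remove(n):
    """Build a set of n ints and empty it one remove() at a time."""
    s = set(range(n))
    for x in list(s):
        s.remove(x)
    return s

def best_time(s, number=20, repeat=3):
    """Best-of-repeat time (seconds) for iterating over s `number` times."""
    return min(timeit.repeat(lambda: list(s), number=number, repeat=repeat))

N = 200_000  # illustrative size

emptied = drain_with_remove(N)  # empty, but still backed by a big table
fresh = set()                   # empty, with a minimal table

t_emptied = best_time(emptied)
t_fresh = best_time(fresh)

print(f"iterating emptied set: {t_emptied:.6f} s")
print(f"iterating fresh set:   {t_fresh:.6f} s")
```

This is consistent with what you saw: graph[arr[index]] = set() rebinds the key to a fresh, minimal set, so later visits to the same value iterate over almost nothing, whether or not the remove calls ran first.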

Answered By: user2357112