How do I add the contents of an iterable to a set?

Question:

What is the “one […] obvious way” to add all items of an iterable to an existing set?

Asked By: Ian Mackinnon

||

Answers:

Use list comprehension.

Short circuiting the creation of iterable using a list for example 🙂

>>> x = [1, 2, 3, 4]
>>> 
>>> k = x.__iter__()
>>> k
<listiterator object at 0x100517490>
>>> l = [y for y in k]
>>> l
[1, 2, 3, 4]
>>> 
>>> z = Set([1,2])
>>> z.update(l)
>>> z
set([1, 2, 3, 4])
>>> 

[Edit: missed the set part of question]

Answered By: pyfunc

You can add elements of a list to a set like this:

>>> foo = set(range(0, 4))
>>> foo
set([0, 1, 2, 3])
>>> foo.update(range(2, 6))
>>> foo
set([0, 1, 2, 3, 4, 5])
for item in items:
   extant_set.add(item)

For the record, I think the assertion that “There should be one– and preferably only one –obvious way to do it.” is bogus. It makes an assumption that many technical minded people make, that everyone thinks alike. What is obvious to one person is not so obvious to another.

I would argue that my proposed solution is clearly readable, and does what you ask. I don’t believe there are any performance hits involved with it–though I admit I might be missing something. But despite all of that, it might not be obvious and preferable to another developer.

Answered By: jaydel

You can use the set() function to convert an iterable into a set, and then use standard set update operator (|=) to add the unique values from your new set into the existing one.

>>> a = { 1, 2, 3 }
>>> b = ( 3, 4, 5 )
>>> a |= set(b)
>>> a
set([1, 2, 3, 4, 5])
Answered By: gbc

For the benefit of anyone who might believe e.g. that doing aset.add() in a loop would have performance competitive with doing aset.update(), here’s an example of how you can test your beliefs quickly before going public:

>python27python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 294 usec per loop

>python27python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 950 usec per loop

>python27python -mtimeit -s"it=xrange(10000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 458 usec per loop

>python27python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a.update(it)"
1000 loops, best of 3: 598 usec per loop

>python27python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "for i in it:a.add(i)"
1000 loops, best of 3: 1.89 msec per loop

>python27python -mtimeit -s"it=xrange(20000);a=set(xrange(100))" "a |= set(it)"
1000 loops, best of 3: 891 usec per loop

Looks like the cost per item of the loop approach is over THREE times that of the update approach.

Using |= set() costs about 1.5x what update does but half of what adding each individual item in a loop does.

Answered By: John Machin

Just a quick update, timings using python 3:

#!/usr/local/bin python3
from timeit import Timer

a = set(range(1, 100000))
b = list(range(50000, 150000))

def one_by_one(s, l):
    for i in l:
        s.add(i)    

def cast_to_list_and_back(s, l):
    s = set(list(s) + l)

def update_set(s,l):
    s.update(l)

results are:

one_by_one 10.184448844986036
cast_to_list_and_back 7.969255169969983
update_set 2.212590195937082
Answered By: Daniel Dubovski
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.