Better/Faster to Loop through set or list?

Question:

If I have a Python list that has many duplicates, and I want to iterate through each item but not through the duplicates, is it best to use a set (as in set(mylist)), or to find another way to create a list without duplicates? I was thinking of just looping through the list and checking for duplicates, but I figured that’s what set() does when it’s initialized.

So if mylist = [3,1,5,2,4,4,1,4,2,5,1,3] and I really just want to loop through [1,2,3,4,5] (order doesn’t matter), should I use set(mylist) or something else?

An alternative is possible in the last example: since the list contains every integer between its min and max value, I could loop through range(min(mylist),max(mylist)) instead of set(mylist). Should I generally try to avoid using set in this case? Also, would finding the min and max be slower than just creating the set?


In the case in the last example, the set is faster:

from numpy.random import randint
# random_integers is deprecated; randint(1, 1001) draws from the same range (1 to 1000 inclusive)
ids = randint(1, 1001, size=10**6)

def set_loop(mylist):
    idlist = []
    for id in set(mylist):
        idlist.append(id)
    return idlist

def list_loop(mylist):
    idlist = []
    for id in range(min(mylist),max(mylist)):
        idlist.append(id)
    return idlist

%timeit set_loop(ids)
#1 loops, best of 3: 232 ms per loop

%timeit list_loop(ids)
#1 loops, best of 3: 408 ms per loop
Asked By: askewchan


Answers:

Just use a set. Its semantics are exactly what you want: a collection of unique items.

Technically you’ll be iterating through the list twice: once to create the set, once for your actual loop. But you’d be doing just as much work or more with any other approach.
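
As a minimal sketch of the two passes described above (using the sample list from the question; the variable names are only illustrative):

```python
mylist = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

# Pass 1: building the set visits every element of the list once.
unique_items = set(mylist)

# Pass 2: the actual loop visits each distinct value once.
seen = []
for item in unique_items:
    seen.append(item)

# Iteration order of a set is not guaranteed, so sort before comparing.
print(sorted(seen))  # [1, 2, 3, 4, 5]
```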

Answered By: Eevee

For simplicity’s sake: newList = list(set(oldList))

But there are better options out there if you’d like to get speed/ordering/optimization instead: http://www.peterbe.com/plog/uniqifiers-benchmark
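
If ordering matters, one commonly used option (an illustration, not taken from the linked benchmark) is dict.fromkeys, which keeps the first occurrence of each item because dicts preserve insertion order in Python 3.7 and later. The list below is just sample data, and the items are assumed to be hashable:

```python
oldList = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

# list(set(...)) removes duplicates but makes no promise about order.
unordered = list(set(oldList))

# dict.fromkeys keeps the first occurrence of each item, in original order
# (dicts preserve insertion order in Python 3.7+).
ordered = list(dict.fromkeys(oldList))

print(ordered)  # [3, 1, 5, 2, 4]
```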

Answered By: GordonsBeard

set is what you want, so you should use set. Trying to be clever introduces subtle bugs like forgetting to add one to max(mylist)! Code defensively. Worry about what’s faster once you have determined that it is too slow.

range(min(mylist), max(mylist) + 1)  # <-- don't forget to add 1
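
A small sketch of why the range shortcut is fragile even with the + 1 in place; the lists and helper names here are made up for illustration. The range only matches the unique values when the data has no gaps:

```python
contiguous = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]
with_gap = [1, 2, 2, 5, 5]  # 3 and 4 never appear

def via_range(values):
    # Assumes every integer between min and max is present.
    return list(range(min(values), max(values) + 1))

def via_set(values):
    # Makes no assumption about the data; sorted only for comparison.
    return sorted(set(values))

print(via_range(contiguous) == via_set(contiguous))  # True
print(via_range(with_gap))  # [1, 2, 3, 4, 5]
print(via_set(with_gap))    # [1, 2, 5]
```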
Answered By: John La Rooy

While a set may be what you want structure-wise, the question is what is faster. A list is faster. Your example code doesn’t accurately compare set vs list because you’re converting from a list to a set in set_loop, and then you’re creating the list you’ll be looping through in list_loop. The set and list you iterate through should be constructed and in memory ahead of time, and simply looped through to see which data structure is faster at iterating:

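# Build both containers up front so that only iteration is timed.
ids_list = list(range(1000000))  # list() is needed on Python 3, where range is lazy
ids_set = set(ids_list)
def f(x):
    for i in x:
        pass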

%timeit f(ids_set)
#1 loops, best of 3: 214 ms per loop
%timeit f(ids_list)
#1 loops, best of 3: 176 ms per loop
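
To reproduce this kind of comparison outside IPython, a sketch with the standard timeit module might look like the following. The container size and repeat counts are arbitrary choices, and the numbers will vary by machine and Python version, so no expected output is shown:

```python
import timeit

# Build both containers up front so that only iteration is timed.
ids_list = list(range(200_000))
ids_set = set(ids_list)

def iterate(container):
    # Walk every element and count them so the loop does real work.
    count = 0
    for _ in container:
        count += 1
    return count

# Take the best of several runs to reduce noise from other processes.
list_time = min(timeit.repeat(lambda: iterate(ids_list), number=3, repeat=3))
set_time = min(timeit.repeat(lambda: iterate(ids_set), number=3, repeat=3))
print(f"list: {list_time:.4f} s, set: {set_time:.4f} s")
```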
Answered By: hamx0r

If the list is very large, looping over it twice will take a lot of time. On top of that, the second pass loops over a set rather than a list, and as we know, iterating over a set is slower than iterating over a list.

I think you need the power of a generator and a set.

def first_test():

    def loop_one_time(my_list):
        # Create a set to keep track of the items already yielded.
        iterated_items = set()
        # As we know, iterating over a list is faster than over a set.
        for value in my_list:
            # Checking whether an element exists in a set is very fast,
            # no matter the size of the set.
            if value not in iterated_items:
                iterated_items.add(value)  # add this item to the set
                yield value


    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]

    for v in loop_one_time(mylist):pass



def second_test():
    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]
    s = set(mylist)
    for v in s:pass


import timeit

print(timeit.timeit('first_test()', setup='from __main__ import first_test', number=10000))
print(timeit.timeit('second_test()', setup='from __main__ import second_test', number=10000))

Output:

   0.024003583388435043
   0.010424674188938422

Note: with this technique, the original order of the items is guaranteed to be preserved.

Answered By: Charif DZ