# What makes sets faster than lists?

## Question:

The python wiki says: “Membership testing with sets and dictionaries is much faster, O(1), than searching sequences, O(n). When testing “a in b”, b should be a set or dictionary instead of a list or tuple.”

I’ve been using sets in place of lists whenever speed is important in my code, but lately I’ve been wondering why sets are so much faster than lists. Could anyone explain, or point me to a source that would explain, what exactly is going on behind the scenes in python to make sets faster?

## Answers:

Python uses hashtables, which have O(1) lookup.

I think you need to take a good look at a book on data structures. Basically, Python lists are implemented as dynamic arrays and sets are implemented as a hash tables.

The implementation of these data structures gives them radically different characteristics. For instance, a hash table has a very fast lookup time but cannot preserve the order of insertion.

Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the `set`

object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.

This is also the reason that sets do not preserve the order of the objects you add.

Note that sets aren’t faster than lists in general — membership test is faster for sets, and so is removing an element. As long as you don’t need these operations, lists are often faster.

A list must be searched one by one, where a set or dictionary has an index for faster searching.

** list**: Imagine you are looking for your socks in your closet, but you don’t know in which drawer your socks are, so you have to search drawer by drawer until you find them (or maybe you never do). That’s what we call

`O(n)`

, because in the worst scenario, you will look in all your drawers (where `n`

is the number of drawers).** set**: Now, imagine you’re still looking for your socks in your closet, but now you know in which drawer your socks are, say in the 3rd drawer. So, you will just search in the 3rd drawer, instead of searching in all drawers. That’s what we call

`O(1)`

, because in the worst scenario you will look in just one drawer.While I have not measured anything performance related in python so far, I’d still like to point out that lists are often faster.

Yes, you have O(1) vs. O(n). But always remember that this gives information only about the asymptotic behavior of something. That means if your n is very high O(1) will always be faster – theoretically. In practice however n often needs to be much bigger than your usual data set will be.

So sets are not faster than lists per se, but only if you have to handle a lot of elements.

Basically, Depends on the operation you are doing …

*For adding an element – then a set doesn’t need to move any data, and all it needs to do is calculate a hash value and add it to a table. For a list insertion then potentially there is data to be moved.

*For deleting an element – all a set needs to do is remove the hash entry from the hash table, for a list it potentially needs to move data around (on average 1/2 of the data.

*For a search (i.e. an in operator) – a set just needs to calculate the hash value of the data item, find that hash value in the hash table, and if it is there – then bingo. For a list, the search has to look up each item in turn – on average 1/2 of all of the terms in the list. Even for many 1000s of items a set will be far quicker to search.

Actually sets are not faster than lists in every scenario. Generally the lists are faster than sets. But in the case of searching for an element in a collection, sets are faster because sets have been implemented using hash tables. So basically Python does not have to search the full set, which means that the time complexity in average is O(1). Lists use dynamic arrays and Python needs to check the full array to search. So it takes O(n).

So finally we can see that sets are better in some case and lists are better in some cases. Its up to us to select the appropriate data structure according to our task.