# Test if lists share any items in python

## Question:

I want to check if any of the items in one list are present in another list. I can do it simply with the code below, but I suspect there might be a library function to do this. If not, is there a more pythonic method of achieving the same result.

``````In [78]: a = [1, 2, 3, 4, 5]

In [79]: b = [8, 7, 6]

In [80]: c = [8, 7, 6, 5]

In [81]: def lists_overlap(a, b):
....:     for i in a:
....:         if i in b:
....:             return True
....:     return False
....:

In [82]: lists_overlap(a, b)
Out[82]: False

In [83]: lists_overlap(a, c)
Out[83]: True

In [84]: def lists_overlap2(a, b):
....:     return len(set(a).intersection(set(b))) > 0
....:
``````

``````def lists_overlap3(a, b):
return bool(set(a) & set(b))
``````

Note: the above assumes that you want a boolean as the answer. If all you need is an expression to use in an `if` statement, just use `if set(a) & set(b):`

You could also use `any` with list comprehension:

``````any([item in a for item in b])
``````

You can use the any built in function /w a generator expression:

``````def list_overlap(a,b):
return any(i for i in a if i in b)
``````

As John and Lie have pointed out this gives incorrect results when for every i shared by the two lists bool(i) == False. It should be:

``````return any(i in b for i in a)
``````
``````def lists_overlap(a, b):
sb = set(b)
return any(el in sb for el in a)
``````

This is asymptotically optimal (worst case O(n + m)), and might be better than the intersection approach due to `any`‘s short-circuiting.

E.g.:

``````lists_overlap([3,4,5], [1,2,3])
``````

will return True as soon as it gets to `3 in sb`

EDIT: Another variation (with thanks to Dave Kirby):

``````def lists_overlap(a, b):
sb = set(b)
return any(itertools.imap(sb.__contains__, a))
``````

This relies on `imap`‘s iterator, which is implemented in C, rather than a generator comprehension. It also uses `sb.__contains__` as the mapping function. I don’t know how much performance difference this makes. It will still short-circuit.

In python 2.6 or later you can do:

``````return not frozenset(a).isdisjoint(frozenset(b))
``````

Short answer: use `not set(a).isdisjoint(b)`, it’s generally the fastest.

There are four common ways to test if two lists `a` and `b` share any items. The first option is to convert both to sets and check their intersection, as such:

``````bool(set(a) & set(b))
``````

Because sets are stored using a hash table in Python, searching them is `O(1)` (see here for more information about complexity of operators in Python). Theoretically, this is `O(n+m)` on average for `n` and `m` objects in lists `a` and `b`. But

1. it must first create sets out of the lists, which can take a non-negligible amount of time, and
2. it supposes that hashing collisions are sparse among your data.

The second way to do it is using a generator expression performing iteration on the lists, such as:

``````any(i in a for i in b)
``````

This allows to search in-place, so no new memory is allocated for intermediary variables. It also bails out on the first find. But the `in` operator is always `O(n)` on lists (see here).

Another proposed option is an hybridto iterate through one of the list, convert the other one in a set and test for membership on this set, like so:

``````a = set(a); any(i in a for i in b)
``````

A fourth approach is to take advantage of the `isdisjoint()` method of the (frozen)sets (see here), for example:

``````not set(a).isdisjoint(b)
``````

If the elements you search are near the beginning of an array (e.g. it is sorted), the generator expression is favored, as the sets intersection method have to allocate new memory for the intermediary variables:

``````from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974
``````

Here’s a graph of the execution time for this example in function of list size:

Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the `isdisjoint()` method is better for very small list sizes, whereas the generator expression is better for larger list sizes.

On the other hand, as the search begins with the beginning for the hybrid and generator expression, if the shared element are systematically at the end of the array (or both lists does not share any values), the disjoint and set intersection approaches are then way faster than the generator expression and the hybrid approach.

``````>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000))
0.08102107048034668
``````

It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well when when no elements are shared, and is the best case for the disjoint and set intersection approaches.

Here are two analysis using random numbers (instead of rigging the setup to favor one technique or another):

High chance of sharing: elements are randomly taken from `[1, 2*len(a)]`. Low chance of sharing: elements are randomly taken from `[1, 1000*len(a)]`.

Up to now, this analysis supposed both lists are of the same size. In case of two lists of different sizes, for example `a` is much smaller, `isdisjoint()` is always faster:

Make sure that the `a` list is the smaller, otherwise the performance decreases. In this experiment, the `a` list size was set constant to `5`.

In summary:

• If the lists are very small (< 10 elements), `not set(a).isdisjoint(b)` is always the fastest.
• If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression `any(i in a for i in b)` is the fastest on large list sizes;
• Test the set intersection with `not set(a).isdisjoint(b)`, which is always faster than `bool(set(a) & set(b))`.
• The hybrid "iterate through list, test on set" `a = set(a); any(i in a for i in b)` is generally slower than other methods.
• The generator expression and the hybrid are much slower than the two other approaches when it comes to lists without sharing elements.

In most cases, using the `isdisjoint()` method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.

This question is pretty old, but I noticed that while people were arguing sets vs. lists, that no one thought of using them together. Following Soravux’s example,

Worst case for lists:

``````>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
100.91506409645081
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
19.746716022491455
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
0.092626094818115234
``````

And the best case for lists:

``````>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=list(range(10000))", number=100000)
154.69790101051331
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
0.082653045654296875
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000)
0.08434605598449707
``````

So even faster than iterating through two lists is iterating though a list to see if it’s in a set, which makes sense since checking if a number is in a set takes constant time while checking by iterating through a list takes time proportional to the length of the list.

Thus, my conclusion is that iterate through a list, and check if it’s in a set.

if you don’t care what the overlapping element might be, you can simply check the `len` of the combined list vs. the lists combined as a set. If there are overlapping elements, the set will be shorter:

`len(set(a+b+c))==len(a+b+c)` returns True, if there is no overlap.

I’ll throw another one in with a functional programming style:

``````any(map(lambda x: x in a, b))
``````

Explanation:

``````map(lambda x: x in a, b)
``````

returns a list of booleans where elements of `b` are found in `a`. That list is then passed to `any`, which simply returns `True` if any elements are `True`.

Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.