Python: Speed up matching in multiple lists

Question:

I have four lists like these:

L = [ (1,2), (3,5), (6,10), (7,8) ]
M = [ (1,3), (8,9), (12,13) ]
N = [ (6,10), (3,4), (5,6), (10,11), (12,13) ]
T = [ (6,10) , (1,4) ]

I want to check the presence/absence of every tuple of T inside L, M, and N:

[[True, False, True], [False, False, False]]

The following works but it is incredibly inefficient when the size of T, L, M, and N grows.

[[ y in x for x in [L, M, N] ] for y in T ]

What is the most efficient way to speed up this for large lists?

Asked By: dcirillo

||

Answers:

Searching in list time is proportional to the list length. So it’s high for long lists. There are special data structures optimized for searching. The easiest in python is set. It calculates the hash of each element (so the elements must be hashable, and tuples of integers are OK).
Then you do the same check. So you just need to add

L = set(L)
M = set(M)
N = set(N)

As a side effect you will lose the order of elements in the list. And if there are non unique values they will be merged into one.

Moreover sets have method for finding common elements. L & M will give you elements which exists in L and in M. That would be the fastest solution.

About speed of search an item in list and set:
Speed of search in list might be high if the searched values are at the very beginning (or the list is short). But if it is not the case set is much-much faster as time of search is proportional to log(len(data)). The worst case for list is when there are no searched items in the list, so it will need to check every item. In this case searching in 1M list would be 200K slower, than in set (just checked in python3)

Answered By: Leonid Mednikov

You could also look into using Numpy arrays instead of normal python lists and tuples.
look here
Since you have to check every element in your list the total speed will always scale linear so you have to use a faster implementation like the numpy one or extend your code with a faster language like Rust, C, C++.

Use the np.asarray(listname) function to convert it

Answered By: Tyfingr

Use a dictionary and compare those values instead.

LMNT = {'L':[(1,2),(3,5),(6,10),(7,8)],
'M':[(1,3),(8,9),(12,13)],
'N':[ (6,10), (3,4), (5,6), (10,11), (12,13) ],
'T':[ (6,10) , (1,4) ]}

Then you can compare the dictionary.
LMNT['M'][0][1] values is 2

OR

LMNT['N'][4] values is (12,13)

Answered By: Swift

If you can handle a different output format, you can also you set intersection.

>>> L = set([ (1,2), (3,5), (6,10), (7,8) ])
>>> M = set([ (1,3), (8,9), (12,13) ])
>>> N = set([ (6,10), (3,4), (5,6), (10,11), (12,13) ])
>>> T = set([ (6,10) , (1,4) ])
>>> [T & x for x in (L,M,N)]
[{(6, 10)}, set(), {(6, 10)}]

This will give you a list of sets of tuples which appear in both sets each. This should be faster than using nested loops.

Answered By: C. Yduqoli