Filtering / iterating through very large lists in Python

Question:

If I have a list with, say, 10 million objects, how do I filter it quickly? A complete pass with a list comprehension takes about 4-5 seconds. Are there efficient data structures or libraries for this in Python, or is Python not suited to large data sets?

Asked By: abc def foo bar


Answers:

The itertools module is designed for efficient looping. In particular, itertools.ifilter may suit your purpose (that function is Python 2 only; in Python 3 the built-in filter already returns a lazy iterator). Iterating through a large data structure is always expensive, but if you only need some of the data at a time, lazy evaluation can help a lot.
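A minimal sketch of the lazy approach, written for Python 3 (the x % 1000 == 0 predicate is just a placeholder for whatever filter you actually need):

import itertools

data = range(10000000)  # stand-in for the 10-million-item list

# Python 2: itertools.ifilter(pred, data) returns a lazy iterator.
# Python 3: the built-in filter() is already lazy, so no import is needed.
lazy_matches = filter(lambda x: x % 1000 == 0, data)

# Only the items actually requested get evaluated.
first_five = list(itertools.islice(lazy_matches, 5))
print(first_five)  # [0, 1000, 2000, 3000, 4000]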

You can also use generator expressions, which look almost identical to their list comprehension counterparts but evaluate lazily, or write a generator function; both give you the benefits of lazy evaluation.
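For example, a sketch of the difference between the two forms (the even-number filter is just an illustration):

big = list(range(10000000))

# List comprehension: builds the whole filtered list in memory up front.
evens_list = [x for x in big if x % 2 == 0]

# Generator expression: same syntax with parentheses, but lazy; items are
# produced one at a time as the consumer asks for them.
evens_gen = (x for x in big if x % 2 == 0)

# The generator costs almost nothing until something iterates over it.
total = sum(evens_gen)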

Answered By: Rafe Kettler

If your data is a uniform numeric type and speed is your primary goal (and you want to stay in Python), use a NumPy array.
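For instance, a minimal sketch using NumPy's boolean-mask indexing (the even-number filter is again just a placeholder):

import numpy as np

# Both the comparison and the indexing run in C rather than in the
# Python interpreter loop, so this is typically much faster than a
# list comprehension over 10 million plain Python ints.
arr = np.arange(10000000)
filtered = arr[arr % 2 == 0]  # boolean-mask indexing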

Answered By: Gerrat

Even the built-in functions take several seconds to evaluate against a very simple list of integers on my machine:

>>> l = [1] * 10000000
>>> s = list(filter(lambda x: True, l))  # list() forces full evaluation; in Python 2, filter() returns a list directly

I'd suggest a different approach: use NumPy, or lazy evaluation with generators and/or the itertools module.
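As a rough way to compare the approaches on your own machine (exact timings will vary; the predicate is illustrative):

import timeit

setup = "import numpy as np; l = list(range(10000000)); a = np.arange(10000000)"

# List comprehension over plain Python objects.
t_list = timeit.timeit("[x for x in l if x % 2 == 0]", setup=setup, number=1)

# NumPy boolean-mask filtering on the equivalent array.
t_numpy = timeit.timeit("a[a % 2 == 0]", setup=setup, number=1)

print("list comprehension: %.2fs  numpy mask: %.2fs" % (t_list, t_numpy))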

Answered By: Utku Zihnioglu