Python: iterate over a sublist

Question:

Generally, when you want to iterate over a portion of a list in Python, the easiest thing to do is just slice the list.

# Iterate over everything except the first item in a list
#
items = [1,2,3,4]
iterrange = (x for x in items[1:])

But the slice operator creates a new list, which often isn’t even necessary. Ideally, I’d like some kind of slicing function that creates generators, as opposed to new list objects. Something similar could be accomplished by creating a generator expression that uses a range to return only certain portions of the list:

# Create a generator expression that returns everything except 
# the first item in the list
#
iterrange = (x for x, idx in zip(items, range(0, len(items))) if idx != 0)

But this is sort of cumbersome. I’m wondering if there is a better, more elegant way to do this. So, what’s the easiest way to slice a list so that a generator expression is created instead of a new list object?
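
For comparison, the same generator expression can be written a bit more cleanly with enumerate, though it still walks the whole list rather than truly slicing it:

# Equivalent generator expression using enumerate
iterrange = (x for idx, x in enumerate(items) if idx != 0)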

Asked By: Channel72


Answers:

Use itertools.islice:

import itertools

l = range(20)

for i in itertools.islice(l, 10, 15):
    print(i)

which prints:

10
11
12
13
14

From the doc:

Make an iterator that returns selected elements from the iterable

Answered By: Sebastian Hoffmann

Try itertools.islice:

http://docs.python.org/library/itertools.html#itertools.islice

iterrange = itertools.islice(items, 1, None)
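
Applied to the question’s items = [1, 2, 3, 4], a quick check of what this yields (illustrative only):

import itertools

items = [1, 2, 3, 4]
iterrange = itertools.islice(items, 1, None)  # lazy; no copy of the list is made
print(list(iterrange))  # [2, 3, 4]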

Answered By: Steven

Before I start, to be clear, the correct order of selecting between slicing approaches is usually:

  1. Use regular slicing (the cost of copying all but the longest of inputs is usually not meaningful, and the code is much simpler). If the input might not be a sliceable sequence type, convert it to one, then slice, e.g. allbutone = list(someiterable)[1:]. This is simpler, and for most cases, typically faster, than any other approach.
  2. If regular slicing isn’t viable (the input isn’t guaranteed to be a sequence and converting to a sequence before slicing might cause memory issues, or it’s huge and the slice covers most of it, e.g. skipping the first 1000 and last 1000 elements of a 10M element list, so memory might be a concern), itertools.islice is usually the correct solution as it’s simple enough, and the performance cost is usually unimportant.
  3. If, and only if, islice’s performance is unacceptably slow (it adds some overhead to producing every item, though admittedly it’s quite a small amount) and the amount of data to be skipped is small, while the data to be included is huge (e.g. the OP’s scenario of skipping a single element and keeping the rest), keep reading.

If you find yourself in case #3, you’re in a scenario where islice’s ability to bypass initial elements (relatively) quickly isn’t enough to make up for the incremental cost to produce the rest of the elements. In that case, you can improve performance by reversing your problem from selecting all elements after n to discarding all elements before n.

For this approach, you manually convert your input to an iterator, then explicitly pull out and discard n values, then iterate what’s left in the iterator (but without the per-element overhead of islice). For example, for an input of myinput = list(range(1, 10000)), your options for selecting elements 1 through the end are:

# Approach 1, OP's approach, simple slice:
for x in myinput[1:]:
    ...  # do something with x

# Approach 2, Sebastian's approach, using itertools.islice
# (assumes: from itertools import islice):
for x in islice(myinput, 1, None):
    ...  # do something with x

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
next(myiter, None)      # Throw away one element, providing None default to avoid StopIteration
for x in myiter:        # Iterate unwrapped iterator
    ...  # do something with x

If the number of elements to discard is larger, it’s probably best to borrow the consume recipe from the itertools docs:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

which makes the approaches generalize for skipping n elements to:

# Approach 1, OP's approach, simple slice:
for x in myinput[n:]:
    ...  # do something with x

# Approach 2, Sebastian's approach, using itertools.islice:
for x in islice(myinput, n, None):
    ...  # do something with x

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
consume(myiter, n)      # Throw away n elements
# Or inlined consume as next(islice(myiter, n, n), None)
for x in myiter:        # Iterate unwrapped iterator
    ...  # do something with x
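
As a quick sanity check of consume itself (assuming the imports shown with the recipe), something like this shows the expected behavior:

it = iter(range(10))
consume(it, 3)           # throws away 0, 1, 2
print(next(it))          # 3
consume(it)              # n=None: drain everything that's left
print(next(it, 'done'))  # 'done'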

Performance-wise, this wins by a meaningful amount for most large inputs (exception: range itself on Python 3 is already optimized for plain slicing; plain slicing can’t be beaten on actual range objects). ipython3 microbenchmarks (on CPython 3.6, 64 bit Linux build) illustrate this (the slurp defined in the setup is just the lowest-overhead way to exhaust an iterable, so we minimize the impact of the parts we’re not interested in):

>>> from itertools import islice
>>> from collections import deque
>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(r[1:])
...
65.8 μs ± 109 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(islice(r, 1, None))
...
70.7 μs ± 104 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
30.3 μs ± 64.1 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

Obviously, the extra complexity of my solution isn’t usually going to be worth it, but for moderate-sized inputs (10K elements in this case), the performance benefit is clear; islice was the worst performer (by a small amount), plain slicing was slightly better (which reinforces my point about plain slicing almost always being the best solution when you have an actual sequence), and the "convert to iterator, discard initial, use rest" approach won by a huge amount, relatively speaking (well under half the time of either of the other solutions).

That benefit won’t show up for tiny inputs, because the fixed overhead of loading/calling iter/next, and especially islice, will outweigh the savings:

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(r[1:])
...
207 ns ± 1.86 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(islice(r, 1, None))
...
307 ns ± 1.71 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
518 ns ± 4.5 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(ir, None)  # To show fixed overhead of islice, use next without it
... slurp(ir)
...
341 ns ± 0.947 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

but as you can see, even for 10 elements the islice-free approach isn’t much worse; by 100 elements, the islice-free approach is faster than all competitors, and by 200 elements, the generalized next+islice beats all competitors (obviously it doesn’t beat islice-free given the 180 ns overhead of islice, but that’s made up for by generalizing to skipping n elements as a single step, rather than needing to call next repeatedly for skipping more than one element). Plain islice rarely wins in the "skip a few, keep a lot" case due to the per element overhead the wrapper exacts (it didn’t clearly beat eager slicing in the microbenchmarks until around 100K elements; it’s memory efficient, but CPU inefficient), and it will do even worse (relative to eager slicing) in the "skip a lot, keep a few" case.


Special case hackery for specific built-in sequences when performance is critical

Special case for most built-in sequences with O(1) indexing (list, tuple, str, etc., excluding collections.deque)

Burying this at the bottom, because while it’s absolutely the fastest solution, it’s also type-specific (won’t work on arbitrary iterables) and it relies on implementation details (specifically, the implementation of the pickling functionality for Python built-in sequences; this is unlikely to change, since it would break existing pickled data if support were removed, but it’s not a language guarantee). If you’re in a scenario where:

  1. The input is a list (or other built-in flat sequence types with O(1) indexing like tuple or str, but not collections.deque, which has O(n) indexing)
  2. The number of items to be skipped is huge
  3. The number of items to be selected is also huge (you don’t even want to pay the memory cost for the pointers a shallow copying slice would incur)

You can do a horrible, terrible thing by directly manipulating the iterator to skip the items with O(1) cost (whereas using the consume recipe, inlined or not, is O(n) in the number of items skipped). It’s essentially the same as Approach #3 above, except we abuse the design of sequence iterators to skip ahead to the index we care about:

# Approach 4 (my hacky, but performant, approach)
myiter = iter(myinput)  # Explicitly create iterator from input like before
myiter.__setstate__(n)  # Set n as the next index to iterate
for x in myiter:        # Iterate updated iterator
    ...  # do something with x
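
If you want to sanity-check that trick before relying on it, a small illustrative example (CPython-specific, since it leans on the list iterator’s pickling support):

data = list(range(10))
it = iter(data)
it.__setstate__(7)  # next(it) will now yield data[7]
print(list(it))     # [7, 8, 9]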

Timings for a larger input, comparing the best solution from before (the inlined consume), plain slicing (with its associated memory cost and eager operation), and manually changing the iterator position, on a CPython 3.11.1 64 bit Linux build:

>>> from itertools import islice
>>> from collections import deque
>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(100_000_000))  # *Much* bigger input
... ir = iter(r)
... next(islice(ir, 90_000_000, 90_000_000), None)  # *Much* bigger skip
... slurp(ir)                                       # *Much* larger amount to consume
...
339 ms ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(100_000_000))
... slurp(r[90_000_000:])
...
104 ms ± 648 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(100_000_000))
... ir = iter(r)
... ir.__setstate__(90_000_000)
... slurp(ir)
...
32.7 ms ± 278 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For this "skip 90M, take 10M" scenario, plain slicing takes about ⅓ the time of the optimized inline consume, and manual iterator manipulation in turn takes ⅓ the time of plain slicing (because plain slicing effectively has to do 3x the iteration work: once to copy from input to the sliced copy, once to iterate it, and once to decrement references when the slice is released). If you did not wish to keep all the items after the skip threshold, slicing is likely the best solution, but you could wrap the pre-advanced iterator in islice at that point to pull just n items from it.
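
For example, a rough sketch of that last idea, reusing the names from the timings above (the 1_000 item count is arbitrary):

from itertools import islice

myiter = iter(r)                 # r is the 100M element list from the timings
myiter.__setstate__(90_000_000)  # O(1) skip of the first 90M indices (CPython-specific)
first_chunk = list(islice(myiter, 1_000))  # bounded take from the pre-advanced iterator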

Special case for collections.deque

For arbitrary iterables, this obviously won’t work (dict [and its views and iterators], set [and its iterator], open file objects, etc.), so inlined consume remains the only real option there. collections.deque is a special case though: while it does not support slicing, and its iterator doesn’t support __setstate__, it does support rotation. So you could write a custom wrapper that rotates the elements you want to the front, islices them off, then rotates them back once the slicing is complete (this relies on not needing to modify the deque during iteration). For example:

from itertools import islice

def fast_islice_deque(deq, *slc):
    try:
        [stop] = slc  # Check for simple case, just islicing from beginning
    except ValueError:
        pass
    else:
        yield from islice(deq, stop)  # No need for rotation, just pull what we need
        return

    # We need to rotate, which requires some fix-ups to indices first
    start, stop, step = slice(*slc).indices(len(deq))
    stop -= start  # Rotate takes care of start
    deq.rotate(-start)  # Move elements we care about to start with tiny amount of work
    try:
        yield from islice(deq, None, stop, step)
    finally:
        deq.rotate(start)  # Restore original ordering with tiny amount of work
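
A quick usage sketch with small numbers, just to show the behavior (the deque’s original order is restored once the generator is exhausted):

from collections import deque

d = deque(range(10))
print(list(fast_islice_deque(d, 6, None)))  # [6, 7, 8, 9]
print(list(d))                              # deque restored to original order 0..9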

Again, timings from CPython 3.11.1 on 64 bit Linux:

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = deque(range(100_000_000))  # Same huge input, as deque
... ir = iter(r)
... next(islice(ir, 90_000_000, 90_000_000), None)
... slurp(ir)
...
368 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = deque(range(100_000_000))
... slurp(fast_islice_deque(r, 90_000_000, None))
...
245 ms ± 5.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Or comparing pulling a smaller number of items after the skip:

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = deque(range(100_000_000))  # Same huge input, as deque
... slurp(islice(r, 90_000_000, 90_001_000))  # Need islice to bound selection anyway, so no pre-consume
...
331 ms ± 4.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = deque(range(100_000_000))
... slurp(fast_islice_deque(r, 90_000_000, 90_001_000))
...
19.4 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As you can see, using rotate saves a decent amount of work in both cases, and it’s especially helpful when you’re pulling a smallish number of items. When pulling a large, unbounded number, it’s not as helpful, because the cost of pulling 10M items is significantly higher than the cost of skipping the first 90M, and you pay the per-item overhead of islice where the inlined consume approach doesn’t need it for the items you pull. But when pulling a small/bounded number, both approaches pay per-item islice overhead; the rotate-based solution, while technically still O(n), does dramatically less work (it doesn’t touch any reference counts, and just has to fix up block pointers, a fraction of the work islicing does).

Answered By: ShadowRanger

The accepted answer using itertools.islice is not entirely satisfactory: yes, it is easy, but islice has to consume the first elements of the list, which can be slow if the list is huge and you start from a large index.

My recommendation is to write your own iterator:

def gen_slice(my_list, *slc):  # *slc rather than *slice, to avoid shadowing the built-in
    for i in range(*slc):
        yield my_list[i]

Or, more concisely:

gen_slice_map = lambda my_list, *slc: map(my_list.__getitem__, range(*slc))
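
Applied to the question’s original items list, both versions give the same result (a quick illustrative check):

items = [1, 2, 3, 4]
print(list(gen_slice(items, 1, len(items))))      # [2, 3, 4]
print(list(gen_slice_map(items, 1, len(items))))  # [2, 3, 4]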

See the difference in performance below; pay attention to the fact that the first one is in ms while the others are in ns. Also, it turns out that the explicit for loop is actually very slightly faster than the map version, though as pointed out by @ShadowRanger in comments, this is only because my example below extracts a single element, while the map version is faster for larger lists:

from itertools import islice

my_list = list(range(100_000_000))

%timeit list(islice(my_list, 99_000_000, 99_000_001))
400 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(gen_slice(my_list, 99_000_000, 99_000_001))
409 ns ± 8.46 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit list(gen_slice_map(my_list, 99_000_000, 99_000_001))
430 ns ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Answered By: tiho

Motivated by ShadowRanger’s answer with multiple efficient solutions, here’s another: working in chunks. In my experiments, it was up to 2.5 times faster than large list slices. And with small chunks, memory usage is low.

Here’s how processing a single list slice might look:

for element in lst[start:]:
    # do something with the element

And how processing it in chunks might look:

for i in range(start, len(lst), chunksize):
    for element in lst[i : i+chunksize]:
        # do something with the element

As ShadowRanger said, list slices are fast, but they take three passes: for creating, iterating, and discarding the slice. If the slice is large, that’s cache-unfriendly. Roughly speaking: Let’s say you have 1 MB cache and the list slice and its elements are 2 MB large. Then at the end of the first pass, the second half of the list is in cache and the first half isn’t anymore. So the second pass doesn’t benefit from the cache: During its first half, none of it is in cache. And during its second half, none of that is in cache, either, because its first half just replaced everything in the cache. Same with the third pass.

Now instead of creating, iterating and discarding a single large slice, let’s do that in smaller chunks. Then the second and third pass of each chunk can benefit from the chunk’s data still being in cache. That’s what makes it faster.

Here’s an experiment. I created a list with 16 million elements and processed it in chunks of different sizes, from tiny chunks of 16 elements all the way up to a single chunk of all 16 million elements:

chunk size 2^4     19.8 ± 0.6 ns / element
chunk size 2^5     13.2 ± 0.2 ns / element
chunk size 2^6     10.2 ± 0.0 ns / element
chunk size 2^7      8.9 ± 0.1 ns / element
chunk size 2^8      8.3 ± 0.2 ns / element
chunk size 2^9      7.6 ± 0.1 ns / element
chunk size 2^10     7.5 ± 0.0 ns / element
chunk size 2^11     7.4 ± 0.0 ns / element
chunk size 2^12     7.3 ± 0.1 ns / element
chunk size 2^13     7.4 ± 0.0 ns / element
chunk size 2^14     7.8 ± 0.1 ns / element
chunk size 2^15     8.4 ± 0.0 ns / element
chunk size 2^16     8.9 ± 0.1 ns / element
chunk size 2^17     9.5 ± 0.1 ns / element
chunk size 2^18    10.4 ± 0.1 ns / element
chunk size 2^19    11.3 ± 0.3 ns / element
chunk size 2^20    12.0 ± 0.1 ns / element
chunk size 2^21    12.1 ± 0.2 ns / element
chunk size 2^22    13.9 ± 0.1 ns / element
chunk size 2^23    13.8 ± 0.2 ns / element
chunk size 2^24    13.8 ± 0.1 ns / element

We see three things:

  • Small chunks are slow, due to the chunking overhead.
  • Large chunks are slow, as discussed above.
  • Optimal chunk size seems to be around 2^12 elements.

That was with elements in the list in creation order, so elements adjacent in the list were mostly also adjacent in memory. If we shuffle them, so that adjacent elements are scattered all over the memory, things get slower and change a bit (note I only used 2 million elements here, as it got too slow):

chunk size 2^4     38.3 ± 0.7 ns / element
chunk size 2^5     29.5 ± 0.0 ns / element
chunk size 2^6     24.0 ± 0.4 ns / element
chunk size 2^7     21.0 ± 0.3 ns / element
chunk size 2^8     19.8 ± 0.3 ns / element
chunk size 2^9     19.6 ± 0.2 ns / element
chunk size 2^10    19.5 ± 0.3 ns / element
chunk size 2^11    19.6 ± 0.5 ns / element
chunk size 2^12    21.1 ± 0.5 ns / element
chunk size 2^13    25.4 ± 0.1 ns / element
chunk size 2^14    29.3 ± 0.5 ns / element
chunk size 2^15    33.5 ± 0.4 ns / element
chunk size 2^16    37.3 ± 0.2 ns / element
chunk size 2^17    41.1 ± 0.4 ns / element
chunk size 2^18    46.7 ± 0.2 ns / element
chunk size 2^19    48.1 ± 0.7 ns / element
chunk size 2^20    48.9 ± 0.3 ns / element
chunk size 2^21    49.0 ± 0.1 ns / element

Now the optimal chunk size is around 2^10 elements, and it was 2.5 times faster than using a single big slice of 2 million elements.

Chunk size 2^10 elements was good in both cases, so that’s what I’d recommend. It depends on cache sizes, though, so different computers can have different optimal sizes. Also, if your objects are larger, or you’re actually doing something with the elements so you also use cache for that, then a smaller chunk size could be better.

Granted, writing

for i in range(start, len(lst), chunksize):
    for element in lst[i : i+chunksize]:
        # do something with the element

is cumbersome compared to the simpler single list slice. We can write tool functions that help us, so we can write

for chunk in chunks(lst, start):
    for element in chunk:
        # do something with the element

or even:

for element in islice_chunked(lst, start):
    # do something with the element

(Note it doesn’t use itertools.islice; I only named it that because it similarly gives you an iterator over the elements.)

Benchmark for iterating a shuffled list of 10 million elements starting at index 7 million:

179 ms ± 1.9 ms  use_chunks1
188 ms ± 4.6 ms  use_islice_chunked
230 ms ± 7.3 ms  use_chunks2
349 ms ± 2.0 ms  use_one_slice
459 ms ± 4.9 ms  use_islice

The tool functions can be extended to also support stop and step parameters. Left as an exercise for the reader (or I might add it later, but the current simple ones suffice to demonstrate the technique and its benefits, and that was my main goal).
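
For instance, a rough sketch of how a stop parameter could be added to chunks (step omitted; the defaults here are just one possible choice):

def chunks(seq, start=0, stop=None, chunk_size=2**10):
    """Yield list-slice chunks covering seq[start:stop]."""
    if stop is None:
        stop = len(seq)
    for i in range(start, stop, chunk_size):
        yield seq[i : min(i + chunk_size, stop)]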

Benchmark code (Attempt This Online!):

from itertools import islice, chain
from collections import deque
from timeit import default_timer as time
from random import shuffle
from statistics import mean, stdev

slurp = deque(maxlen=0).extend
lst = list(range(10_000_000))
shuffle(lst)
start = 7_000_000

def chunks(seq, start):
    chunk_size = 2**10
    for start in range(start, len(seq), chunk_size):
        yield seq[start : start+chunk_size]

def islice_chunked(seq, start):
    """Like islice(seq, start, None), but
       using list slice chunks for more speed."""
    return chain.from_iterable(chunks(seq, start))

def use_one_slice(lst, start):
    slurp(lst[start:])

def use_islice(lst, start):
    slurp(islice(lst, start, None))

def use_chunks1(lst, start):
    slurp(map(slurp, chunks(lst, start)))

def use_chunks2(lst, start):
    for chunk in chunks(lst, start):
        slurp(chunk)

def use_islice_chunked(lst, start):
    slurp(islice_chunked(lst, start))

funcs = use_one_slice, use_islice, use_chunks1, use_chunks2, use_islice_chunked

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:3]]
    return f'{round(mean(ts))} ms ± {stdev(ts):3.1f} ms '
for _ in range(10):
    for f in funcs:
        t = time()
        f(lst, start)
        times[f].append(time() - t)
for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

Code for the initial experiments (Attempt This Online!):

from collections import deque
from timeit import default_timer as time
from statistics import mean, stdev
from random import shuffle

shuffled = False

E = 21 if shuffled else 24
es = range(4, E+1)
n = 2 ** E
lst = list(range(n))
if shuffled:
    shuffle(lst)
slurp = deque(maxlen=0).extend

def run(lst, chunksize):
    for start in range(0, n, chunksize):
        slurp(lst[start : start+chunksize])

times = {e: [] for e in es}
def stats(f):
    ts = [t / n * 1e9 for t in sorted(times[f])[:3]]
    return f'{mean(ts):6.1f} ± {stdev(ts):3.1f} ns'
for _ in range(20 if shuffled else 10):
    for e in es:
        t = time()
        run(lst, 2 ** e)
        times[e].append(time() - t)
for e in es:
    print(f'chunk size 2^{e:<3}', stats(e), '/ element')

Answered By: Kelly Bundy