Split a list into parts based on a set of indexes in Python
Question:
What is the best way to split a list into parts based on an arbitrary number of indexes? E.g. given the code below
indexes = [5, 12, 17]
list = range(20)
return something like this
part1 = list[:5]
part2 = list[5:12]
part3 = list[12:17]
part4 = list[17:]
If there are no indexes it should return the entire list.
Answers:
I would be interested in seeing a more Pythonic way of doing this also. But this is a crappy solution. You need to add a checking for an empty index list.
Something along the lines of:
indexes = [5, 12, 17]
list = range(20)
output = []
prev = 0
for index in indexes:
output.append(list[prev:index])
prev = index
output.append(list[indexes[-1]:])
print output
produces
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
This is all that I could think of
def partition(list_, indexes):
if indexes[0] != 0:
indexes = [0] + indexes
if indexes[-1] != len(list_):
indexes = indexes + [len(list_)]
return [ list_[a:b] for (a,b) in zip(indexes[:-1], indexes[1:])]
My solution is similar to Il-Bhima’s.
>>> def parts(list_, indices):
... indices = [0]+indices+[len(list_)]
... return [list_[v:indices[k+1]] for k, v in enumerate(indices[:-1])]
Alternative approach
If you’re willing to slightly change the way you input indices, from absolute indices to relative (that is, from [5, 12, 17]
to [5, 7, 5]
, the below will also give you the desired output, while it doesn’t create intermediary lists.
>>> from itertools import islice
>>> def parts(list_, indices):
... i = iter(list_)
... return [list(islice(i, n)) for n in chain(indices, [None])]
The plural of index is indices. Going for simplicity/readability.
indices = [5, 12, 17]
input = range(20)
output = []
for i in reversed(indices):
output.append(input[i:])
input[i:] = []
output.append(input)
while len(output):
print output.pop()
Cide’s makes three copies of the array: [0]+indices copies, ([0]+indices)+[] copies again, and indices[:-1] will copy a third time. Il-Bhima makes five copies. (I’m not counting the return value, of course.)
Those could be reduced (izip, islice), but here’s a zero-copy version:
def iterate_pairs(lst, indexes):
prev = 0
for i in indexes:
yield prev, i
prev = i
yield prev, len(lst)
def partition(lst, indexes):
for first, last in iterate_pairs(lst, indexes):
yield lst[first:last]
indexes = [5, 12, 17]
lst = range(20)
print [l for l in partition(lst, indexes)]
Of course, array copies are fairly cheap (native code) compared to interpreted Python, but this has another advantage: it’s easy to reuse, to mutate the data directly:
for first, last in iterate_pairs(lst, indexes):
for i in range(first, last):
lst[i] = first
print lst
# [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12, 12, 17, 17, 17]
(That’s why I passed indexes to iterate_pairs. If you don’t care about that, you can remove that parameter and just have the final line be “yield prev, None”, which is all partition() needs.)
indices = [5, 12, 17]
input = range(20)
output = []
reduce(lambda x, y: output.append(input[x:y]) or y, indices + [len(input)], 0)
print output
This is the simplest and most pythonic solution I can think of:
def partition(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
if the inputs are very large, then the iterators solution should be more convenient:
from itertools import izip, chain
def partition(alist, indices):
pairs = izip(chain([0], indices), chain(indices, [None]))
return (alist[i:j] for i, j in pairs)
and of course, the very, very lazy guy solution (if you don’t mind to get arrays instead of lists, but anyway you can always revert them to lists):
import numpy
partition = numpy.split
Here’s yet another answer.
def partition(l, indexes):
result, indexes = [], indexes+[len(l)]
reduce(lambda x, y: result.append(l[x:y]) or y, indexes, 0)
return result
It supports negative indexes and such.
>>> partition([1,2,3,4,5], [1, -1])
[[1], [2, 3, 4], [5]]
>>>
>>> def burst_seq(seq, indices):
... startpos = 0
... for index in indices:
... yield seq[startpos:index]
... startpos = index
... yield seq[startpos:]
...
>>> list(burst_seq(range(20), [5, 12, 17]))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
>>> list(burst_seq(range(20), []))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> list(burst_seq(range(0), [5, 12, 17]))
[[], [], [], []]
>>>
Maxima mea culpa: it uses a for
statement, and it’s not using whizzbang stuff like itertools, zip(), None as a sentinel, list comprehensions, …
😉
What is the best way to split a list into parts based on an arbitrary number of indexes? E.g. given the code below
indexes = [5, 12, 17]
list = range(20)
return something like this
part1 = list[:5]
part2 = list[5:12]
part3 = list[12:17]
part4 = list[17:]
If there are no indexes it should return the entire list.
I would be interested in seeing a more Pythonic way of doing this also. But this is a crappy solution. You need to add a checking for an empty index list.
Something along the lines of:
indexes = [5, 12, 17]
list = range(20)
output = []
prev = 0
for index in indexes:
output.append(list[prev:index])
prev = index
output.append(list[indexes[-1]:])
print output
produces
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
This is all that I could think of
def partition(list_, indexes):
if indexes[0] != 0:
indexes = [0] + indexes
if indexes[-1] != len(list_):
indexes = indexes + [len(list_)]
return [ list_[a:b] for (a,b) in zip(indexes[:-1], indexes[1:])]
My solution is similar to Il-Bhima’s.
>>> def parts(list_, indices):
... indices = [0]+indices+[len(list_)]
... return [list_[v:indices[k+1]] for k, v in enumerate(indices[:-1])]
Alternative approach
If you’re willing to slightly change the way you input indices, from absolute indices to relative (that is, from [5, 12, 17]
to [5, 7, 5]
, the below will also give you the desired output, while it doesn’t create intermediary lists.
>>> from itertools import islice
>>> def parts(list_, indices):
... i = iter(list_)
... return [list(islice(i, n)) for n in chain(indices, [None])]
The plural of index is indices. Going for simplicity/readability.
indices = [5, 12, 17]
input = range(20)
output = []
for i in reversed(indices):
output.append(input[i:])
input[i:] = []
output.append(input)
while len(output):
print output.pop()
Cide’s makes three copies of the array: [0]+indices copies, ([0]+indices)+[] copies again, and indices[:-1] will copy a third time. Il-Bhima makes five copies. (I’m not counting the return value, of course.)
Those could be reduced (izip, islice), but here’s a zero-copy version:
def iterate_pairs(lst, indexes):
prev = 0
for i in indexes:
yield prev, i
prev = i
yield prev, len(lst)
def partition(lst, indexes):
for first, last in iterate_pairs(lst, indexes):
yield lst[first:last]
indexes = [5, 12, 17]
lst = range(20)
print [l for l in partition(lst, indexes)]
Of course, array copies are fairly cheap (native code) compared to interpreted Python, but this has another advantage: it’s easy to reuse, to mutate the data directly:
for first, last in iterate_pairs(lst, indexes):
for i in range(first, last):
lst[i] = first
print lst
# [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12, 12, 17, 17, 17]
(That’s why I passed indexes to iterate_pairs. If you don’t care about that, you can remove that parameter and just have the final line be “yield prev, None”, which is all partition() needs.)
indices = [5, 12, 17]
input = range(20)
output = []
reduce(lambda x, y: output.append(input[x:y]) or y, indices + [len(input)], 0)
print output
This is the simplest and most pythonic solution I can think of:
def partition(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
if the inputs are very large, then the iterators solution should be more convenient:
from itertools import izip, chain
def partition(alist, indices):
pairs = izip(chain([0], indices), chain(indices, [None]))
return (alist[i:j] for i, j in pairs)
and of course, the very, very lazy guy solution (if you don’t mind to get arrays instead of lists, but anyway you can always revert them to lists):
import numpy
partition = numpy.split
Here’s yet another answer.
def partition(l, indexes):
result, indexes = [], indexes+[len(l)]
reduce(lambda x, y: result.append(l[x:y]) or y, indexes, 0)
return result
It supports negative indexes and such.
>>> partition([1,2,3,4,5], [1, -1])
[[1], [2, 3, 4], [5]]
>>>
>>> def burst_seq(seq, indices):
... startpos = 0
... for index in indices:
... yield seq[startpos:index]
... startpos = index
... yield seq[startpos:]
...
>>> list(burst_seq(range(20), [5, 12, 17]))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
>>> list(burst_seq(range(20), []))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> list(burst_seq(range(0), [5, 12, 17]))
[[], [], [], []]
>>>
Maxima mea culpa: it uses a for
statement, and it’s not using whizzbang stuff like itertools, zip(), None as a sentinel, list comprehensions, …
😉