Python, converting a list of indices to slices

Question:

So I have a list of indices,

[0, 1, 2, 3, 5, 7, 8, 10]

and want to convert it to this,

[[0, 3], [5], [7, 8], [10]]

This will run on a large number of indices.

Also, this technically isn’t for slices in Python; the tool I am working with is faster when given a range than when given the individual ids.

The pattern is based on being in a range, like slices work in Python. So in the example, the 1 and 2 are dropped because they are already included in the range 0 to 3. The 5 would need to be accessed individually since it is not in a range, etc. This is most helpful when a large number of ids fall within a single range such as [0, 5000].
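
To make the target format concrete, here is a small sketch that expands the compressed form back into the individual ids. The helper name `expand` is made up for illustration, and it assumes both ends of a pair are inclusive, as in the example above (unlike real Python slices, whose end is exclusive).

```python
def expand(slices):
    """Expand [start, end] pairs (end inclusive) and [single] entries into ids."""
    ids = []
    for entry in slices:
        # A one-element entry has the same first and last item.
        start, end = entry[0], entry[-1]
        ids.extend(range(start, end + 1))
    return ids


compressed = [[0, 3], [5], [7, 8], [10]]
print(expand(compressed))  # [0, 1, 2, 3, 5, 7, 8, 10]
```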

Asked By: Saebin


Answers:

Since you want the code to be fast, I wouldn’t try to be too fancy. A straightforward approach should perform quite well:

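a = [0, 1, 2, 3, 5, 7, 8, 10]  # assumed sorted; must be non-empty
it = iter(a)
start = next(it)  # the first value opens the first run
slices = []
# enumerate(it) starts at the second element, so a[i] is the value before x
for i, x in enumerate(it):
    if x - a[i] != 1:  # gap found: close the current run
        end = a[i]
        if start == end:
            slices.append([start])
        else:
            slices.append([start, end])
        start = x
# close the final run
if a[-1] == start:
    slices.append([start])
else:
    slices.append([start, a[-1]])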

Admittedly, that doesn’t look too nice, but I expect the nicer solutions I can think of to perform worse. (I did not do a benchmark.)

Here is a slightly nicer, but slower solution:

from itertools import groupby
a = [0, 1, 2, 3, 5, 7, 8, 10]
slices = []
# value minus position is constant within a run of consecutive values
for key, it in groupby(enumerate(a), lambda x: x[1] - x[0]):
    indices = [y for x, y in it]
    if len(indices) == 1:
        slices.append([indices[0]])
    else:
        slices.append([indices[0], indices[-1]])
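
To check the performance guess on your own data, a rough timing sketch could look like the following. The function names `slices_loop` and `slices_groupby` and the generated test data are made up for illustration; timings will vary with the machine and with how long the runs in the input are.

```python
from itertools import groupby
from timeit import timeit


def slices_loop(a):
    """Plain-loop approach; assumes a sorted, non-empty list."""
    it = iter(a)
    start = next(it)
    slices = []
    for i, x in enumerate(it):
        if x - a[i] != 1:
            end = a[i]
            slices.append([start] if start == end else [start, end])
            start = x
    slices.append([start] if a[-1] == start else [start, a[-1]])
    return slices


def slices_groupby(a):
    """groupby approach; assumes a sorted list."""
    slices = []
    for _, group in groupby(enumerate(a), lambda p: p[1] - p[0]):
        indices = [value for _, value in group]
        if len(indices) == 1:
            slices.append([indices[0]])
        else:
            slices.append([indices[0], indices[-1]])
    return slices


# Hypothetical test data: runs of 50 consecutive ids separated by gaps of 3.
data = [i for i in range(100_000) if i % 53 < 50]

for fn in (slices_loop, slices_groupby):
    seconds = timeit(lambda: fn(data), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```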
Answered By: Sven Marnach

import itertools

def runs(seq):
    """Yield [start, end] or [single] for each run of consecutive integers."""
    previous = None
    start = None
    for value in itertools.chain(seq, [None]):
        if start is None:
            start = value
        if previous is not None and value != previous + 1:
            if start == previous:
                yield [previous]
            else:
                yield [start, previous]
            start = value
        previous = value
Answered By: Mark Ransom

Since performance is an issue, go with the first solution by @SvenMarnach, but here is a fun one-liner split into two lines! 😀

>>> from itertools import groupby, count
>>> indices = [0, 1, 2, 3, 5, 7, 8, 10]
>>> [[next(v)] + list(v)[-1:]
     for k,v in groupby(indices, lambda x,c=count(): x-next(c))]
[[0, 3], [5], [7, 8], [10]]
Answered By: jamylak

Below is a simple Python function using numpy:

import numpy

def list_to_slices(inputlist):
    """
    Convert a flat list to a list of slices:
    test = [0,2,3,4,5,6,12,99,100,101,102,13,14,18,19,20,25]
    list_to_slices(test)
    -> [(0, 0), (2, 6), (12, 14), (18, 20), (25, 25), (99, 102)]
    """
    inputlist.sort()  # note: sorts the caller's list in place
    # positions whose gap to the next element is larger than 1, i.e. run ends
    pointers = numpy.where(numpy.diff(inputlist) > 1)[0]
    # pair the first position of each run with its last position
    pointers = zip(numpy.r_[0, pointers+1], numpy.r_[pointers, len(inputlist)-1])
    slices = [(inputlist[i], inputlist[j]) for i, j in pointers]
    return slices
Answered By: bougui

If your input is a sorted sequence, which I assume it is, you can do it in a minimalistic way in three steps using the good old zip() function:

x = [0, 1, 2, 3, 5, 7, 8, 10]
# find beginnings and endings of sequential runs,
# N.B. the first beginning and the last ending are not included
begs_ends_iter = zip(
    *[(x1, x0) for x0, x1 in zip(x[:-1], x[1:]) if x1 - x0 > 1]
)
# handle the case when there is only one sequential run
begs, ends = tuple(begs_ends_iter) or ((), ())
# add the first beginning and the last ending,
# combine corresponding beginnings and endings,
# and convert isolated elements into the lists of length one
y = [
    [beg] if beg == end else [beg, end]
    for beg, end in zip(tuple(x[:1]) + begs, ends + tuple(x[-1:]))
]

If your input is unsorted, sort it first; the result is a sorted list, which is a sequence. If you have a sorted iterable and do not want to convert it to a sequence (e.g., because it is too long), you can use the chain() and pairwise() functions from the itertools module (pairwise() is available since Python 3.10):

from itertools import chain, pairwise
x = [0, 1, 2, 3, 5, 7, 8, 10]
# find beginnings and endings of sequential runs,
# N.B. the last beginning and the first ending are None
begs, ends = zip(
    *[
        (x1, x0)
        for x0, x1 in pairwise(chain([None], x, [None]))
        if x0 is None or x1 is None or x1 - x0 > 1
    ]
)
# remove the last beginning and the first ending,
# combine corresponding beginnings and endings,
# and convert isolated elements into the lists of length one
y = [
    [beg] if beg == end else [beg, end] 
    for beg, end in zip(begs[:-1], ends[1:])
]

These solutions are similar to the one proposed by bougui, but do not use numpy. This may be more efficient if the data is not already in a numpy array and is either fairly small or, at the other extreme, an iterable too large to fit into memory.
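
Note that the `zip(*[...])` step above still builds the full list of boundaries before producing any output. For input that really is too large to hold in memory, a generator that emits each run as soon as it ends may be a better fit. The following is a minimal sketch under the same assumption of sorted input; the name `iter_runs` is hypothetical.

```python
def iter_runs(sorted_ids):
    """Lazily yield [start, end] or [single] for each run of consecutive ids.

    Assumes the input iterable is sorted in ascending order.
    """
    it = iter(sorted_ids)
    try:
        start = prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for value in it:
        if value - prev > 1:  # gap found: emit the run that just ended
            yield [start] if start == prev else [start, prev]
            start = value
        prev = value
    # emit the final run
    yield [start] if start == prev else [start, prev]


# Works on any iterable, including one-shot iterators.
print(list(iter_runs(iter([0, 1, 2, 3, 5, 7, 8, 10]))))
# [[0, 3], [5], [7, 8], [10]]
```

Only the current run's start and the previous value are kept, so memory use stays constant regardless of input length.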

Answered By: serge.v