numpy get indexes of connected array values

Question:

I have a 1d numpy array that looks like this:

a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

Is there a way to get the start and end indexes of each cluster of values? Basically, I would like to get this:

[
    # clusters with value 1 (cluster with values 0 aren't needed)
    [
        # start and end of each cluster
        [0, 2],
        [8, 11],
        [13, 14],
    ],
]

I’m not very skilled with numpy. I know there are lots of cool functions, but I have no idea which ones to use. Also googling this problem didn’t give me anything since people usually have pretty specific problems that are different than mine. I know that for example np.split won’t be enough here.

Please help me if you can, I can provide you with more examples or details if needed. I’ll try to respond as quickly as possible. Thank you for your time.

Asked By: acmpo6ou


Answers:

Maybe this is what you want? Try it and see if it helps you:

import numpy as np

a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

# find the start of each cluster
starts = np.where(np.diff(np.concatenate(([0], a, [0]))) == 1)[0]

# find the end of each cluster
ends = np.where(np.diff(np.concatenate(([0], a, [0]))) == -1)[0] - 1

# combine start and end indexes into a list of (start, end) tuples
clusters = list(zip(starts, ends))

print(clusters)
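To see why the zero padding and `np.diff` work, here is the same idea broken into intermediate steps. This is only a sketch of the approach above; the names `padded` and `steps` are mine, and the final conversion to nested lists is an optional extra to match the format in the question.

```python
import numpy as np

a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

# Pad with a zero on each side so clusters touching either edge
# still produce a 0 -> 1 or 1 -> 0 transition.
padded = np.concatenate(([0], a, [0]))

# +1 marks a 0 -> 1 step (cluster start),
# -1 marks a 1 -> 0 step (one position past the cluster end).
steps = np.diff(padded)

starts = np.flatnonzero(steps == 1)
ends = np.flatnonzero(steps == -1) - 1

# Stack into an (n_clusters, 2) array and convert to plain Python ints.
clusters = np.column_stack((starts, ends)).tolist()
print(clusters)  # [[0, 2], [8, 11], [13, 14]]
```

Computing the padded difference once and reusing it also avoids repeating the `np.concatenate` call for starts and ends.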

2nd version as requested:

# find the indexes where the value of a changes
change_idxs = np.flatnonzero(np.diff(a))

# add the start and end indexes of the array as boundaries
boundaries = np.concatenate(([0], change_idxs+1, [len(a)]))

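# pair consecutive boundaries and keep only the segments whose value is 1
# (checking a[start] instead of the segment position also works when a begins with 0)
clusters = [(boundaries[i], boundaries[i+1]-1) for i in range(len(boundaries)-1) if a[boundaries[i]] == 1]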

print(clusters)  # [(0, 2), (8, 11), (13, 14)]
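
If you need this in more than one place, or for a value other than 1, one option is to wrap the padded-diff idea in a small helper. The function name `find_runs` and its `value` parameter are hypothetical names chosen for this sketch, not part of NumPy.

```python
import numpy as np

def find_runs(arr, value=1):
    """Return inclusive [start, end] index pairs for each run of `value` in a 1-D array.

    Hypothetical helper for illustration; not a NumPy function.
    """
    mask = np.asarray(arr) == value
    # Pad with False so runs at either edge are detected.
    padded = np.concatenate(([False], mask, [False]))
    # Cast to a signed integer type so diff yields +1 / -1 instead of booleans.
    steps = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1) - 1
    # Plain nested lists of Python ints, one [start, end] pair per run.
    return np.column_stack((starts, ends)).tolist()

a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])
print(find_runs(a))           # [[0, 2], [8, 11], [13, 14]]
print(find_runs(a, value=0))  # [[3, 7], [12, 12]]
```

Because the mask is built by comparison, this should not depend on whether the array starts with the value of interest, and an empty input returns an empty list.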
Answered By: Daniel Hao