Re-number disjoint sections of an array, by order of appearance

Question:

Consider an array of contiguous "sections":

x = np.asarray([
   1, 1, 1, 1,
   9, 9, 9,
   3, 3, 3, 3, 3,
   5, 5, 5,
])

I don’t care about the actual values in the array. I only care that they demarcate disjoint sections of the array. I would like to renumber them so that the first section is all 0, the second section is all 1, and so on:

desired = np.asarray([
   0, 0, 0, 0,
   1, 1, 1,
   2, 2, 2, 2, 2,
   3, 3, 3,
])

What is an elegant way to perform this operation? I don’t expect there to be a single best answer, but I think this question could provide interesting opportunities to show off applications of various NumPy and other Python features.

Assume for the sake of this question that the array is 1-dimensional and non-empty.

Asked By: shadowtalker

Answers:

Here is a naïve but linear-time implementation using nditer:

import numpy as np

def renumber(arr):
    assert arr.ndim == 1

    val_prev = None  # Arbitrary placeholder
    section_number = 0
    result = np.empty_like(arr, dtype=int)
    with np.nditer(
        [arr, result],
        flags=['c_index'],
        op_flags=[['readonly'], ['writeonly']]
    ) as it:
        for val_curr, res in it:
            if it.index > 0 and val_curr != val_prev:
                section_number += 1
            res[...] = section_number
            val_prev = val_curr
    return result

There are certainly fancier ways to do this, but this implementation should serve as a sensible baseline:

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)
Answered By: shadowtalker

Note: There is a nicer equivalent of this in another answer.

My other answer essentially consists of comparing every value to the value before it, and incrementing a counter whenever the two differ. This can be implemented in vectorized fashion by taking advantage of the fact that boolean True corresponds to integer 1, and False corresponds to 0, so a cumulative sum over the comparison results counts the section boundaries seen so far.

def renumber(arr):
    assert arr.ndim == 1
    # The first element never starts a new section, so prepend False.
    return np.cumsum(np.insert(arr[1:] != arr[:-1], 0, False))

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)

Note that this is a little clunky due to the need to np.insert the first value. I would be very interested to know if there is a more elegant way to achieve this.
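
One possible way to avoid the np.insert call is to allocate the boolean mask up front and fill in everything after the first position. This is only a sketch: the names renumber_prealloc and is_boundary are made up for illustration. The first entry stays False because the first element never starts a new section. Since it only uses !=, it should also work for values that cannot be subtracted, such as strings.

```python
import numpy as np

def renumber_prealloc(arr):
    """Renumber contiguous sections of a 1-D array as 0, 1, 2, ..."""
    arr = np.asarray(arr)
    assert arr.ndim == 1

    # is_boundary[i] is True when element i differs from element i - 1.
    # Position 0 has no predecessor, so it is left as False.
    is_boundary = np.zeros(arr.shape[0], dtype=bool)
    is_boundary[1:] = arr[1:] != arr[:-1]

    # True counts as 1 and False as 0, so the running total is the section number.
    return np.cumsum(is_boundary)

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
print(renumber_prealloc(x))
```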

Answered By: shadowtalker

Combining np.cumsum with np.diff allows you to do this.

a = np.cumsum(np.diff(x, prepend=x[0]) != 0)
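
To see why the one-liner works, it may help to look at the intermediate arrays one at a time. The sketch below splits it into steps; the names steps, is_boundary and section_ids are chosen here for illustration only. It assumes a NumPy version in which np.diff accepts the prepend argument, and an array whose values support subtraction.

```python
import numpy as np

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])

# Prepending x[0] makes the first difference x[0] - x[0] == 0,
# so the output has the same length as x.
steps = np.diff(x, prepend=x[0])
# steps is [0, 0, 0, 0, 8, 0, 0, -6, 0, 0, 0, 0, 2, 0, 0]

# A nonzero step marks the first element of a new section.
is_boundary = steps != 0

# Summing the boundaries seen so far gives each element's section number.
section_ids = np.cumsum(is_boundary)
print(section_ids)
```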
Answered By: Roy Smart