Re-number disjoint sections of an array, by order of appearance
Question:
Consider an array of contiguous "sections":
x = np.asarray([
1, 1, 1, 1,
9, 9, 9,
3, 3, 3, 3, 3,
5, 5, 5,
])
I don’t care about the actual values in the array. I only care that they demarcate disjoint sections of the array. I would like to renumber them so that the first section is all 0, the second section is all 1, and so on:
desired = np.asarray([
0, 0, 0, 0,
1, 1, 1,
2, 2, 2, 2, 2,
3, 3, 3,
])
What is an elegant way to perform this operation? I don’t expect there to be a single best answer, but I think this question could provide interesting opportunities to show off applications of various Numpy and other Python features.
Assume for the sake of this question that the array is 1-dimensional and non-empty.
Answers:
Here is a naïve but linear-time implementation using nditer:
import numpy as np

def renumber(arr):
    assert arr.ndim == 1
    val_prev = None  # Arbitrary placeholder, never compared at index 0
    section_number = 0
    result = np.empty_like(arr, dtype=int)
    with np.nditer(
        [arr, result],
        flags=['c_index'],
        op_flags=[['readonly'], ['writeonly']]
    ) as it:
        for val_curr, res in it:
            # A new section starts wherever the value differs from its predecessor
            if it.index > 0 and val_curr != val_prev:
                section_number += 1
            res[...] = section_number
            val_prev = val_curr
    return result
There are certainly fancier ways to do this, but this implementation should serve as a sensible baseline:
x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)
Note: There is a nicer equivalent of this in another answer.
My other answer essentially consists of comparing every value to the value before it, and incrementing a counter whenever the two differ. This can be implemented in vectorized fashion by taking advantage of the fact that boolean True corresponds to integer 1, and False corresponds to 0.
def renumber(arr):
    assert arr.ndim == 1
    # The first element has no predecessor, so it never increments the counter
    return np.cumsum(np.insert(arr[1:] != arr[:-1], 0, False))
x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)
Note that this is a little clunky due to the need to np.insert the first value. I would be very interested to know if there is a more elegant way to achieve this.
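One possible way to avoid np.insert, offered as a sketch rather than a definitive answer, is to preallocate a boolean mask and fill in everything after the first position. The function name renumber_mask and the variable is_boundary are made up for this illustration.

```python
import numpy as np

def renumber_mask(arr):
    # Hypothetical helper name; assumes a 1-dimensional input
    arr = np.asarray(arr)
    assert arr.ndim == 1
    # True wherever an element differs from its predecessor; index 0 stays False
    is_boundary = np.zeros(arr.shape, dtype=bool)
    is_boundary[1:] = arr[1:] != arr[:-1]
    # Summing booleans counts True as 1 and False as 0
    return np.cumsum(is_boundary)

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
result = renumber_mask(x)
```

Because index 0 of the mask is always False, the first section is numbered 0 even when the first two elements differ, and a single-element array works too.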
Combining np.cumsum with np.diff allows you to do this.
a = np.cumsum(np.diff(x, prepend=x[0]) != 0)
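To make the one-liner above easier to check, here is a minimal sketch that wraps it in a function and exposes the intermediate differences. The name renumber_diff is hypothetical, and the sketch assumes a numeric, non-empty, 1-dimensional array and a NumPy version whose np.diff accepts the prepend argument.

```python
import numpy as np

def renumber_diff(arr):
    # Hypothetical wrapper around the cumsum/diff one-liner
    arr = np.asarray(arr)
    assert arr.ndim == 1 and arr.size > 0
    # prepend=arr[0] makes the first difference zero, so numbering starts at 0
    return np.cumsum(np.diff(arr, prepend=arr[0]) != 0)

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
step_diff = np.diff(x, prepend=x[0])  # nonzero only at section boundaries
labels = renumber_diff(x)
```

Here step_diff is nonzero exactly where a new section begins, and the cumulative sum of those boundary flags gives the section numbers.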