Sorting by subsampling every nth element in numpy array?

Question

I am trying to sample every nth element to sort an array. My current solution works, but it feels like there should be a solution that does not involve concatenation.

My current implementation is as follows.

import numpy as np

arr = np.arange(10)
print(arr)
# [0 1 2 3 4 5 6 7 8 9]

# sample every 5th element
res = np.empty(shape=0)
for i in range(5):
    res = np.concatenate([res, arr[i::5]])

print(res)
# [0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]

Looking for any tips to make this faster/more pythonic. My use case is with an array of ~10,000 values.

Asked By: BoomBoxBoy

Answer 1

Reshape your vector into a 2D array with N elements per row, and then flatten it column-wise:

import numpy as np

# Pick "subsample stride"
N = 5

# Create a vector with length divisible by N.
arr = np.arange(2 * N)
print(arr)

# Reshape arr into a 2D array with N elements per row and as many rows
# as required (-1 lets NumPy infer the row count).
# Flatten it with "F" ordering for "Fortran style" (column-major).
output = arr.reshape(-1, N).flatten("F")
print(output)

Output:

[0 1 2 3 4 5 6 7 8 9]
[0 5 1 6 2 7 3 8 4 9]
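
The reshape approach requires the array length to be divisible by N. As a minimal sketch for the general case (the helper names `stride_order` and `stride_permutation` are hypothetical, not from the question), a single `np.concatenate` over the strided slices replaces the repeated concatenation in the loop, and a stable `np.argsort` on `index % N` yields the same ordering as an index permutation:

```python
import numpy as np


def stride_order(arr, n):
    """Return arr[0::n], arr[1::n], ..., arr[n-1::n] joined end to end.

    Hypothetical helper: works for any length, at the cost of building
    a list of n views before one concatenate call.
    """
    return np.concatenate([arr[i::n] for i in range(n)])


def stride_permutation(length, n):
    """Return the index permutation that produces the same ordering.

    A stable sort on (index % n) groups indices by residue while
    keeping each group in ascending order.
    """
    return np.argsort(np.arange(length) % n, kind="stable")


arr = np.arange(12)  # 12 is not divisible by 5
out = stride_order(arr, 5)
print(out.tolist())
# [0, 5, 10, 1, 6, 11, 2, 7, 3, 8, 4, 9]

# Indexing with the permutation gives the same result.
same = arr[stride_permutation(arr.size, 5)]
print(np.array_equal(out, same))
# True
```

The permutation form is handy when the same reordering has to be applied to several arrays of equal length, but the argsort makes it slower than the slicing version for a one-off reorder.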

Performance comparison

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: def sol0(arr):
   ...:     """OP's original solution."""
   ...:     res = np.empty(shape=0)
   ...:     for i in range(5):
   ...:         res = np.concatenate([res, arr[i::5]])
   ...:     return res
   ...: 

In [3]: def sol1(arr):  
   ...:     """This answer's solution."""
   ...:     return arr.reshape(-1, 5).flatten("F")
   ...: 

In [4]: def sol2(arr):
   ...:     """@seralouk's solution, with shape error patch"""
   ...:     res = np.empty((5, arr.size//5), order='F')
   ...:     for i in range(5):
   ...:         res[i::5] = arr[i::5]
   ...:     return res.reshape(-1)

In [5]: arr = np.arange(10_000)

In [6]: %timeit sol0(arr)
26.6 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit sol1(arr)
7.81 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit sol2(arr)
36.3 µs ± 841 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
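
Before comparing timings, it is worth confirming that the candidates return the same values. The following is a sketch that uses the standard-library `timeit` module rather than IPython's `%timeit`; the function names mirror the session above, the repeat count of 1,000 is an arbitrary choice, and absolute numbers will differ from machine to machine:

```python
import timeit

import numpy as np


def sol0(arr):
    """Loop with repeated concatenation (the question's approach)."""
    res = np.empty(shape=0)
    for i in range(5):
        res = np.concatenate([res, arr[i::5]])
    return res


def sol1(arr):
    """Reshape, then flatten column-wise."""
    return arr.reshape(-1, 5).flatten("F")


arr = np.arange(10_000)

# sol0 returns floats (np.empty defaults to float64) while sol1 keeps
# the integer dtype; array_equal compares values, not dtypes.
assert np.array_equal(sol0(arr), sol1(arr))

for fn in (sol0, sol1):
    seconds = timeit.timeit(lambda: fn(arr), number=1_000)
    print(f"{fn.__name__}: {seconds / 1_000 * 1e6:.2f} µs per call")
```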

Answered By: Brian

Answer 2

I would preallocate the result array and fill it one strided slice at a time:

import numpy as np

arr = np.arange(10)
print(arr)
# [0 1 2 3 4 5 6 7 8 9]

# sample every 5th element; row i holds arr[i::5]
# (assumes the length of arr is divisible by 5)
res = np.empty(shape=(5, arr.shape[0] // 5))
for i in range(5):
    res[i] = arr[i::5]
res = res.reshape(-1)

print(res)
# [0. 5. 1. 6. 2. 7. 3. 8. 4. 9.]

Time to execute my code (%timeit):

18.8 ns ± 0.498 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

Your code:

19.2 ns ± 0.869 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

@Brian’s solution is the slowest:

21.3 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Answered By: seralouk
