Best way to permute contents of each column in numpy
Question:
What’s the best way to efficiently permute the contents of each column in a numpy array?
What I have is something like:
>>> arr = np.arange(16).reshape((4, 4))
>>> arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> # Shuffle each column independently to obtain something like
array([[ 8,  5, 10,  7],
       [12,  1,  6,  3],
       [ 4,  9, 14, 11],
       [ 0, 13,  2, 15]])
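A quick way to confirm that a result really is an independent per-column shuffle is to sort every column and compare with the sorted original. This is a small sketch; `is_columnwise_permutation` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np


def is_columnwise_permutation(original, shuffled):
    """Hypothetical helper: True if every column of `shuffled` holds exactly
    the same values as the matching column of `original`, in any order."""
    original = np.asarray(original)
    shuffled = np.asarray(shuffled)
    if original.shape != shuffled.shape:
        return False
    # Sorting down each column (axis=0) discards the order and keeps the values.
    return np.array_equal(np.sort(original, axis=0), np.sort(shuffled, axis=0))


arr = np.arange(16).reshape((4, 4))
target = np.array([[8, 5, 10, 7],
                   [12, 1, 6, 3],
                   [4, 9, 14, 11],
                   [0, 13, 2, 15]])
print(is_columnwise_permutation(arr, target))  # True
```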
Answers:
If your array is multi-dimensional, np.random.permutation permutes along the first axis (i.e. it shuffles the rows) by default:
>>> np.random.permutation(arr)
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [ 0,  1,  2,  3],
       [12, 13, 14, 15]])
However, this shuffles the row indices and so each column has the same (random) ordering.
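To see that whole rows move together, the sketch below uses a seeded `np.random.default_rng` (the seed is only for reproducibility) and relies on the fact that, in this `arange` array, the first column identifies the original row. Applying that one row order to the entire array reproduces the result.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
arr = np.arange(16).reshape((4, 4))

shuffled = rng.permutation(arr)  # permutes along axis 0, i.e. moves whole rows

# In this arange array arr[i, 0] == 4 * i, so the first column tells us
# which original row ended up in each position.
order = shuffled[:, 0] // arr.shape[1]

# A single row order explains the whole result: every column was
# reordered in exactly the same way.
print(np.array_equal(shuffled, arr[order]))  # True
```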
The simplest way of shuffling each column independently could be to loop over the columns and use np.random.shuffle to shuffle each one in place:
import numpy as np

for i in range(arr.shape[1]):
    np.random.shuffle(arr[:, i])  # arr[:, i] is a view, so column i of arr is shuffled in place
Which gives, for instance:
array([[12,  1, 14, 11],
       [ 4,  9, 10,  7],
       [ 8,  5,  6, 15],
       [ 0, 13,  2,  3]])
This method can be useful if you have a very large array that you don't want to copy, because the permutation of each column is done in place. On the other hand, even simple Python loops can be very slow, and there are quicker NumPy methods, such as the one provided by @jme.
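If you are on NumPy 1.20 or later, the `Generator` API has a method that shuffles every slice along an axis independently, which is exactly the per-column shuffle asked for when `axis=0`. A minimal sketch, assuming a recent NumPy:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(16).reshape((4, 4))

# Generator.permuted shuffles each slice along `axis` independently of the
# others; with axis=0 every column gets its own random order. It returns a
# new array and leaves `arr` untouched (passing out=arr shuffles in place).
shuffled = rng.permuted(arr, axis=0)
print(shuffled)
```

Unlike `rng.shuffle` or `np.random.permutation`, which apply one order to whole rows, `permuted` draws a separate order per column, and the loop runs in compiled code.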
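Here's another way of doing this:
import numpy as np

def permute_columns(x):
    # random keys argsorted down each column -> an independent row permutation per column
    ix_i = np.random.random_sample(x.shape).argsort(axis=0)
    # column index j repeated on every row, so the result is x[ix_i[i, j], j]
    ix_j = np.tile(np.arange(x.shape[1]), (x.shape[0], 1))
    return x[ix_i, ix_j]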
A quick test:
>>> x = np.arange(16).reshape(4,4)
>>> permute_columns(x)
array([[ 8,  9,  2,  3],
       [ 0,  5, 10, 11],
       [ 4, 13, 14,  7],
       [12,  1,  6, 15]])
The idea is to generate a bunch of random numbers, then argsort them within each column independently. This produces a random permutation of each column's indices.
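The same argsort trick can be written without building the `np.tile` column index by using `np.take_along_axis` (available since NumPy 1.15), which picks `x[idx[i, j], j]` for every position when `axis=0`. This is a sketch of an equivalent variant, not the answer's original code; `permute_columns_take` is a hypothetical name.

```python
import numpy as np


def permute_columns_take(x, rng=None):
    """Hypothetical variant of permute_columns using np.take_along_axis."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along axis 0 turns each column of
    # keys into an independent random permutation of the row indices.
    idx = rng.random(x.shape).argsort(axis=0)
    # Reorder each column by its own permutation: out[i, j] = x[idx[i, j], j].
    return np.take_along_axis(x, idx, axis=0)


x = np.arange(16).reshape(4, 4)
print(permute_columns_take(x, np.random.default_rng(0)))
```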
Note that this has sub-optimal asymptotic time complexity, since the sort takes O(n m log m) time for an array of size m x n, whereas shuffling each column directly is linear, O(n m). But since Python's for loops are pretty slow, you actually get better performance than with the loop for all but very tall matrices.
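Whether the argsort approach wins depends on the shape, so it is worth timing on your own data. The sketch below compares a column loop with the argsort method on one wide and one tall array; the function names and shapes are illustrative assumptions, and no particular timings are implied.

```python
import timeit

import numpy as np

rng = np.random.default_rng()


def shuffle_loop(x):
    x = x.copy()  # copy so every timing run starts from the same data
    for j in range(x.shape[1]):
        rng.shuffle(x[:, j])  # one Python-level call per column
    return x


def shuffle_argsort(x):
    idx = rng.random(x.shape).argsort(axis=0)  # sorts each column of random keys
    return np.take_along_axis(x, idx, axis=0)


# Wide (many short columns) versus tall (few long columns).
for shape in [(10, 10_000), (10_000, 10)]:
    x = np.arange(shape[0] * shape[1]).reshape(shape)
    for fn in (shuffle_loop, shuffle_argsort):
        t = timeit.timeit(lambda: fn(x), number=10)
        print(f"{shape} {fn.__name__}: {t:.3f} s")
```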
To perform a permutation along the row axis of an array, you can use the following code:
np.random.permutation(arr) # If you want to make a copy of the array
Or:
np.random.shuffle(arr) # if you want to change the array in-place
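The copy versus in-place difference also holds for the newer `Generator` methods `rng.permutation` and `rng.shuffle`, which this sketch uses instead of the legacy `np.random` functions:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(20).reshape((4, 5))
original = arr.copy()  # kept only to check what changed

shuffled_copy = rng.permutation(arr)  # returns a new array with the rows reordered
print(np.array_equal(arr, original))  # True: arr itself is untouched

result = rng.shuffle(arr)  # reorders the rows of arr in place
print(result)  # None: shuffle returns nothing
```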
However, if you have a multi-dimensional array, you can use the following code to perform the permutation along a specific axis:
sampler = np.random.permutation(4)  # length must equal the size of the axis you permute
arr.take(sampler, axis=0)  # choose the axis to permute here
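To avoid hard-coding the sampler length, it can be taken from the array's shape. A minimal sketch; `permute_along_axis` is a hypothetical helper name. Note that one permutation is applied to every slice, so along axis 1 whole columns move together.

```python
import numpy as np


def permute_along_axis(arr, axis, rng=None):
    """Hypothetical helper: return a copy of arr whose slices along `axis`
    are reordered by one random permutation shared by all other positions."""
    rng = np.random.default_rng() if rng is None else rng
    sampler = rng.permutation(arr.shape[axis])  # sized from the chosen axis
    return arr.take(sampler, axis=axis)


arr = np.arange(20).reshape((4, 5))
out = permute_along_axis(arr, axis=1, rng=np.random.default_rng(0))
print(out)
```

`Generator.permutation` also accepts an `axis` argument directly (`rng.permutation(arr, axis=1)`), which gives the same kind of result without the explicit `take`.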
For example, suppose you want to permute the following array along its second axis (axis 1):
arr = np.arange(20).reshape((4, 5))
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
You can define the sampler as follows:
sampler = np.random.permutation(5)
array([2, 1, 3, 4, 0])
Then, you can apply the permutation using the take() method:
arr.take(sampler, axis=1)
Out:
array([[ 2,  1,  3,  4,  0],
       [ 7,  6,  8,  9,  5],
       [12, 11, 13, 14, 10],
       [17, 16, 18, 19, 15]])
If you want to shuffle all the elements along all axes, you can do this:
np.random.permutation(arr.flatten()).reshape(arr.shape)  # This makes a copy of the array
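If you would rather shuffle all elements in place, one option is to shuffle a flat view of the array. This sketch assumes `arr` is C-contiguous, in which case `reshape(-1)` returns a view; the `np.shares_memory` check guards against silently shuffling a copy.

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(20).reshape((4, 5))

flat = arr.reshape(-1)  # a view (no copy) because arr is C-contiguous
assert np.shares_memory(flat, arr), "reshape made a copy; shuffle would not touch arr"
rng.shuffle(flat)  # shuffles all 20 elements; arr sees the change through the view
print(arr)
```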