Python random sample of two arrays, but matching indices
Question:
I have two numpy arrays x and y, which have length 10,000.
I would like to plot a random subset of 1,000 entries of both x and y.
Is there an easy way to use the lovely, compact random.sample(population, k) on both x and y to select the same corresponding indices? (The y and x vectors are linked by a function y(x) say.)
Thanks.
Answers:
Just zip the two together and use that as the population:
import random
random.sample(zip(xs,ys), 1000)
The result will be 1000 pairs (2-tuples) of corresponding entries from xs and ys.
Update: For Python 3, you need to convert the zipped sequences into a list:
random.sample(list(zip(xs,ys)), 1000)
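If you would rather keep the samples as two numpy arrays instead of a list of tuples, one option is to sample the indices and index both arrays with them. This is a minimal sketch; the arrays x and y below are hypothetical stand-ins for the data in the question.

```python
import random
import numpy as np

# Hypothetical data standing in for the question's x and y (y is a function of x).
x = np.linspace(0.0, 1.0, 10_000)
y = x ** 2

# random.sample on a range draws distinct indices without building a list of pairs.
idx = random.sample(range(len(x)), 1000)

# Indexing both arrays with the same index list keeps the pairs aligned.
x_sample = x[idx]
y_sample = y[idx]
```

Because the same idx is applied to both arrays, x_sample[i] and y_sample[i] still belong together.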
You can use np.random.choice on an index array and apply it to both arrays:
idx = np.random.choice(np.arange(len(x)), 1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]
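The same idea can also be written with numpy's newer Generator interface (np.random.default_rng). The sketch below assumes a numpy version that provides it; the seed value and the sample data are arbitrary choices for illustration.

```python
import numpy as np

# Seed chosen arbitrarily so the run is reproducible; omit it for fresh randomness.
rng = np.random.default_rng(0)

# Hypothetical data standing in for the question's x and y.
x = np.linspace(0.0, 1.0, 10_000)
y = np.sin(x)

# Draw 1000 distinct indices (replace=False means no index appears twice).
idx = rng.choice(len(x), size=1000, replace=False)
x_sample, y_sample = x[idx], y[idx]
```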
After testing the numpy.random.choice solution, I found that it was very slow for larger arrays. numpy.random.randint should be much faster.
Example:
import numpy as np
x = np.arange(1e8)
y = np.arange(1e8)
# draw 10000 random indices in [0, len(x)); indices may repeat
idx = np.random.randint(0, x.shape[0], 10000)
x_sample, y_sample = x[idx], y[idx]
Note that numpy.random.randint draws each index independently, i.e. with replacement, so the same data point can be selected more than once.
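To get a feel for how often that happens, you can count the repeated indices. For k draws from n items, the expected number of colliding pairs is k(k-1)/(2n), which is small when k is much smaller than n. The helper count_repeats below is a hypothetical name introduced only for this sketch.

```python
import numpy as np

def count_repeats(n, k):
    """Draw k indices from range(n) with randint and return how many are repeats."""
    idx = np.random.randint(0, n, k)
    # np.unique drops duplicates, so the difference is the number of repeated draws.
    return k - np.unique(idx).size

n, k = 10_000, 1_000  # sizes from the question
expected_pairs = k * (k - 1) / (2 * n)  # expected number of colliding index pairs
print(count_repeats(n, k), expected_pairs)
```

If repeats matter for your plot, sample without replacement instead; if an approximate sample size is fine, applying np.unique to the indices drops the duplicates.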