Given a 2D NumPy array representing a 2D distribution, how can I sample data from this distribution with NumPy or SciPy functions?

Question:

Given a 2D NumPy array dist with shape (200, 200), where each entry represents the joint probability of (x1, x2) for all x1, x2 ∈ {0, 1, ..., 199}, how do I sample bivariate data x = (x1, x2) from this probability distribution using the NumPy or SciPy API?

Asked By: jabberwoo

Answers:

Here’s a way, but I’m sure there’s a much more elegant solution using scipy.
numpy.random doesn’t deal with 2d pmfs, so you have to do some reshaping gymnastics to go this way.

import numpy as np

# construct a toy joint pmf
dist=np.random.random(size=(200,200)) # here's your joint pmf 
dist/=dist.sum() # it has to be normalized 

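# generate the set of all x,y pairs represented by the pmf
# (the index axis is moved last so that pairs[i, j] == (i, j), matching dist[i, j];
# a plain .T would give pairs[i, j] == (j, i) and swap x and y in the result)
pairs=np.moveaxis(np.indices(dimensions=(200,200)),0,-1) # here are all of the x,y pairs 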

# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
n=50 
inds=np.random.choice(np.arange(200**2),p=dist.reshape(-1),size=n,replace=False)

# inds is the set of n randomly chosen indices into the flattened dist array...
# therefore the random x,y selections
# come from selecting the associated elements
# from the flattened pairs array
selections = pairs.reshape(-1,2)[inds]
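
A quick way to check that the selected pairs line up with the pmf is to sample from a tiny, deliberately asymmetric pmf and compare empirical frequencies with it. This is only a minimal sketch: it assumes sampling with replacement (independent draws) and uses np.random.default_rng instead of the legacy np.random.choice. The helper name sample_pairs and the 2x3 array small are made up for illustration.

```python
import numpy as np

def sample_pairs(dist, n, rng=None):
    """Draw n index pairs (x1, x2) with probability dist[x1, x2] (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.asarray(dist, dtype=float)
    p = dist.ravel() / dist.sum()               # flatten (row-major) and normalize
    # lookup table: row f holds the (x1, x2) pair of flat index f
    pairs = np.moveaxis(np.indices(dist.shape), 0, -1).reshape(-1, dist.ndim)
    inds = rng.choice(p.size, size=n, p=p)      # with replacement: independent draws
    return pairs[inds]

# Assumed toy pmf; the zeros make a swapped (x2, x1) ordering easy to spot
small = np.array([[0.1, 0.2, 0.3],
                  [0.0, 0.4, 0.0]])
samples = sample_pairs(small, 100_000, rng=np.random.default_rng(0))

# Empirical frequency of each (x1, x2); it should be close to small
counts = np.zeros_like(small)
np.add.at(counts, (samples[:, 0], samples[:, 1]), 1)
freq = counts / len(samples)
print(np.round(freq, 2))
```

Entries with zero probability should never be drawn, so freq[1, 0] and freq[1, 2] staying at zero is a simple sign that rows and columns are not mixed up.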
Answered By: kevinkayaks

I can’t comment, but to improve on kevinkayaks’s answer:

pairs=np.indices(dimensions=(200,200)).T
selections = pairs.reshape(-1,2)[inds]

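are not needed and can be replaced by the following, where m is the number of columns of dist (200 here):

selections = np.array([inds//m, inds%m]).T

Each row of the result is (row, column), i.e. (x1, x2) for dist[x1, x2], so the matrix “pairs” is not needed anymore.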
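
To see that this arithmetic recovers the 2D indices, here is a small sketch comparing it with np.divmod and np.unravel_index. It assumes m is the number of columns and that dist was flattened in row-major (C) order, which is NumPy's default; the shape (3, 4) is chosen only for illustration.

```python
import numpy as np

n_rows, m = 3, 4                       # m: number of columns (illustrative shape)
inds = np.arange(n_rows * m)           # every flat index exactly once

by_arithmetic = np.array([inds // m, inds % m]).T
by_divmod = np.column_stack(np.divmod(inds, m))
by_unravel = np.column_stack(np.unravel_index(inds, (n_rows, m)))

# All three give the same (row, column) pairs
print(np.array_equal(by_arithmetic, by_divmod))
print(np.array_equal(by_arithmetic, by_unravel))
print(by_arithmetic[5])                # flat index 5 -> row 1, column 1
```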

Answered By: Hv0nnus HACH

This solution works with probability distributions of any number of dimensions, assuming the array is a valid probability distribution (its entries are non-negative and sum to 1). It flattens the distribution, samples an index from the flattened array, and converts that index back to match the original array shape.

import numpy as np

# Create a flat copy of the array
# (array is the N-dimensional probability array, e.g. dist from the question)
flat = array.flatten()

# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)

# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)

Also, to get multiple samples, add a size keyword argument to the np.random.choice call, and modify adjusted_index before printing it:

adjusted_index = np.array(zip(*adjusted_index))

This is necessary because np.unravel_index, given an array of flat indices, returns a tuple with one index array per coordinate dimension, so this zips them into a list of coordinate tuples. This is also much more efficient than simply repeating the first code in a loop.
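
Put together, the multi-sample version might look like the sketch below. It is one possible arrangement rather than the only one: the function name sample_indices and the random 4x5x6 pmf are invented for the example, np.random.default_rng is used in place of np.random.choice, and np.stack is used instead of zip to combine the per-dimension index arrays.

```python
import numpy as np

def sample_indices(array, size, rng=None):
    """Return `size` index tuples drawn from an N-d probability array (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    array = np.asarray(array, dtype=float)
    flat = array.ravel()                                  # 1D view of the probabilities
    flat_idx = rng.choice(flat.size, size=size, p=flat)   # sample flat indices
    coords = np.unravel_index(flat_idx, array.shape)      # tuple: one array per dimension
    return np.stack(coords, axis=-1)                      # shape (size, array.ndim)

# Assumed toy 3D pmf, normalized so that it sums to 1
rng = np.random.default_rng(42)
pmf = rng.random((4, 5, 6))
pmf /= pmf.sum()

samples = sample_indices(pmf, size=1000, rng=rng)
print(samples.shape)  # (1000, 3)
```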

Answered By: applemonkey496

I can’t comment either, but @applemonkey496’s suggestion for getting multiple samples doesn’t work as written in Python 3. It’s an excellent solution otherwise.

Instead of

adjusted_index = np.array(zip(*adjusted_index))

the zipped result should be converted to a Python list before it is passed to np.array (in Python 3, zip returns an iterator, and np.array does not iterate over it but wraps it in a 0-d object array), e.g.:

adjusted_index = np.array(list(zip(*adjusted_index)))
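
The difference can be seen in a short sketch. The flat indices and the (3, 4) shape below are arbitrary illustration values, and np.column_stack is shown as one alternative that avoids zip altogether.

```python
import numpy as np

# Tuple of per-dimension index arrays, as returned by np.unravel_index
adjusted_index = np.unravel_index(np.array([0, 5, 11]), (3, 4))

wrapped = np.array(zip(*adjusted_index))        # 0-d object array holding the zip iterator
as_list = np.array(list(zip(*adjusted_index)))  # proper (3, 2) integer array
stacked = np.column_stack(adjusted_index)       # same result without zip

print(wrapped.shape, as_list.shape, stacked.shape)  # () (3, 2) (3, 2)
```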
Answered By: Andrew Reeves