Generating random numbers with upper and lower standard deviation of the scatter

Question:

I have to produce randomly generated, normally distributed numbers, based on an astronomical table containing a most probable value and a standard deviation. The peculiar thing is that the standard deviation is not given by one, but two numbers – an upper standard deviation of the error and a lower, something like this:

mass_object, error_up, error_down
7.33, 0.12, 0.07
9.40, 0.04, 0.02
6.01, 0.11, 0.09
...

For example, this means for the first object that if a random mass m gets generated with m<7.33, then probably it will be further away from 7.33 than in the case that m>7.33. So I am looking now for a way to randomly generate the numbers and this way has to include the 2 possible standard deviations. If I was dealing with just one standard deviation per object, I would create the random number (mass) of the first object like that:

mass_random = np.random.normal(loc=7.33, scale=0.12)

Do you have ideas how to create these random numbers with upper and lower standard deviation of the scatter? Tnx

Asked By: NeStack

||

Answers:

As we discussed in the comments, a normal distribution has the same standard deviation in each direction (it’s symmetric around the mean). So we know our distribution won’t be normal. We can try a lognormal approach, since this allows us to introduce the idea of skewness. To do this in Python, you’ll need Scipy. Here’s a crude approach, assuming that 68% of data is on the mean, 16% is at the high point, and 16% is at the low point. We fit the distribution to that crude dataset, then we can calculate new points from the distribution:

import scipy.stats

#  Choose one of the rows
mean, high, low = 7.33, 0.12, 0.07

#  Create a dummy dataset to fit the distribution
values = [mean] * 68 + [mean + high] * 16 + [mean - low ] * 16

# Print the fit distribution
fit_dist = scipy.stats.lognorm.fit(values)
print(fit_dist)

#  Calculate 10 new random values based on the fit
scipy.stats.lognorm.rvs(*fit_dist, size=10)

array([7.25541865, 7.34873107, 7.33831589, 7.36387121, 7.26912469,
       7.33084677, 7.35626689, 7.33907124, 7.32522422, 7.31688687])
Answered By: SNygard

The immediate solution would be a two step sampling:
for a given row $i$, one samples from a uniform distribution over the interval error_down and error_up obtaining $sigma_i$, and then one samples the final value from a normal distribution with mean $m_i$ and standard deviation $sigma_i$.

In practice, one imports numpy, defines a custom function $sampling$ and, then, applies it at the whole table:

import numpy as np


def sampling (row) :

      sigma = np.random.uniform(row[1],      row[2])
      m = row[0]
      return (np.random.normal(m, sigma)) 
    

sampled_values = map(sampling, table)

Answered By: Paolo Vergottini