scipy.stats seed?

Question:

I am trying to generate scipy.stats.pareto.rvs(b, loc=0, scale=1, size=1) with different seed.

In numpy we can seed using numpy.random.seed(seed=233423).

Is there any way to seed the random number generated by scipy stats.

Note: I am not using numpy pareto because I want to give different values for scale.

Asked By: ashok

||

Answers:

scipy.stats just uses numpy.random to generate its random numbers, so numpy.random.seed() will work here as well. E.g.,

import numpy as np
from scipy.stats import pareto
b = 0.9
np.random.seed(seed=233423)
print pareto.rvs(b, loc=0, scale=1, size=5)
np.random.seed(seed=233423)
print pareto.rvs(b, loc=0, scale=1, size=5)

will print [ 9.7758784 10.78405752 4.19704602 1.19256849 1.02750628] twice.

Answered By: Kelsey

For those who happen upon this post four years later, Scipy DOES provide a way to pass a np.random.RandomState object to its random variable classes, see rv_continuous and rv_discrete for more details. The scipy documentation says this:

seed : None or int or numpy.random.RandomState instance, optional

This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.

Unfortunately, it appears this argument is not available after the continuous/discrete rvs subclass rv_continuous or rv_discrete. However, the random_state property does belong to the sublass, meaning we can set the seed using an instance of np.random.RandomState after instantiation like so:

import numpy as np
import scipy.stats as stats

alpha_rv = stats.alpha(3.57)
alpha_rv.random_state = np.random.RandomState(seed=342423)
Answered By: user5915738

Adding to the answer of user5915738, which I think is the best answer in general, I’d like to point out the imho most convenient way to seed the random generator of a scipy.stats distribution.

You can set the seed while generating the distribution with the rvs method, either by defining the seed as an integer, which is used to seed np.random.RandomState internally:

uni_int_seed = scipy.stats.uniform(-.1, 1.).rvs(10, random_state=12)

or by directly defining the np.random.RandomState:

uni_state_seed = scipy.stats.uniform(-.1, 1.).rvs(
    10, random_state=np.random.RandomState(seed=12))

Both methods are equivalent:

np.all(uni_int_seed == uni_state_seed)
# Out: True

The advantage of this method over assigning it to the random_state of rv_continuous or rv_discrete is, that you always have explicit control over the random state of your rvs, whereas with my_dist.random_state = np.random.RandomState(seed=342423) the seed is lost after each call to rvs, possibly resulting in non-reproducible results when losing track of distributions.
Also according to the The Zen of Python:

  1. Explicit is better than implicit.

🙂

Answered By: JE_Muc

For those who are stumbling upon this question 7 years later, there has been a major change in numpy Random State generator function. As per the documentation here and here, the RandomState class is replaced with the Generator class. RandomState is guaranteed to be compatible with older versions/codes however it will not receive any substantial changes, including algorithmic improvements, which are reserved for Generator.

For clarifications on how to pass an existing Numpy based random stream to Scipy functions in the same experiment, given below are some examples and reasonings for which cases are desirable and why.

from numpy.random import Generator, PCG64
from scipy.stats import binom

n, p, size, seed = 10, 0.5, 10, 12345

# Case 1 : Scipy uses some default Random Generator
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen = binom
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [6 6 5 4 6 6 8 6 6 4]
# [4 4 6 6 5 4 5 4 6 7]
# NOT DESIRABLE as we don't have control over the seed of Scipy random number generation


# Case 2 : Scipy uses same seed and Random generator (new object though)
scipy_randomGen.random_state=Generator(PCG64(seed))
numpy_randomGen = Generator(PCG64(seed))
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 4 6 6 5 4 5 4 6 7]
    # This experiment is using same sequence of random numbers, one is being used by Scipy
# and other by Numpy. NOT DESIRABLE as we don't want repetition of some random 
# stream in same experiment.


# Case 3 (IMP) : Scipy uses an existing Random Generator which can being passed to Scipy based 
# random generator object
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen.random_state=numpy_randomGen
print(scipy_randomGen.rvs(n, p, size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 8 6 3 5 7 6 4 6 4]
# This should be the case which we mostly want (DESIRABLE). If we are using both Numpy based and 
#Scipy based random number generators/function, then not only do we have no repetition of 
#random number sequences but also have reproducibility of results in this case.
Answered By: Abhinav