Consistently create same random numpy array

Question:

I am waiting for another developer to finish a piece of code that will return an np array of shape (100, 2000) with values of either -1, 0, or 1.

In the meantime, I want to randomly create an array of the same characteristics so I can get a head start on my development and testing. The thing is that I want this randomly created array to be the same each time, so that I’m not testing against an array that keeps changing its value each time I re-run my process.

I can create my array like this, but is there a way to create it so that it’s the same each time? I could pickle the object and unpickle it, but I’m wondering if there’s another way.

r = np.random.randint(3, size=(100, 2000)) - 1
Asked By: Idr

||

Answers:

Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you’ll always get the same random number sequence.

This function will seed the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it’s a form of global state with all the problems global state brings. For a cleaner solution, see Robert Kern’s answer below.
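As a minimal sketch of this approach applied to the array from the question (the helper name make_array is made up for illustration and the seed value is arbitrary):

```python
import numpy as np

def make_array(seed=42):
    # Seeding resets the global generator, so the draws that follow are reproducible.
    np.random.seed(seed)
    # randint(3) yields 0, 1 or 2; subtracting 1 maps that to -1, 0 or 1.
    return np.random.randint(3, size=(100, 2000)) - 1

a = make_array()
b = make_array()
print(np.array_equal(a, b))  # True
```

Note that every call to make_array resets the global state, which is exactly the side effect the paragraph above warns about.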

Answered By: Sven Marnach

Create your own instance of numpy.random.RandomState() with your chosen seed. Do not use numpy.random.seed() except to work around inflexible libraries that do not let you pass around your own RandomState instance.

[~]
|1> from numpy.random import RandomState

[~]
|2> prng = RandomState(1234567890)

[~]
|3> prng.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])

[~]
|4> prng2 = RandomState(1234567890)

[~]
|5> prng2.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])
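A sketch of what passing around your own instance can look like for the array in the question. The function name fake_result is hypothetical, a stand-in for the other developer's code:

```python
import numpy as np
from numpy.random import RandomState

def fake_result(prng, shape=(100, 2000)):
    # Draw from the generator handed in by the caller, never from numpy.random directly.
    # randint(-1, 2) samples from the half-open range [-1, 2), i.e. -1, 0 or 1.
    return prng.randint(-1, 2, size=shape)

r1 = fake_result(RandomState(1234567890))
r2 = fake_result(RandomState(1234567890))
print(r1.shape, np.array_equal(r1, r2))  # (100, 2000) True
```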
Answered By: Robert Kern

If you are using other functions that rely on a random state, you can’t just set an overall seed. Instead, create a function that generates your random list of numbers and takes the seed as a parameter. This will not disturb any other random generators in the code:

import numpy as np

# Random states, drawn from a private RandomState so the global generator is untouched
def get_states(random_state, low, high, size):
    rs = np.random.RandomState(random_state)
    states = rs.randint(low=low, high=high, size=size)
    return states

# Call function
states = get_states(random_state=42, low=2, high=28347, size=25)
Answered By: mari756h

It is important to understand what the seed of a random generator is and when and how it is set in your code (check e.g. here for a nice explanation of the mathematical meaning of the seed).

For that you need to set the seed by doing:

random_state = np.random.RandomState(seed=your_favorite_seed_value)

It is then important to generate the random numbers from random_state and not from np.random. I.e. you should do:

random_state.randint(...)

instead of

np.random.randint(...) 

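which draws from NumPy’s global RandomState instance. Unless you seed that instance yourself with np.random.seed(), it is seeded unpredictably at startup (from operating-system entropy or, failing that, the clock), so the numbers differ from run to run.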
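A small sketch of why drawing from your own random_state matters: calls that go through np.random in between do not change what the private instance returns. The seed value here is arbitrary.

```python
import numpy as np

seed = 2024  # arbitrary value chosen for this example

# Reference run: two consecutive draws from a private RandomState.
ref = np.random.RandomState(seed)
expected = [ref.randint(-1, 2, size=5) for _ in range(2)]

# Same draws again, but with an unrelated call to the global generator in between.
random_state = np.random.RandomState(seed)
first = random_state.randint(-1, 2, size=5)
np.random.randint(0, 100, size=1000)  # advances only the global state
second = random_state.randint(-1, 2, size=5)

print(np.array_equal(first, expected[0]) and np.array_equal(second, expected[1]))  # True
```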

Answered By: t_sic

I just want to clarify something regarding @Robert Kern’s answer, in case it is not clear. Even if you use RandomState, you have to re-initialize it with the same seed every time you want the same numbers, as in Robert’s example. Otherwise each successive call continues the sequence and returns different values:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> prng = np.random.RandomState(2019)
>>> prng.randint(-1, 2, size=10)
array([-1,  1,  0, -1,  1,  1, -1,  0, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([-1, -1, -1,  0, -1, -1,  1,  0, -1, -1])
>>> prng.randint(-1, 2, size=10)
array([ 0, -1, -1,  0,  1,  1, -1,  1, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([ 1,  1,  0,  0,  0, -1,  1,  1,  0, -1])
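A short sketch of the same point as a script: re-creating the RandomState with the same seed restarts the sequence, while calling randint again on the existing instance continues it.

```python
import numpy as np

prng = np.random.RandomState(2019)
first = prng.randint(-1, 2, size=10)
second = prng.randint(-1, 2, size=10)  # continues the stream

prng = np.random.RandomState(2019)     # start over from the same seed
again = prng.randint(-1, 2, size=10)

print(np.array_equal(first, again))   # True
print(np.array_equal(first, second))  # False, matching the session above
```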
Answered By: Kirk Walla

Based on the latest updates in Random sampling, the preferred way is to use Generator instead of RandomState. Refer to What’s new or different to compare both approaches. One of the key changes is that RandomState is tied to the slower Mersenne Twister pseudo-random number generator, whereas a Generator draws from a stream of random bits supplied by a BitGenerator, which can implement different algorithms.

Otherwise, the steps for producing a random numpy array are very similar:

  1. Initialize random generator

Instead of RandomState you initialize a random Generator. default_rng is the recommended constructor for Generator, but you can of course try other ways.

import numpy as np

rng = np.random.default_rng(42)
# rng -> Generator(PCG64)
  2. Generate numpy array

Instead of the randint method, there is the Generator.integers method, which is now the canonical way to generate random integers from a discrete uniform distribution (see the already mentioned What’s new or different summary). Note that endpoint=True samples from the closed interval [low, high] instead of the default half-open interval [low, high).

arr = rng.integers(-1, 1, size=10, endpoint=True)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])
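For reference, a sketch of one of the other ways to construct a generator mentioned in step 1: passing an explicit BitGenerator to Generator. PCG64 and MT19937 are bit generators that ship with NumPy; the seed 42 is reused from above, and the variable names are only for illustration.

```python
import numpy as np
from numpy.random import Generator, MT19937, PCG64

# Explicitly choose the bit generator instead of relying on default_rng.
rng_pcg = Generator(PCG64(42))
rng_mt = Generator(MT19937(42))  # Mersenne Twister behind the new Generator interface

a = rng_pcg.integers(-1, 1, size=10, endpoint=True)
b = rng_mt.integers(-1, 1, size=10, endpoint=True)

# Reproducibility holds per bit generator: same class and seed, same numbers.
a_again = Generator(PCG64(42)).integers(-1, 1, size=10, endpoint=True)
print(np.array_equal(a, a_again))  # True
```

Different bit generators generally produce different numbers for the same seed, so stick to one of them if the array has to stay identical between runs.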

As already discussed, you have to re-initialize the random generator (or random state) every time you want to generate an identical array. Therefore, the simplest thing is to define a custom function similar to the one from @mari756h’s answer:

def get_array(low, high, size, random_state=42, endpoint=True):
    rng = np.random.default_rng(random_state)
    return rng.integers(low, high, size=size, endpoint=endpoint)

When you call the function with the same parameters you will always get the identical numpy array.

get_array(-1, 1, 10)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])

get_array(-1, 1, 10, random_state=12345)  # change random state to get different array
# array([ 1, -1,  1, -1, -1,  1,  0,  1,  1,  0])

get_array(-1, 1, (2, 2), endpoint=False)
# array([[-1,  0],
#        [ 0, -1]])

And for your needs you would use get_array(-1, 1, size=(100, 2000)).

Answered By: Nerxis