# Fastest way to generate 1,000,000+ random numbers in Python

## Question:

I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme going that uses numpy to generate all of the numbers in a giant batch (about 500,000 at a time). While this seems to be faster than Python's built-in implementation, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.

Constraints on the random numbers:

- A set of 7 numbers that can all have different bounds, e.g. [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]. Currently I am generating a list of 7 numbers with random values from [0, 1) and then multiplying by [X1..X7].

- A set of 13 numbers that all add up to 1. Currently I am just generating 13 numbers and then dividing by their sum (see the sketch after this list).
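
For example (a minimal sketch of the batch approach described above; the limits and batch size are illustrative):

```
import numpy as np

xlims = np.array([10, 20, 30, 40, 50, 60, 70])  # stand-ins for X1..X7
batch = np.random.random((500000, 7)) * xlims   # scale [0, 1) draws per column

raw = np.random.random((500000, 13))
rows = raw / raw.sum(axis=1, keepdims=True)     # each row now sums to 1
```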

Any ideas? Would pre-calculating these numbers and storing them in a file make this faster?

Thanks!

## Answers:

**EDIT** Created functions that return the full set of numbers, not just one row at a time.

**EDIT 2** Made the functions more Pythonic (and faster), and added a solution for the second question.

For the first set of numbers, you might consider `numpy.random.randint` or `numpy.random.uniform`, which take `low` and `high` parameters. Generating an array of 7 x 1,000,000 numbers in a specified range seems to take < 0.7 seconds on my 2 GHz machine:

```
import numpy as np

def LimitedRandInts(XLim, N):
    rowlen = (1, N)
    return [np.random.randint(low=0, high=lim, size=rowlen) for lim in XLim]

def LimitedRandDoubles(XLim, N):
    rowlen = (1, N)
    return [np.random.uniform(low=0, high=lim, size=rowlen) for lim in XLim]

>>> N = 1000000                          # number of randoms in each range
>>> xLim = [x*500 for x in range(1,8)]   # convenient limit generation
>>> fLim = [x/7.0 for x in range(1,8)]
>>> aa = LimitedRandInts(xLim, N)
>>> ff = LimitedRandDoubles(fLim, N)
```

This returns integers in [0, xLim-1] or floats in [0, fLim). The integer version took ~0.3 seconds and the doubles ~0.66 seconds on my 2 GHz single-core machine.

For the second set, I used @Joe Kington's suggestion.

```
def SumToOneRands(NumToSum, N):
    # 13 rows by 1,000,000 columns, for instance
    aa = np.random.uniform(low=0, high=1.0, size=(NumToSum, N))
    s = np.reciprocal(aa.sum(0))  # 1/sum of each column
    aa *= s                       # scale each column so it sums to 1
    return aa.T  # transpose so aa[k] is the kth set of 13 numbers

>>> ll = SumToOneRands(13, N)
```

This takes ~1.6 seconds.

In all cases, `result[k]` gives you the kth set of data.
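
For instance, continuing the session above (a sketch of the shapes to expect, given the `size=(1,N)` in the first pair of functions and the transpose in the second):

```
>>> aa[2].shape   # the full column of N integers drawn from [0, xLim[2])
(1, 1000000)
>>> ll[5].shape   # the 6th set of 13 numbers, which sums to ~1.0
(13,)
```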

Making your code run in parallel certainly couldn't hurt. Try adapting it for SMP with Parallel Python.
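
For example, here is a minimal sketch using the standard-library `multiprocessing` module instead of Parallel Python (the chunking and per-worker seeding scheme are illustrative assumptions):

```
import numpy as np
from multiprocessing import Pool

def random_chunk(args):
    seed, n = args
    rng = np.random.RandomState(seed)  # separate state per worker
    return rng.uniform(size=n)

if __name__ == "__main__":
    n_total, n_workers = 1000000, 4
    chunk = n_total // n_workers
    with Pool(n_workers) as pool:
        parts = pool.map(random_chunk, [(seed, chunk) for seed in range(n_workers)])
    data = np.concatenate(parts)  # 1,000,000 uniform floats in [0, 1)
```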

You can speed things up a bit from what mtrw posted above just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)…

Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
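
For example, the difference between the two spellings is just (a small sketch):

```
import numpy as np

x = np.random.random((1000000, 7))

y = x * 2.0   # allocates a second 7-million-element array for the result
x *= 2.0      # writes into x's existing buffer; no extra allocation
```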

```
In [53]: def rand_row_doubles(row_limits, num):
....: ncols = len(row_limits)
....: x = np.random.random((num, ncols))
....: x *= row_limits
....: return x
....:
In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
10 loops, best of 3: 187 ms per loop
```

As compared to:

```
In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
1 loops, best of 3: 222 ms per loop
```

It’s not a huge difference, but if you’re *really* worried about speed, it’s something.

Just to show that it’s correct:

```
In [68]: x.max(0)
Out[68]:
array([ 0.99999991, 1.99999971, 2.99999737, 3.99999569, 4.99999836,
5.99999114, 6.99999738])
In [69]: x.min(0)
Out[69]:
array([ 4.02099599e-07, 4.41729377e-07, 4.33480302e-08,
7.43497138e-06, 1.28446819e-05, 4.27614385e-07,
1.34106753e-05])
```

Likewise, for your “rows sum to one” part…

```
In [70]: def rand_rows_sum_to_one(nrows, ncols):
....: x = np.random.random((ncols, nrows))
....: y = x.sum(axis=0)
....: x /= y
....: return x.T
....:
In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
1 loops, best of 3: 455 ms per loop
In [72]: x = rand_rows_sum_to_one(1000000, 13)
In [73]: x.sum(axis=1)
Out[73]: array([ 1., 1., 1., ..., 1., 1., 1.])
```

Honestly, even if you re-implement things in C, I’m not sure you’ll be able to beat numpy by much on this one… I could be very wrong, though!

Try `r = 1664525*r + 1013904223` from "an even quicker generator" in "Numerical Recipes in C", 2nd edition, Press et al., ISBN 0521431085, p. 284. `np.random` is certainly "more random"; see Linear congruential generator.

In Python, use `np.uint32` like this:

```
python -mtimeit -s '
import numpy as np
r = 1
r = np.array([r], np.uint32)[0]  # 316 us with a Python long -> 16 us with np.uint32
# Python ints can be arbitrarily long, hence slow
' '
r = r*1664525 + 1013904223  # NR2 p. 284
'
```

To generate big blocks at a time:

```
# initialize --
np.random.seed( ... )
R = np.random.randint( 0, np.iinfo( np.uint32 ).max, size, dtype=np.uint32 )
...
# advance the whole block one LCG step; uint32 arithmetic wraps modulo 2**32
R *= 1664525
R += 1013904223
```
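
To turn the raw 32-bit words into uniform floats, one option (a sketch, not part of the original recipe; the block size and seed are illustrative) is to divide by 2**32:

```
import numpy as np

size = 1000000                 # assumed block size
np.random.seed(12345)          # illustrative seed
R = np.random.randint(0, np.iinfo(np.uint32).max, size, dtype=np.uint32)

R *= 1664525                   # uint32 arithmetic wraps modulo 2**32,
R += 1013904223                # which is exactly what the LCG recurrence needs

u = R / np.float64(2**32)      # uniform floats in [0, 1)
```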

As others have already pointed out, `numpy` is a very good start: fast and easy to use.

If you need random numbers on a massive scale, consider AES-ECB or RC4. Both can be parallelized, and you should reach a throughput of several GB/s.
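
For instance, here is a sketch of the cipher-as-PRNG idea, assuming the third-party `cryptography` package; it uses AES in CTR mode (the usual counter-based keystream construction) rather than ECB or RC4, and all names here are assumptions, not from the answer:

```
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)
enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()

n = 1000000
stream = enc.update(b"\x00" * (4 * n))  # keystream = encryption of zeros
u = np.frombuffer(stream, dtype=np.uint32) / np.float64(2**32)  # floats in [0, 1)
```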

Just a quick example of `numpy` in action:

```
import numpy

data = numpy.random.rand(1000000)  # one million uniform floats in [0, 1)
```

There's no need for a loop; you can pass in how many numbers you want to generate.

If you have access to multiple cores, the computations can be done in parallel with `dask.array`:

```
import dask.array as da

x = da.random.random(size=(rows, cols)).compute()
# .compute() is not strictly necessary here: calculations can continue
# in lazy form, with .compute() called only on the final result
```
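
For instance (a sketch reusing the sum-to-one problem from above; the chunk sizes are an assumption):

```
import dask.array as da

x = da.random.random(size=(1000000, 13), chunks=(100000, 13))
y = x / x.sum(axis=1, keepdims=True)  # still lazy; nothing computed yet
result = y.compute()                  # runs the whole graph in parallel
```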

Here's code in Python that you can use to generate one million random numbers, one per line:

```
import random

for i in range(1000000):
    print(random.randint(1, 1000000))
```