Why random.randint() is much slower than random.getrandbits()
Question:
I made a test that compares random.randint() and random.getrandbits() in Python. The result shows that getrandbits is way faster than randint.
from random import randint, getrandbits, seed
from datetime import datetime

def ts():
    return datetime.now().timestamp()

def diff(st, en):
    print(f'{round((en - st) * 1000, 3)} ms')

seed(111)
N = 6

rn = []
st = ts()
for _ in range(10**N):
    rn.append(randint(0, 1023))
en = ts()
diff(st,en)

rn = []
st = ts()
for _ in range(10**N):
    rn.append(getrandbits(10))
en = ts()
diff(st,en)
The range of numbers is exactly the same, because 10 random bits range from 0 (in case of 0000000000) to 1023 (in case of 1111111111).
The output of the code:
590.509 ms
91.01 ms
As you can see, getrandbits under the same conditions is almost 6.5 times faster. But why? Can somebody explain that?
Answers:
randint uses randrange, which uses _randbelow, which is _randbelow_with_getrandbits (if getrandbits is present), which in turn uses getrandbits. However, randrange adds overhead on top of the underlying call to getrandbits because it must check the start and step arguments. Even though there are fast paths for these checks, they are still there. There are definitely other things contributing to the 6.5x slowdown when using randint, but this answer at least shows you that randint will always be slower than getrandbits.
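One way to see how much each layer adds is to time the three entry points separately. The following is only a sketch: the helper name time_calls and the call count are made up for illustration, and each call goes through a lambda, which adds the same small overhead to all three measurements.

```python
import random
import timeit

def time_calls(number=200_000):
    """Return the seconds taken by `number` calls of each layer in the chain."""
    rng = random.Random(111)  # private generator, same seed as in the question
    candidates = {
        "getrandbits(10)": lambda: rng.getrandbits(10),
        "randrange(1024)": lambda: rng.randrange(1024),
        "randint(0, 1023)": lambda: rng.randint(0, 1023),
    }
    return {name: timeit.timeit(fn, number=number)
            for name, fn in candidates.items()}

if __name__ == "__main__":
    for name, seconds in time_calls().items():
        print(f"{name:18} {seconds * 1000:8.1f} ms")
```

The absolute numbers depend on the machine and the Python version, so only the relative order is worth reading from the output.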
getrandbits() is a very short function going to native code almost immediately:
def getrandbits(self, k):
    """getrandbits(k) -> x. Generates an int with k random bits."""
    if k < 0:
        raise ValueError('number of bits must be non-negative')
    numbytes = (k + 7) // 8  # bits / 8 and rounded up
    x = int.from_bytes(_urandom(numbytes), 'big')
    return x >> (numbytes * 8 - k)  # trim excess bits
_urandom is os.urandom, which you can’t even find in os.py in the same folder, as it’s native code. See details here: Where can I find the source code of os.urandom()?
randint() is just a return self.randrange(a, b+1) call, where randrange() is a long and versatile function of almost 100 lines of Python code. Its "fast track" ends in a return istart + self._randbelow(width) after circa 50 lines of checking and initializing.
_randbelow() is actually a choice that depends on the existence of getrandbits(), which exists in your case, so this is probably the one that runs:
def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n). Returns 0 if n==0."
    if not n:
        return 0
    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r
So randint() ends up in the same getrandbits() after doing a lot of other things, and that’s pretty much why it is slower.
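The claim that randint() ends up in getrandbits() can be checked by replaying the quoted loop by hand on a second generator with the same seed. This is a sketch that relies on CPython implementation details (the helper names randint_via_getrandbits and streams_match are made up here); another implementation or a future version could produce different streams.

```python
import random

def randint_via_getrandbits(rng, a, b):
    """Reproduce randint(a, b) with getrandbits only, following the quoted loop."""
    n = b - a + 1           # width of the inclusive range [a, b]
    k = n.bit_length()      # same bit count that _randbelow_with_getrandbits uses
    r = rng.getrandbits(k)
    while r >= n:           # reject values that fall outside [0, n)
        r = rng.getrandbits(k)
    return a + r

def streams_match(seed=111, count=1000):
    """Compare randint() against the manual version on identically seeded generators."""
    direct = random.Random(seed)
    manual = random.Random(seed)
    return all(
        direct.randint(0, 1023) == randint_via_getrandbits(manual, 0, 1023)
        for _ in range(count)
    )

if __name__ == "__main__":
    print(streams_match())
```

If both generators stay in step, the only difference between the two paths is the Python-level bookkeeping around the same underlying draws.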
The underlying reason is that the function getrandbits calls is written in C; look at the source code in the Python repository on GitHub. You will find that getrandbits calls urandom from the os module under the hood, and if you look at the implementation of this function, you’ll find that it is indeed written in C.
os_urandom_impl(PyObject *module, Py_ssize_t size)
/*[clinic end generated code: output=42c5cca9d18068e9 input=4067cdb1b6776c29]*/
{
    PyObject *bytes;
    int result;

    if (size < 0)
        return PyErr_Format(PyExc_ValueError,
                            "negative argument not allowed");
    bytes = PyBytes_FromStringAndSize(NULL, size);
    if (bytes == NULL)
        return NULL;

    result = _PyOS_URandom(PyBytes_AS_STRING(bytes), PyBytes_GET_SIZE(bytes));
    if (result == -1) {
        Py_DECREF(bytes);
        return NULL;
    }
    return bytes;
}
As for randint, if you look at the source code for this function, it calls randrange, which in turn calls _randbelow_with_getrandbits.
def randrange(self, start, stop=None, step=_ONE):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.
    """

    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    try:
        istart = _index(start)
    except TypeError:
        istart = int(start)
        if istart != start:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer arg 1 for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    if stop is None:
        # We don't check for "step != 1" because it hasn't been
        # type checked and converted to an integer yet.
        if step is not _ONE:
            raise TypeError('Missing a non-None stop argument')
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    try:
        istop = _index(stop)
    except TypeError:
        istop = int(stop)
        if istop != stop:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer stop for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    width = istop - istart
    try:
        istep = _index(step)
    except TypeError:
        istep = int(step)
        if istep != step:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer step for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    # Fast path.
    if istep == 1:
        if width > 0:
            return istart + self._randbelow(width)
        raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))

    # Non-unit step argument supplied.
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")
    if n <= 0:
        raise ValueError("empty range for randrange()")
    return istart + istep * self._randbelow(n)

def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """
    return self.randrange(a, b+1)
As you can see, there is a lot going on in the try/except blocks of the randrange function, and since all of it is written in Python, this overhead and control flow make it slow.
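To put a rough number on that overhead, one can count how many Python-level function calls a single randint() triggers compared with a single getrandbits(). The sketch below uses sys.setprofile; the helper count_call_events is hypothetical, the results are specific to CPython, and the C-level count is only approximate because it also includes the call that switches the profiler off.

```python
import random
import sys

def count_call_events(func, *args):
    """Run func(*args) once and count Python-level and C-level call events."""
    counts = {"call": 0, "c_call": 0}

    def profiler(frame, event, arg):
        # "call" fires for Python functions, "c_call" for functions written in C.
        if event in counts:
            counts[event] += 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return counts

randint_counts = count_call_events(random.randint, 0, 1023)
getrandbits_counts = count_call_events(random.getrandbits, 10)

if __name__ == "__main__":
    print("randint(0, 1023):", randint_counts)
    print("getrandbits(10): ", getrandbits_counts)
```

On CPython, randint() should report several Python-level calls (randint, randrange and the _randbelow helper), while getrandbits() should report none, since it goes straight to C.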
This is the code that gets executed when you call randint:
#!/usr/bin/python3
import random
random.randint(0, 1023)
$ python3 -m trace -t s.py
<frozen importlib._bootstrap>(194): s.py(4): random.randint(0, 1023)
--- modulename: random, funcname: randint
random.py(339): return self.randrange(a, b+1)
--- modulename: random, funcname: randrange
random.py(301): istart = int(start)
random.py(302): if istart != start:
random.py(304): if stop is None:
random.py(310): istop = int(stop)
random.py(311): if istop != stop:
random.py(313): width = istop - istart
random.py(314): if step == 1 and width > 0:
random.py(315): return istart + self._randbelow(width)
--- modulename: random, funcname: _randbelow_with_getrandbits
random.py(241): if not n:
random.py(243): getrandbits = self.getrandbits
random.py(244): k = n.bit_length() # don't use (n-1) here because n can be 1
random.py(245): r = getrandbits(k) # 0 <= r < 2**k
random.py(246): while r >= n:
random.py(248): return r
So the answer is that there’s some extra boilerplate that ends up calling getrandbits, which you otherwise call directly. Note also that calling _randbelow_with_getrandbits with a power of two exhibits the worst-case behavior: for n = 1024 it samples getrandbits(11) until it gets a value below 1024, that is, a number that fits in 10 bits, so it discards half of the values. This is not the main reason why it is slower, but adding the rejection sampling logic to your calls to getrandbits makes them about twice as slow.
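The rejection rate is easy to measure by copying the quoted loop and counting the draws. This is a small sketch with made-up helper names (randbelow_counting, average_draws); it mirrors the loop rather than calling the private method itself.

```python
import random

def randbelow_counting(n, rng):
    """Mirror of the quoted rejection loop that also reports the number of draws."""
    k = n.bit_length()      # 11 for n = 1024, 10 for n = 1023
    draws = 1
    r = rng.getrandbits(k)
    while r >= n:           # value outside [0, n): throw it away and draw again
        r = rng.getrandbits(k)
        draws += 1
    return r, draws

def average_draws(n, trials=20_000, seed=111):
    """Average number of getrandbits calls needed per accepted value."""
    rng = random.Random(seed)
    total = sum(randbelow_counting(n, rng)[1] for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    print("n = 1024:", average_draws(1024))  # power of two, worst case
    print("n = 1023:", average_draws(1023))  # one below, almost no rejections
```

For n = 1024 the average should come out close to 2 draws per value, while for n = 1023 it should stay just above 1, which is why randint(0, 1023) sits at the unlucky end of the range.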
Some of the answers say that getrandbits ends up calling os.urandom. That is not true. The regular getrandbits function is implemented directly in C in _random_Random_getrandbits_impl. The one that calls os.urandom is the getrandbits of SystemRandom.
You can check it by using the trace module:
#!/usr/bin/python3
import random
r = random.SystemRandom()
r.getrandbits(5)
random.getrandbits(5)
$ python3 -m trace -t s.py
...
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): s.py(4): r = random.SystemRandom()
--- modulename: random, funcname: __init__
random.py(123): self.seed(x)
--- modulename: random, funcname: seed
random.py(800): return None
random.py(124): self.gauss_next = None
s.py(6): r.getrandbits(5)
--- modulename: random, funcname: getrandbits
random.py(786): if k < 0:
random.py(788): numbytes = (k + 7) // 8 # bits / 8 and rounded up
random.py(789): x = int.from_bytes(_urandom(numbytes), 'big')
random.py(790): return x >> (numbytes * 8 - k) # trim excess bits
s.py(8): random.getrandbits(5)
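The same distinction can be checked without the trace module by asking the inspect module what kind of object each getrandbits is. This is a sketch that assumes CPython; the helper describe and its labels are made up for illustration.

```python
import inspect
import random

def describe(func):
    """Classify a callable as implemented in C or in Python."""
    if inspect.isbuiltin(func):
        return "C (built-in)"
    if inspect.ismethod(func) or inspect.isfunction(func):
        return "Python"
    return "unknown"

# Module-level getrandbits is a bound method of the hidden default Random instance.
default_impl = describe(random.getrandbits)
# SystemRandom overrides getrandbits in random.py and calls os.urandom there.
system_impl = describe(random.SystemRandom().getrandbits)

if __name__ == "__main__":
    print("random.getrandbits:        ", default_impl)
    print("SystemRandom().getrandbits:", system_impl)
```

This matches the trace output above, where only the SystemRandom call shows lines from random.py.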