How to generate a byte string consisting of random nonzero bytes using Python?
Question:
As part of implementing the PKCS #1 v1.5 padding scheme for RSA, I need to generate an octet string of length n consisting of pseudo-randomly generated nonzero octets.
I’m looking for the best way to do this using Python.
This is what my current implementation looks like:
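import secrets

def nonzero_random_bytes(n: int) -> bytes:
    # One single-byte bytes object for each nonzero value 1..255.
    values = [x.to_bytes(1, "big") for x in range(1, 256)]
    # Pick n of them with the CSPRNG and concatenate.
    seq = [secrets.choice(values) for _ in range(n)]
    return b"".join(seq)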
I’ve looked at generating the byte string with secrets.token_bytes(n), filtering the result, and generating nonzero values to backfill the string. I know I can also do something like secrets.token_bytes(2 * n), filter, and trim the result, but that doesn’t strike me as an elegant solution.
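The filter-and-backfill approach described above might look roughly like this. It is a minimal sketch, and the function name `nonzero_random_bytes_backfill` is hypothetical. Dropping zeros from uniformly random bytes is rejection sampling, so the bytes that survive are uniform over 1..255.

```python
import secrets

def nonzero_random_bytes_backfill(n: int) -> bytes:
    # Start with n random bytes and drop any zeros (rejection sampling:
    # the surviving bytes are uniformly distributed over 1..255).
    result = secrets.token_bytes(n).replace(b"\x00", b"")
    # Each dropped zero leaves the string one byte short; draw exactly the
    # shortfall again, filter it, and repeat until the length is back to n.
    while len(result) < n:
        result += secrets.token_bytes(n - len(result)).replace(b"\x00", b"")
    return result

print(nonzero_random_bytes_backfill(16).hex())
```

On average only about n/256 bytes get dropped, so the refill loop rarely runs more than once.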
I’ve also looked into how PyCryptodome and python-pkcs1 do this but I’m thinking there must be a better way (I poked around pyca/cryptography but couldn’t find how they did it and it seems they use OpenSSL bindings – here’s where I think that’s implemented).
Disclaimer: I am aware that I shouldn’t use PKCS1 v1.5, much less roll my own cryptography code. This is purely an academic exercise. 🙂
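For context, the nonzero string is the padding string PS in the EME-PKCS1-v1_5 encoding (RFC 8017, section 7.2.1): EM = 0x00 || 0x02 || PS || 0x00 || M. PS must contain no zero octets so that the first zero after the two-byte header unambiguously marks where the message starts. A minimal sketch follows, assuming k is the RSA modulus length in bytes; both function names are hypothetical.

```python
import secrets

def nonzero_random_bytes(n: int) -> bytes:
    # Each byte is uniform over 1..255.
    return bytes(secrets.randbelow(255) + 1 for _ in range(n))

def eme_pkcs1_v1_5_encode(message: bytes, k: int) -> bytes:
    """Sketch of EME-PKCS1-v1_5 encoding; k is the modulus length in bytes."""
    if len(message) > k - 11:
        raise ValueError("message too long")
    # PS is k - mLen - 3 nonzero random bytes; mLen <= k - 11 guarantees
    # at least 8 of them.
    ps = nonzero_random_bytes(k - len(message) - 3)
    return b"\x00\x02" + ps + b"\x00" + message

em = eme_pkcs1_v1_5_encode(b"hello", 128)
print(len(em))  # 128
```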
Answers:
You didn’t define what "best" means to you. I’d go with this, which is basically a less wordy way of doing what you already did:
from secrets import randbelow

def nonzero_random_bytes(n: int) -> bytes:
    return bytes(randbelow(255) + 1 for _ in range(n))
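Why this works: randbelow(255) returns an integer in 0..254, so adding 1 gives 1..255 with every value equally likely, and bytes() accepts any iterable of ints in range(256). Below is a quick property check that can be run against any of the candidate implementations. It is a sanity check, not a statistical test, and the check helper is hypothetical.

```python
from secrets import randbelow

def nonzero_random_bytes(n: int) -> bytes:
    # randbelow(255) is 0..254; +1 shifts the range to 1..255.
    return bytes(randbelow(255) + 1 for _ in range(n))

def check(func, n: int) -> None:
    # Right type, right length, no zero bytes.
    out = func(n)
    assert isinstance(out, bytes)
    assert len(out) == n
    assert 0 not in out

for n in (0, 1, 8, 250):
    check(nonzero_random_bytes, n)

# With 100,000 draws, the chance that any of the 255 values never appears
# is astronomically small, so all of 1..255 should show up.
assert set(nonzero_random_bytes(100_000)) == set(range(1, 256))
```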
I had almost the same as Tim, but thought "best" might require speed. Benchmark for n = 250 (the middle of the "probably 100-400" range):
471.3 us nonzero_random_bytes_original
438.3 us nonzero_random_bytes_randbelow
4.7 us nonzero_random_bytes_2n
3.1 us nonzero_random_bytes_plus10
Code (Try it online!):
from timeit import timeit
import secrets

def nonzero_random_bytes_original(n: int) -> bytes:
    values = [x.to_bytes(1, "big") for x in range(1, 256)]
    seq = [secrets.choice(values) for _ in range(n)]
    return b"".join(seq)

def nonzero_random_bytes_randbelow(n: int) -> bytes:
    return bytes(1 + secrets.randbelow(255) for _ in range(n))

def nonzero_random_bytes_2n(n: int) -> bytes:
    # Draw twice as many bytes as needed, drop the zeros, keep the first n.
    return secrets.token_bytes(2 * n).replace(b'\0', b'')[:n]

def nonzero_random_bytes_plus10(n: int) -> bytes:
    result = b''
    # Draw need + 10 bytes (a small safety margin), drop the zeros, keep at
    # most need of them; repeat until exactly n nonzero bytes are collected.
    while need := n - len(result):
        result += secrets.token_bytes(need + 10).replace(b'\0', b'')[:need]
    return result

funcs = [
    nonzero_random_bytes_original,
    nonzero_random_bytes_randbelow,
    nonzero_random_bytes_2n,
    nonzero_random_bytes_plus10,
]

for _ in range(3):
    for func in funcs:
        # Total seconds for 1000 calls, times 1e3, is microseconds per call.
        t = timeit(lambda: func(250), number=1000)
        print('%5.1f us ' % (t * 1e3), func.__name__)
    print()
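The _2n variant can only return fewer than n bytes if more than n of its 2n random bytes are zero. That probability is a binomial tail and can be computed exactly. The helper below is a hypothetical illustration that uses math.comb (Python 3.8+) and exact integer arithmetic so nothing underflows; for n = 250 it comes out around 10^-456, so the shortfall is not a practical concern, although _plus10 rules it out entirely.

```python
import math

def log10_shortfall_probability(n: int) -> float:
    """log10 of P(more than n of 2n uniform random bytes are zero)."""
    trials = 2 * n
    # Exact numerator over the common denominator 256**trials:
    # sum over k > n of C(2n, k) * 1**k * 255**(2n - k).
    numerator = sum(math.comb(trials, k) * 255 ** (trials - k)
                    for k in range(n + 1, trials + 1))
    if numerator == 0:
        return float("-inf")  # n == 0: nothing can go wrong
    # math.log10 accepts arbitrarily large ints, so no float overflow here.
    return math.log10(numerator) - math.log10(256 ** trials)

print(f"{log10_shortfall_probability(250):.1f}")
```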