Python code for generating valid BIP-39 mnemonic words for a bitcoin wallet not working

Question:

I am trying to generate valid BIP-39 mnemonic words for a bitcoin wallet in Python, but I am encountering an issue with the generated words being rejected by verification tools. I have followed the guidelines outlined in the BIP-39 standard, but the 24th word, which serves as a checksum of the others, is causing the mnemonic to be deemed incorrect. I have searched for solutions and checked other people’s code, but I have yet to find a solution. Can someone please help me understand what I am doing wrong and how to fix it?

At the end of this post I include a few examples of 24-word mnemonics that the program produced but that verification tools reject.

from hashlib import sha256
import secrets

#following the instructions here: https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
#depending on the number of words, we take the value for ENT, and CS.

word_number=24
size_ENT=256
size_CS=int(size_ENT/32)
with open("Bip39-wordlist.txt", "r") as wordlist_file:
    words = [word.strip() for word in wordlist_file.readlines()]

#First, an initial entropy of ENT bits is generated. 
n_bytes=int(size_ENT/8)
random_bytes = secrets.token_bytes(n_bytes)
random_bits = ''.join(['{:08b}'.format(b) for b in random_bytes])
INITIAL_ENTROPY = random_bits[:size_ENT]
assert(len(INITIAL_ENTROPY)==size_ENT)
    
encoded=INITIAL_ENTROPY.encode('utf-8')
hash=sha256(encoded).digest()
bhash=''.join(format(byte, '08b') for byte in hash)
assert(len(bhash)==256)

#the first ENT / 32 bits of its SHA256 hash
CS=bhash[:size_CS]

#This checksum is appended to the end of the initial entropy.
FINAL_ENTROPY=INITIAL_ENTROPY+CS
assert(len(FINAL_ENTROPY)==size_ENT+size_CS)

#Next, these concatenated bits are split into groups of 11 bits, 
# each encoding a number from 0-2047, serving as an index into a wordlist.
for t in range(word_number):
    
    #split into groups of 11 bits, 
    extracted_bits=FINAL_ENTROPY[11*t:11*(t+1)]
    
    # each encoding a number from 0-2047,
    word_index=int(extracted_bits,2)
    
    #serving as an index into a wordlist.
    if t==0: words_extracted=     words[word_index]
    else:    words_extracted+=' '+words[word_index]
print (words_extracted)

Examples of incorrect output:

kitten oak breeze dismiss breeze reduce stem symbol trend input thunder old burden brisk level hard luggage alarm upper creek deputy desert diesel primary

wave flee narrow notable budget hamster layer potato menu security wall shove save mobile badge nephew blouse major cute park margin entry drink mask

Asked By: Pietro Speroni


Answers:

The problem is that you compute the SHA-256 hash of a byte string made up of the ASCII characters 0 and 1 (length 256), whereas you should hash the raw entropy bytes (length 32) directly.
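To make the difference concrete, here is a small sketch that hashes the same entropy both ways. The all-zero entropy and the names `wrong` and `right` are only illustrative choices, not part of the original code.

```python
from hashlib import sha256

# Illustrative entropy: 32 zero bytes (any 32 bytes would show the same effect).
entropy = bytes(32)

# The question's approach: turn the entropy into a string of '0'/'1' characters...
bit_string = ''.join(format(b, '08b') for b in entropy)

# ...and hash the UTF-8 encoding of that string, i.e. 256 ASCII characters.
wrong = sha256(bit_string.encode('utf-8')).digest()

# What BIP-39 expects: hash the 32 raw entropy bytes.
right = sha256(entropy).digest()

print(len(bit_string.encode('utf-8')), "bytes hashed vs", len(entropy), "bytes hashed")
print("digests equal:", wrong == right)
```

The two inputs differ in both length and content, so the digests differ, and so do the checksum bits taken from them.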

Replace your line

hash=sha256(encoded).digest()

with

hash=sha256(random_bytes).digest()

and the generated mnemonic should be correct.
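For reference, the whole procedure with the corrected checksum could be sketched as follows. This is a minimal illustration rather than the reference implementation: the function names `entropy_to_indices` and `indices_are_valid` are made up for this example, and it works on word indices so that it runs without the wordlist file.

```python
from hashlib import sha256
import secrets

def entropy_to_indices(entropy: bytes) -> list:
    """Return the BIP-39 word indices (0-2047) for the given raw entropy bytes."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 16, 20, 24, 28 or 32 bytes")
    cs_bits = ent_bits // 32  # checksum length: ENT / 32 bits (4 to 8)
    # Checksum = first cs_bits bits of SHA-256 over the RAW bytes.
    checksum = sha256(entropy).digest()[0] >> (8 - cs_bits)
    # Append the checksum to the entropy, treating both as one big integer.
    value = (int.from_bytes(entropy, 'big') << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    # Split into 11-bit groups, most significant group first.
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]

def indices_are_valid(indices) -> bool:
    """Check the checksum by recomputing the indices from the embedded entropy."""
    indices = list(indices)
    if len(indices) not in (12, 15, 18, 21, 24):
        return False
    if any(not 0 <= i < 2048 for i in indices):
        return False
    total_bits = 11 * len(indices)
    cs_bits = total_bits // 33
    value = 0
    for i in indices:
        value = (value << 11) | i
    entropy = (value >> cs_bits).to_bytes((total_bits - cs_bits) // 8, 'big')
    return entropy_to_indices(entropy) == indices

indices = entropy_to_indices(secrets.token_bytes(32))
print(indices, indices_are_valid(indices))
# With the wordlist loaded as in the question (words = [...]), the mnemonic is:
# print(' '.join(words[i] for i in indices))
```

As a sanity check, 32 zero bytes give index 0 twenty-three times followed by index 102, which in the English wordlist corresponds to the well-known test mnemonic "abandon ... abandon art".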


Just for completeness: a Python implementation, in fact the reference implementation, can be found in the trezor/python-mnemonic repository, which the BIP-39 document links to.

Answered By: M. Spiller