Invalid start byte when using decode()

Question:

As a project, I have been experimenting with encoding, decoding, and hashing, and I get this error every time I test the program.

The error is UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte, raised on the line var_hash = str(newhash.digest(), encoding = 'utf-8')

In the program, I read users’ details from a file (login.csv).

The expected format of the hashed passwords should be: 4ac2e277734c1e9e21616d7cb9bfa777b8a765af4f48e19ebac12c4173149618

I have tried other encodings such as latin-1 and utf-16, but they give incorrect results.

The code:

import hashlib
import random
salt = ['1AsnOZtM41','M6IQQD4fRb','XgJbmMhlg9']
login = open('login.csv','r+')
## Login format = [USERNAME,PASSWORD,SALTNUMBER]
def membership(): ## Membership Function
    linevarc = '' 
    newusr = input('Enter your username: ')
    newpass = input('Enter your password: ')
    newhash = hashlib.sha256() ## Sets hash algorithm to sha256
    newrand = random.randint(0,2) ## Picks random salt 
    for line in login:
        newlog = line.split(',')
        if newusr.upper() == newlog[0]:
            print('This username is already taken, Please re-enter all details')
            login.seek(0)
            membership()
    login.seek(0)
    for line in login:
      save = line.split(',')
      if newusr.upper() != newlog[0]:
          linevarc = linevarc + save[0] + ',' + save[1] + ',' + save[2] + '\n'
          print(str(linevarc))
          #login.write(str(newdetails))
    login.seek(0)
    for line in login:
      if newusr.upper() != newlog[0]:
        passalter = str(newpass + salt[int(newrand)])
        newhash.update(passalter.encode('utf-8'))
        var_hash = str(newhash.digest().decode(), errors = 'ignore') ## errors occur here
        linevarc = linevarc + newusr.upper() + ',' + var_hash + ',' + str(newrand)
        print(str(linevarc))
Asked By: M S


Answers:

Try changing newhash.digest() to newhash.hexdigest(). With .digest() the output is raw binary, which generally cannot be decoded as UTF-8. .hexdigest() returns the same digest as a string of hexadecimal characters.
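
To illustrate the difference, here is a minimal sketch. The password value is a hypothetical placeholder, not taken from the question; the salt is the first entry of the question's salt list. It only shows the types and lengths the two methods return, not a recommended way to store passwords.

```python
import hashlib

# Hypothetical inputs standing in for the question's newpass and salt values.
password = "example-password"
salt = "1AsnOZtM41"

hasher = hashlib.sha256()
hasher.update((password + salt).encode("utf-8"))

raw = hasher.digest()          # bytes: 32 raw bytes, usually not valid UTF-8
hex_text = hasher.hexdigest()  # str: 64 hexadecimal characters, safe to write to a CSV

print(type(raw).__name__, len(raw))
print(type(hex_text).__name__, len(hex_text))
print(raw.hex() == hex_text)   # both are the same digest in different representations
```

The 64-character hex string matches the format the question expects, so no decode() call is needed at all.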

Answered By: AbbeGijly

I came across this question while debugging. In case you’re trying to obtain the hash of a file rather than of a string, you need to open the file in bytes mode. See below:

# Code that obtains the sha256 of multiple files
import hashlib
import os

files_location = "/path/to/big/files"

local_files = []
for f in os.listdir(files_location):
    # os.listdir() returns bare file names, so join them with the directory
    with open(os.path.join(files_location, f), "rb") as f_bytes:  # <-- needs to be "rb"
        f_hash = hashlib.sha256(f_bytes.read()).hexdigest()
        local_files.append((f, f_hash))

If you open a binary file in "r" mode, read() tries to decode its contents as text, and the UnicodeDecodeError comes up.
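
For large files, reading everything into memory at once may be undesirable. A possible variation, sketched here under the assumption that a fixed chunk size is acceptable, feeds the file to the hash object piece by piece. The function name sha256_of_file and the chunk_size default are illustrative choices, not part of the code above.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at path, read in binary chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: no text decoding takes place
        # f.read() returns b"" at end of file, which stops the iteration
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling update() repeatedly gives the same digest as hashing the concatenated data in one call, so the result should match the one-shot version.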

Answered By: Nico Villanueva