Invalid start byte when using decode()
Question:
As a project, I have been experimenting with encoding, decoding, and hashing, and I get this error every time I test this program.
The error is UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte
on line var_hash = str(newhash.digest(), encoding = 'utf-8')
In the program, I use a file that imports users’ details.
The expected format of the hashed passwords should be: 4ac2e277734c1e9e21616d7cb9bfa777b8a765af4f48e19ebac12c4173149618
I have tried other encodings such as latin-1 and utf-16, but they give incorrect results.
The code:
import hashlib
import random  ## needed for random.randint below

salt = ['1AsnOZtM41','M6IQQD4fRb','XgJbmMhlg9']
login = open('login.csv','r+')
## Login format = [USERNAME,PASSWORD,SALTNUMBER]
def membership(): ## Membership Function
    linevarc = ''
    newusr = input('Enter your username: ')
    newpass = input('Enter your password: ')
    newhash = hashlib.sha256() ## Sets hash algorithm to sha256
    newrand = random.randint(0,2) ## Picks random salt
    for line in login:
        newlog = line.split(',')
        if newusr.upper() == newlog[0]:
            print('This username is already taken, Please re-enter all details')
            login.seek(0)
            membership()
    login.seek(0)
    for line in login:
        save = line.split(',')
        if newusr.upper() != newlog[0]:
            linevarc = linevarc + save[0] + ',' + save[1] + ',' + save[2] + '\n'
    print(str(linevarc))
    #login.write(str(newdetails))
    login.seek(0)
    for line in login:
        if newusr.upper() != newlog[0]:
            passalter = str(newpass + salt[int(newrand)])
            newhash.update(passalter.encode('utf-8'))
            var_hash = str(newhash.digest().decode(), errors = 'ignore') ## errors occur here
            linevarc = linevarc + newusr.upper() + ',' + var_hash + ',' + str(newrand)
    print(str(linevarc))
Answers:
Try changing newhash.digest() to newhash.hexdigest(). With .digest() the output is raw bytes, which are generally not valid UTF-8 and so cannot be decoded. .hexdigest() returns the same digest as a string of hexadecimal characters, which matches the format you expect.
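To illustrate the difference, here is a minimal sketch. The password is a made-up example value, and the salt is the first entry from the question's list; neither is meant to reproduce the hash shown in the question.

```python
import hashlib

# Hypothetical input: an example password plus the first salt from the question.
password = "hunter2"
salt = "1AsnOZtM41"

h = hashlib.sha256((password + salt).encode("utf-8"))

raw = h.digest()          # bytes object holding 32 arbitrary bytes
hex_text = h.hexdigest()  # str of 64 hexadecimal characters

print(type(raw), len(raw))
print(type(hex_text), len(hex_text))

# The hex string is a text rendering of the same bytes,
# so it can be converted back if the raw digest is ever needed.
assert raw.hex() == hex_text
assert bytes.fromhex(hex_text) == raw
```

Because the hex string contains only the characters 0-9 and a-f, it can be written to a CSV file such as login.csv without any decoding step.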
I came across this answer while debugging. If you are trying to obtain the hash of a file rather than of a string, you need to open the file in binary mode. See below:
# Code that obtains the sha256 of multiple files
import hashlib
import os

files_location = "/path/to/big/files"
local_files = []
for f in os.listdir(files_location):
    # os.listdir returns bare names, so join them with the directory
    with open(os.path.join(files_location, f), "rb") as f_bytes: # <-- needs to be "rb"
        f_hash = hashlib.sha256(f_bytes.read()).hexdigest()
    local_files.append((f, f_hash))
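If you open it in "r" mode, Python tries to decode the file contents as text, so reading binary data can raise a UnicodeDecodeError. Even when the read succeeds, hashlib rejects the resulting str with a TypeError because it only accepts bytes.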
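Since the example path mentions big files, reading each file fully into memory may be undesirable. The following is a sketch of one possible variation that feeds the hash in chunks. The helper name sha256_of_file and the 64 KiB chunk size are assumptions made for illustration, and the demo writes its own temporary file rather than using real data.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in binary chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode yields bytes, which hashlib requires
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo on a temporary file whose content is not valid UTF-8.
data = b"\xf2\x00\xff binary payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name

try:
    file_hash = sha256_of_file(tmp_path)
    # Chunked hashing gives the same result as hashing all bytes at once.
    assert file_hash == hashlib.sha256(data).hexdigest()
    print(file_hash)
finally:
    os.remove(tmp_path)

# hashlib only accepts bytes-like input; passing a str raises TypeError.
try:
    hashlib.sha256("text, not bytes")
except TypeError as exc:
    print("TypeError:", exc)
```

Calling update() repeatedly is equivalent to hashing the concatenation of all chunks, so the result matches the read-everything version while keeping memory use bounded by the chunk size.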