How to find the non-ASCII character in a file that Python has found?
Question:
I run this code on a file's content:
try:
    file_content.encode().decode('ascii')
except UnicodeDecodeError as e:
    print(str(e))
And it shows me this error message:
'ascii' codec can't decode byte 0xe2 in position 4568: ordinal not in range(128)
It’s not useful at all. First of all, it does not tell me the line number and column number. It tells me the position, which I have no idea how to find in the VS Code editor.
Then it tells me that byte 0xe2 is there. I searched and it seems that 0xe2 is â. However, when I search for that character, I don’t have it in my file.
I’m stuck.
How can I find the errors that Python has found using .decode('ascii')?
Answers:
You just need to get the ord of each character and see which ones lie outside the ASCII range (0 to 127). See an ASCII table for more info.
file = 'somefile'
with open(file, 'r') as f:
    lines = f.readlines()
for line_number, line in enumerate(lines, 1):
    for character_position, character in enumerate(line, 1):
        # ASCII code points are 0 through 127
        if not (0 <= ord(character) <= 127):
            print(f"Non-ASCII character at line {line_number}, position {character_position}: {character}")
Way two: use str.isascii (available since Python 3.7) instead of checking the ord of each character.
file = 'somefile'
with open(file, 'r') as f:
    lines = f.readlines()
for line_number, line in enumerate(lines, 1):
    for character_position, character in enumerate(line, 1):
        # str.isascii() is True only for code points 0 through 127
        if not character.isascii():
            print(f"Non-ASCII character at line {line_number}, position {character_position}: {character}")
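If you would rather start from the position in the error message, here is a minimal sketch that turns it into a line and column. It assumes the text was encoded as UTF-8 (the default for .encode()), and the function name locate_non_ascii is made up for illustration. The position in the message is a byte offset, available as e.start on the exception. Everything before the first failure is pure ASCII, so that offset is also the character index.

```python
def locate_non_ascii(text):
    """Return (line, column, character) for the first non-ASCII character, or None.

    Line and column are 1-based. Hypothetical helper, not part of any library.
    """
    data = text.encode("utf-8")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as e:
        # e.start is the byte offset reported as "position" in the message.
        # All bytes before it decoded fine, so they are one byte per character.
        prefix = data[:e.start].decode("ascii")
        line = prefix.count("\n") + 1
        # rfind returns -1 when there is no newline, which makes the
        # column count start from the beginning of the text.
        column = e.start - (prefix.rfind("\n") + 1) + 1
        return line, column, text[e.start]
    return None


# Example: a curly apostrophe (U+2019) on the second line.
print(locate_non_ascii("ab\nc\u2019d"))
```

This reports only the first offending character; the loops above list all of them. It may also explain why searching for â finds nothing: in UTF-8, 0xe2 is just the first byte of a multi-byte sequence. For example, the curly apostrophe ’ is encoded as the bytes e2 80 99, so the character in the file could be something like that rather than â.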