How to find the non-ASCII character in a file that Python has found?

Question

I run this code on the content of a file:

try:
    file_content.encode().decode('ascii')
except UnicodeDecodeError as e:
    print(str(e))

And it shows me this error message:

'ascii' codec can't decode byte 0xe2 in position 4568: ordinal not in range(128)

It’s not useful at all. First of all, it does not tell me the line number and column number. It only tells me the position, and I have no idea how to find that in the VS Code editor.

Then it tells me that byte 0xe2 is there. I searched and it seems that 0xe2 is â. However, when I search for that character, I can’t find it in my file.

I’m stuck.

How can I find the errors that Python has found using .decode('ascii')?

Asked By: Big boy


Answer 1

You just need to get the ord of each character and see which ones lie outside the ASCII range (0 to 127).

See an ASCII table for more info.

file = 'somefile'  # path to the file you want to check
with open(file, 'r') as f:
    lines = f.readlines()
    # Count lines and characters from 1, the way editors display them
    for line_number, line in enumerate(lines, 1):
        for character_position, character in enumerate(line, 1):
            # ASCII code points are 0 through 127
            if not (0 <= ord(character) <= 127):
                print(f"Non-ASCII character at line {line_number}, position {character_position}: {character}")

Way two: use str.isascii() (available since Python 3.7) instead of checking the ord of each character.

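file = 'somefile'  # path to the file you want to check
with open(file, 'r') as f:
    lines = f.readlines()
    # Count lines and characters from 1, the way editors display them
    for line_number, line in enumerate(lines, 1):
        for character_position, character in enumerate(line, 1):
            # str.isascii() is True only for code points 0 through 127
            if not character.isascii():
                print(f"Non-ASCII character at line {line_number}, position {character_position}: {character}")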
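
If you would rather start from the error itself, the position in the message can be turned into a line and column. What follows is only a rough sketch: the helper name locate_non_ascii is made up for illustration, and it assumes the content is encoded as UTF-8 (the default for str.encode()) and that lines are separated by "\n". The position in the message is a byte offset, available as the exception's start attribute. Since every byte before it decoded as ASCII, it is also the index of the offending character in the string.

```python
def locate_non_ascii(text):
    """Return (line, column, character) for the first non-ASCII character, or None.

    Hypothetical helper for illustration; assumes UTF-8 and "\n" line endings.
    """
    data = text.encode("utf-8")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as e:
        # e.start is a byte offset into data. Everything before it is pure
        # ASCII, so one byte equals one character up to that point.
        prefix = data[:e.start].decode("ascii")
        line = prefix.count("\n") + 1
        # Characters since the last newline, counted from 1
        column = e.start - (prefix.rfind("\n") + 1) + 1
        return line, column, text[len(prefix)]
    return None


sample = "abc\nde\u2019f"  # \u2019 is a right single quotation mark
print(locate_non_ascii(sample))
```

This probably also explains why searching for â finds nothing: 0xe2 is only the first byte of a multi-byte UTF-8 sequence. A curly apostrophe (U+2019), for example, is encoded as the three bytes e2 80 99. In VS Code, the "Go to Line/Column" command should accept the result in the form line:column.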

Answered By: sahasrara62
