UnicodeDecodeError 'utf-8' codec can't decode – using python shapefile reader

Question:

I’m trying to read a shapefile with pyshp:

r = shapefile.Reader(filepath, encoding = "utf-8")

but when I try to get a value from the result of .records(), like:

 r.records()[0]

it raises the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 4: invalid continuation byte
Asked By: Paulo Calado

Answers:

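That means your file is not encoded in UTF-8. Byte 0xe9 is "é" in ISO8859-1 (Latin-1), which suggests the attribute data was written in that encoding or a close relative such as cp1252. Try passing encoding="ISO8859-1" instead.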

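If you are on Linux (or have git-bash on Windows), you can use the file command to guess a file's encoding. Treat the result as a heuristic: the attribute values live in the binary .dbf file, for which file may report only the file type and not a text encoding.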
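
To see why the error names byte 0xe9, it may help to decode a short byte string both ways. This is only a minimal sketch: the sample value below is made up for illustration and does not come from the asker's file.

```python
# Hypothetical attribute value as it might be stored in a Latin-1 encoded .dbf.
raw = b"Jos\xe9 Silva"

# ISO8859-1 maps every single byte to one character, so this always succeeds.
print(raw.decode("iso8859-1"))

# UTF-8 reads 0xe9 as the first byte of a multi-byte sequence. The byte that
# follows (a space) is not a valid continuation byte, so decoding fails.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)
```

Because ISO8859-1 accepts any byte sequence, a successful decode with it does not prove that it is the right encoding. Check a few accented values by eye.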

Answered By: JoelFan

You can use this code to try different encodings when opening the shapefile. It also looks for a .cpg file, an optional sidecar file that declares the shapefile's encoding.

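import os
import shapefile  # provided by the pyshp package

shp_path = 'path/to/file.shp'  # placeholder: set this to your shapefile

# Candidate encodings, tried in order. ISO8859-1 goes last because it
# can decode any byte sequence and therefore never raises.
encodings = ['utf-8', 'ISO8859-1']

# If a .cpg file sits next to the .shp, try the encoding it declares first
cpg_path = os.path.splitext(shp_path)[0] + '.cpg'
if os.path.exists(cpg_path):
    with open(cpg_path) as cpg_file:
        declared = cpg_file.read().strip()
    if declared:
        encodings.insert(0, declared)

# Reading the records is what triggers decoding, so do it inside the try
for e in encodings:
    try:
        with shapefile.Reader(shp_path, encoding=e) as shp:
            records = shp.records()
        print(f'Successfully read the shapefile with encoding: {e}')
        break
    except (UnicodeDecodeError, LookupError):
        # LookupError: Python does not know the name found in the .cpg file
        print(f'Error when reading the shapefile with encoding: {e}')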
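
The text in a .cpg file is not guaranteed to be a name that Python recognizes, so it can be worth validating it before passing it to the reader. The following is a minimal sketch using only the standard library. The helper name codec_from_cpg is hypothetical, and the sample strings are assumptions about what a .cpg file might contain.

```python
import codecs


def codec_from_cpg(text):
    """Return Python's canonical codec name for a .cpg entry, or None if unknown."""
    name = text.strip()  # drop the trailing newline and surrounding spaces
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Python has no codec registered under this name
        return None


print(codec_from_cpg("UTF-8\n"))
print(codec_from_cpg("ISO-8859-1"))
print(codec_from_cpg("not-a-codec"))
```

A None result means the declared name should be skipped and the fallback encodings tried instead.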
Answered By: Helge Schneider