How do I access a shapefile using gdal?

Question:

I am trying to use the following code to access a shapefile:

import os
from osgeo import gdal
from osgeo import ogr
from osgeo import osr

shp_path = "xxxxxxxx"
if __name__=='__main__':
    ogr.RegisterAll()
    gdal.SetConfigOption("SHAPE_ENCODING", "UTF-8")
    gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "YES") 
    oDriver = ogr.GetDriverByName("ESRI Shapefile")
    path_list= os.listdir(shp_path)
    for dir in path_list:
        if dir.endswith('.shp'):
            oDS = oDriver.Open(dir, 0)
            iLayerCount = oDS.GetLayerCount()
            out_lyr = oDS.GetLayerByIndex(0)
            print(dir, iLayerCount, out_lyr.schema.len(), out_lyr.schema[0].name)

I got one result like this:

"GBZ2012371002CZ.shp",1,1,'Item_Code'

However, when I open this shapefile’s attribute table in QGIS, I can see that it actually has many more fields (screenshot: fields in QGIS).
So I began to doubt whether I am accessing the right part of the shapefile, and I wonder how a GDAL layer relates to what I see in QGIS.

Environment:

  • QGIS: 3.26.3
  • PYTHON: 3.7.9 (64-bit)
  • GDAL: 3.0.4

Actually, the ‘Item_Code’ field is a leftover from my earlier code, where I improperly used Driver.CreateDataSource instead of Driver.Open to try to load the file. At that time I used DataSource.CreateLayer to create a layer and added the field ‘Item_Code’. So, essentially, I still haven’t found the right way to access the data I want, that is, the data shown in the QGIS attribute table.

I tried switching shp_path to another folder where I had never created the field ‘Item_Code’ and got the following error:

'NoneType' object has no attribute 'GetLayerCount'

It seems that oDS can’t access the attribute table at all.

Asked By: Paul

Answers:

You get this error because the file failed to open, in which case GDAL (silently) returns None by default. That happens, for example, when the file doesn’t exist (incorrect path).

It might be more intuitive to have it raise an exception immediately, which you can enable by calling:

from osgeo import gdal
gdal.UseExceptions()

Exceptions are also expected to become the default starting with GDAL 4.0.
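In the question’s code, the likely culprit is that os.listdir returns bare file names, so Open(dir, 0) is resolved against the current working directory rather than against shp_path. Below is a minimal sketch of one way around that. The helper names list_shapefiles and describe are hypothetical (they are not part of GDAL), and the sketch assumes each shapefile has at least one layer:

```python
import os


def list_shapefiles(folder):
    """Return the full paths of the .shp files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)  # join, because listdir yields bare names
        for name in os.listdir(folder)
        if name.lower().endswith(".shp")
    )


def describe(path):
    """Open one vector dataset; return (layer count, field names of layer 0)."""
    # Imported here so the path helper above also works without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()  # raise on failure instead of returning None
    ds = gdal.OpenEx(path, gdal.OF_VECTOR)
    defn = ds.GetLayer(0).GetLayerDefn()
    names = [defn.GetFieldDefn(i).GetName() for i in range(defn.GetFieldCount())]
    return ds.GetLayerCount(), names
```

With these in place, something like `for p in list_shapefiles(shp_path): print(p, *describe(p))` should either print the fields QGIS shows or raise an error that names the file that could not be opened.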

The example below shows how to retrieve the attributes as a Pandas DataFrame. Pandas isn’t required for this of course, just helpful. You can remove the Pandas part to get a list with a dictionary per feature.

from osgeo import gdal
import pandas as pd

# natural earth example data
shp_path = 'ne_10m_admin_0_countries_lakes.gpkg'

ds = gdal.OpenEx(shp_path)
lyr = ds.GetLayer(0)

# one record per feature; the dict "|" merge operator needs Python 3.9+
df_attr = pd.DataFrame.from_records([{"FID": ft.GetFID()} | dict(ft) for ft in lyr])
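For the variant without Pandas, here is a rough sketch that also runs on Python versions older than 3.9 (such as the 3.7 mentioned in the question). The function name layer_to_records is hypothetical. It relies only on Feature.GetFID() and Feature.items(), the latter returning a dict of field names to values:

```python
def layer_to_records(layer):
    """Return one dict per feature: the FID plus every attribute field.

    `layer` is expected to be an ogr.Layer, or any iterable of objects
    that offer GetFID() and items() the way ogr.Feature does.
    """
    records = []
    for feature in layer:
        record = {"FID": feature.GetFID()}
        record.update(feature.items())  # field name -> value
        records.append(record)
    return records
```

Called as `layer_to_records(lyr)`, this should give the same rows that the DataFrame above is built from.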


Since GDAL 3.6 there’s an Arrow interface that is orders of magnitude faster than the "naive" way shown above. When performance matters, that’s certainly worth exploring; details can be found at:
https://gdal.org/development/rfc/rfc86_column_oriented_api.html

edit:

The ogr error codes are:

from osgeo import ogr

def is_code(x): return x.startswith("OGRERR")

# collect every OGRERR_* constant exposed by the ogr module
error_codes = {k: getattr(ogr, k) for k in filter(is_code, dir(ogr))}
{
    'OGRERR_CORRUPT_DATA': 5,
    'OGRERR_FAILURE': 6,
    'OGRERR_INVALID_HANDLE': 8,
    'OGRERR_NONE': 0,
    'OGRERR_NON_EXISTING_FEATURE': 9,
    'OGRERR_NOT_ENOUGH_DATA': 1,
    'OGRERR_NOT_ENOUGH_MEMORY': 2,
    'OGRERR_UNSUPPORTED_GEOMETRY_TYPE': 3,
    'OGRERR_UNSUPPORTED_OPERATION': 4,
    'OGRERR_UNSUPPORTED_SRS': 7,
}
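When exceptions are not enabled, some OGR calls return one of these integers, so a reverse lookup can make the result easier to read. Here is a small sketch; the helper name ogr_error_names is hypothetical, and it takes the module as an argument so it only reports the constants that your GDAL build actually exposes:

```python
def ogr_error_names(module):
    """Map each integer OGRERR_* code found on `module` to its constant name."""
    return {
        getattr(module, name): name
        for name in dir(module)
        if name.startswith("OGRERR_")
    }
```

For instance, `ogr_error_names(ogr).get(code, "unknown")` turns a returned code into its name; with the values listed above, 6 would map to 'OGRERR_FAILURE'.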
Answered By: Rutger Kassies