Import mssql spatial fields into geopandas/shapely geometry

Question:

I can't seem to directly import MSSQL spatial fields into geopandas. I can import normal MSSQL tables into pandas with pymssql without problems, but I cannot figure out a way to import the spatial fields into shapely geometry. I know that the OGR driver for MSSQL should be able to handle it, but I'm not skilled enough in SQL to figure this out.
This is more of an issue for lines and polygons, as points can be converted to x and y coordinates from the MSSQL field.
Thanks!

Asked By: Dryden


Answers:

I figured it out by properly querying the SQL database table and converting the WKT string to shapely geometry via the loads function in shapely.wkt.

I’m no programmer, so bear that in mind with the organization of the function. The function can import mssql tables with or without GIS geometry.

from pymssql import connect
from pandas import read_sql
from shapely.wkt import loads
from geopandas import GeoDataFrame

def rd_sql(server, database, table, col_names=None, where_col=None, where_val=None, geo_col=False, epsg=2193, export=False, path='save.csv'):
    """
    Imports data from MSSQL database, returns GeoDataFrame. Specific columns can be selected and specific queries within columns can be selected. Requires the pymssql package, which must be separately installed.
    Arguments:
    server -- The server name (str). e.g.: 'SQL2012PROD03'
    database -- The specific database within the server (str). e.g.: 'LowFlows'
    table -- The specific table within the database (str). e.g.: 'LowFlowSiteRestrictionDaily'
    col_names -- The column names that should be retrieved (list). e.g.: ['SiteID', 'BandNo', 'RecordNo']
    where_col -- The sql statement related to a specific column for selection (must be formated according to the example). e.g.: 'SnapshotType'
    where_val -- The WHERE query values for the where_col (list). e.g. ['value1', 'value2']
    geo_col -- Is there a geometry column in the table?
    epsg -- The coordinate system (int)
    export -- Should the data be exported
    path -- The path and csv name for the export if 'export' is True (str)
    """
    if col_names is None and where_col is None:
        stmt1 = 'SELECT * FROM ' + table
    elif where_col is None:
        stmt1 = 'SELECT ' + str(col_names).replace("'", '"')[1:-1] + ' FROM ' + table
    else:
        stmt1 = 'SELECT ' + str(col_names).replace("'", '"')[1:-1] + ' FROM ' + table + ' WHERE ' + str([where_col]).replace("'", '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')'
    conn = connect(server, database=database)
    df = read_sql(stmt1, conn)

    ## Read in geometry if required
    if geo_col:
        geo_col_stmt = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME=" + "'" + table + "'" + " AND DATA_TYPE='geometry'"
        geo_col = str(read_sql(geo_col_stmt, conn).iloc[0,0])
        if where_col is None:
            stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table
        else:
            stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table + ' WHERE ' + str([where_col]).replace("'", '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')'
        df2 = read_sql(stmt2, conn)
        df2.columns = ['geometry']
        geometry = [loads(x) for x in df2.geometry]
        df = GeoDataFrame(df, geometry=geometry, crs={'init': 'epsg:' + str(epsg)})

    if export:
        df.to_csv(path, index=False)

    conn.close()
    return df

EDIT: Made the function automatically find the geometry field if one exists.
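The string manipulation used to build the SELECT statements above can be sketched in isolation, with no database needed; the column names and WHERE values here are made up for illustration:

```python
# Build the column list the way rd_sql does: Python's list repr uses
# single quotes, which get swapped for the double quotes SQL Server
# accepts around identifiers, then the surrounding brackets are trimmed.
col_names = ['SiteID', 'BandNo']  # hypothetical column names
cols_sql = str(col_names).replace("'", '"')[1:-1]
print(cols_sql)  # "SiteID", "BandNo"

# WHERE values keep their single quotes, acting as SQL string literals
where_val = ['value1', 'value2']
vals_sql = str(where_val)[1:-1]
print(vals_sql)  # 'value1', 'value2'

stmt = ('SELECT ' + cols_sql + ' FROM SomeTable'
        + ' WHERE SnapshotType IN (' + vals_sql + ')')
print(stmt)
# SELECT "SiteID", "BandNo" FROM SomeTable WHERE SnapshotType IN ('value1', 'value2')
```

Worth noting: building SQL by string concatenation like this is fine for trusted, hard-coded inputs, but it is open to SQL injection if the values ever come from users; parameterized queries are the safer general approach.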

Answered By: Dryden

Love this function, and thanks to Dryden for it, but the code that pulls the geometry has issues with multipolygon fields. If the geometry of one of the records is a multipolygon and you use the .STGeometryN(1) code, you only get the first of potentially several polygons in the record, so the GeoDataFrame will not end up with the total geometry for that record. If you tweak the code and remove the .STGeometryN(1), it should handle multipolygons.

I used this to pull census blockgroups I had stored in SQL Server, and with a bit of tweaking (it should include a database schema parameter) I got it to work. But I would warn others who use it as-is to first check whether you have multipolygons in your data, using this query in SQL:

select geometrycolumn.STGeometryType(),
       geometrycolumn.STNumGeometries()
from yourtable
order by 1

This will tell you if you have multipolygons and how many per record.
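The data loss jport describes can be reproduced with shapely alone. The multipolygon WKT below is invented for illustration, and taking the first member of `geoms` stands in for what `.STGeometryN(1)` returns:

```python
from shapely import wkt

# A two-part multipolygon: two disjoint triangles, each with area 0.5
mp = wkt.loads(
    'MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)), ((2 2, 3 2, 3 3, 2 2)))'
)

# .STGeometryN(1) would serialize only the first part
first = list(mp.geoms)[0]

print(len(mp.geoms))  # 2 parts in the full geometry
print(mp.area)        # 1.0 -- total area of both triangles
print(first.area)     # 0.5 -- half the geometry silently dropped
```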

Answered By: jport

Old question, but if anyone lands here, this is another solution.

If you make sure to return the geometries as WKB (i.e. [geoField].STAsBinary() AS geometry), you can load them with the shapely.wkb.loads function:

import pandas as pd
import geopandas as gpd
from shapely.wkb import loads
   
# define your connection, pymmssql, sqlAlchemy, pyodbc whatever
cnxn = DEFINE_CONNECTION

query = """SELECT [field1] 
               ,[geoField].STAsBinary() AS geometry  
               ,[someOtherFieldYouWant]
           FROM [database].[dbo].[table]
"""
df = pd.read_sql(query, cnxn)
gdf = gpd.GeoDataFrame(df)
gdf.loc[:,'geometry'] = gdf.loc[:,'geometry'].apply(loads)
gdf = gdf.set_crs(4326) # OR whatever your CRS is

You can get the CRS from the SRID of the geometries in the table (this assumes they are all the same, don’t know if they have to be).

query = """SELECT TOP 1 [geoField].STSrid 
            FROM [database].[dbo].[table]"""  
EPSG_AS_INT = pd.read_sql(query, cnxn).squeeze()

and then just gdf = gdf.set_crs(EPSG_AS_INT)
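The WKB path can be checked without a database: a shapely geometry's .wkb property yields the same well-known-binary bytes that STAsBinary() produces, so a round trip through shapely.wkb.loads sketches what happens to each row (the coordinates here are arbitrary):

```python
from shapely.geometry import Point
from shapely.wkb import loads

# Stand-in for one value of the [geoField].STAsBinary() column
original = Point(174.77, -41.29)
raw = original.wkb  # bytes in well-known-binary format

restored = loads(raw)
print(restored.x, restored.y)  # 174.77 -41.29
```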

Answered By: Magnus Persson

If anyone is still having errors and needs a simplified workaround:

import geopandas as gpd
import pandas as pd
import pymssql
from shapely import wkt

connection = pymssql.connect(server = 'Enter Server Name', database='Enter Database Name')
query = 'SELECT *, [geometry_column].STAsText() AS geometry FROM Table_Name'
df = pd.read_sql(query, connection)
df.geometry = df.geometry.apply(wkt.loads)
gdf = gpd.GeoDataFrame(df)
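The same round trip can be exercised without SQL Server by building the WKT column by hand; the sample rows below are invented, standing in for what pd.read_sql would return from the STAsText() column:

```python
import pandas as pd
import geopandas as gpd
from shapely import wkt

# One WKT string per row, as [geometry_column].STAsText() would produce
df = pd.DataFrame({
    'name': ['a', 'b'],
    'geometry': ['POINT (1 2)', 'POINT (3 4)'],
})
df.geometry = df.geometry.apply(wkt.loads)
gdf = gpd.GeoDataFrame(df)
print(gdf.geometry.x.tolist())  # [1.0, 3.0]
```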

Answered By: StrikeEagle03