Equivalent of "points_from_xy" in GeoPandas to generate LineStrings faster?

Question:

I have a list of lines defined by start and end points. There are hundreds of thousands of them, possibly a few million. To build a list of points I use points_from_xy in GeoPandas, which is highly optimized. Is there a similarly fast way to make LineStrings in GeoPandas/Shapely?

My current method is as follows, and I can’t think of a way to avoid the explicit loop.

[((start_x[i], start_y[i]), (end_x[i], end_y[i])) for i in range(n_pts)]
Asked By: bz13531


Answers:

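You can use points_from_xy to build two GeometryArrays, then combine them with an element-wise set operation and a constructive method to get the result. Specifically, the convex_hull of the union of two distinct points is a line 🙂 Two caveats: if a start and end point coincide, that row comes back as a Point rather than a LineString, and the vertex order of each hull is chosen by GEOS, so the lines are not guaranteed to run from start to end.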

# setup 
import numpy as np, geopandas as gpd, shapely.geometry

N = int(1e7)
x1, x2, y1, y2 = (np.random.random(size=N) for _ in range(4))

Running the following on 10 million point pairs finishes in about 25 seconds of wall time on my machine:

In [3]: %%time
   ...:
   ...: points1 = gpd.points_from_xy(x1, y1)
   ...: points2 = gpd.points_from_xy(x2, y2)
   ...: lines = points1.union(points2).convex_hull
   ...:
   ...:
CPU times: user 18 s, sys: 4.93 s, total: 22.9 s
Wall time: 25 s

The result is a GeometryArray of LineString objects:

In [4]: lines
Out[4]:
<GeometryArray>
[<shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 ...
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>]
Length: 10000000, dtype: geometry
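
As a quick sanity check of the convex-hull trick on a single pair, here is a minimal sketch using plain Shapely geometries. The names start, end, line, endpoints and degenerate are made up for illustration. It shows the normal case and the degenerate case where both points are the same:

```python
from shapely.geometry import Point

# Hypothetical start and end points for one line
start, end = Point(1.0, 1.0), Point(0.0, 0.0)

# The union of two distinct points is a MultiPoint;
# its convex hull is the segment joining them.
line = start.union(end).convex_hull
print(line.geom_type)

# Both endpoints are on the line, but compare them as a set:
# the vertex order is up to GEOS, so direction is not guaranteed.
endpoints = set(line.coords)
print(endpoints)

# If start and end coincide, the union is a single point,
# and so is its convex hull.
degenerate = start.union(Point(1.0, 1.0)).convex_hull
print(degenerate.geom_type)
```

If direction matters for your data, or some rows can have identical start and end points, check for that before relying on this approach.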

For comparison, I tried building the lines with shapely.geometry.LineString in a list comprehension using 1/10 the points (1e6), and it took 23.8 seconds. I got bored waiting for it to finish with 1e7 points…
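
A different technique from the convex-hull trick above, which may be worth trying if you have Shapely 2.0 or later installed (an assumption, I have not timed it here): the vectorized shapely.linestrings constructor builds one LineString per row from a coordinate array of shape (n, 2, 2). A minimal sketch, with n and the random inputs chosen just for illustration:

```python
import numpy as np
import shapely  # assumes Shapely >= 2.0, which provides shapely.linestrings

rng = np.random.default_rng(0)
n = 1000  # illustrative size only
x1, y1, x2, y2 = (rng.random(n) for _ in range(4))

# Build an array of shape (n, 2, 2): n lines, 2 vertices each, (x, y) per vertex.
coords = np.stack(
    [np.column_stack([x1, y1]), np.column_stack([x2, y2])],
    axis=1,
)

# One vectorized call; returns a NumPy array of n LineString objects,
# with vertices in the order given (start first, then end).
lines = shapely.linestrings(coords)
print(lines.shape, lines[0].geom_type)
```

Unlike the hull approach, this keeps the start-to-end vertex order as supplied. The resulting array can be wrapped with gpd.GeoSeries(lines) if you need it in GeoPandas.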

Answered By: Michael Delgado