Python – Distance matrix between geographic coordinates

Question

I have a pandas DataFrame with over 600 geographic coordinate points. An extract follows:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import sin, cos, sqrt, atan2, radians

lat_long = pd.DataFrame({'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})
lat_long

To calculate the distance between two points manually, I use the code below:

lat1 = radians(lat_long['LATITUDE'][0])
lon1 = radians(lat_long['LONGITUDE'][0])
lat2 = radians(lat_long['LATITUDE'][1])
lon2 = radians(lat_long['LONGITUDE'][1])

R = 6373.0

dlon = lon2 - lon1
dlat = lat2 - lat1

a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

distance = R * c

print("Result:", round(distance,4))

What I need is a function that uses the formula above to calculate the distance from every point to every other point, as in a matrix. However, I am having trouble working out how to write the function and how to store the distances between the points. Any help is welcome. Example output (for illustrative purposes only, in case I have not been clear):

|        | point0 | point1 | point2 |
| point0 |   0    |   2    |   3    |
| point1 |   2    |   0    |   4    |
| point2 |   3    |   4    |   0    |

(each cell is the distance between the row point and the column point)

Asked By: Costa.Gustavo

Answer 1

You could use pdist to compute the pairwise distances:

import pandas as pd

import numpy as np
from math import sin, cos, sqrt, atan2, radians

from scipy.spatial.distance import pdist, squareform

lat_long = pd.DataFrame({'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})


def dist(x, y):
    """Function to compute the distance between two points x, y"""

    lat1 = radians(x[0])
    lon1 = radians(x[1])
    lat2 = radians(y[0])
    lon2 = radians(y[1])

    R = 6373.0

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c

    return round(distance, 4)


distances = pdist(lat_long.values, metric=dist)

points = [f'point_{i}' for i in range(1, len(lat_long) + 1)]

result = pd.DataFrame(squareform(distances), columns=points, index=points)

print(result)

Output

         point_1  point_2  point_3  point_4  point_5
point_1   0.0000  20.5115   8.4123  15.3203  50.1784
point_2  20.5115   0.0000  16.3400  15.8341  30.0319
point_3   8.4123  16.3400   0.0000   6.9086  44.1838
point_4  15.3203  15.8341   6.9086   0.0000  40.0284
point_5  50.1784  30.0319  44.1838  40.0284   0.0000

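Note that pdist returns the distances as a condensed one-dimensional array with one entry per pair of points; squareform expands that array into the full symmetric square matrix, so the result is stored in a NumPy array.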
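To make the condensed and square forms concrete, here is a minimal sketch on three made-up points, using the default Euclidean metric instead of the haversine function above. The sample coordinates are an assumption chosen so the distances are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative points (not from the question) on a 3-4-5 line
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Condensed form: one entry per pair, in the order (0,1), (0,2), (1,2)
condensed = pdist(points)

# Square form: full symmetric matrix with zeros on the diagonal
square = squareform(condensed)

print(condensed)
print(square)
```

For n points the condensed array has n * (n - 1) / 2 entries, which is why it is the cheaper form to keep around when the full matrix is not needed.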

Answered By: Dani Mesejo

Answer 2

Another possible solution is a nested loop over the rows:

import pandas as pd
import numpy as np
from math import sin, cos, sqrt, atan2, radians

lat_long = pd.DataFrame({'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})

def distance(city1, city2):
    lat1 = radians(city1['LATITUDE'])
    lon1 = radians(city1['LONGITUDE'])
    lat2 = radians(city2['LATITUDE'])
    lon2 = radians(city2['LONGITUDE'])

    R = 6373.0

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c

    return distance

dist = np.zeros([lat_long.shape[0],lat_long.shape[0]])
for i1, city1 in lat_long.iterrows():
    for i2, city2 in lat_long.iloc[i1+1:,:].iterrows():
        dist[i1,i2] = distance(city1, city2)

print(dist)

Output

[[ 0.         20.51149047  8.41230771 15.32026132 50.17836849]
 [ 0.          0.         16.33997119 15.83407186 30.03192954]
 [ 0.          0.          0.          6.90864606 44.18376436]
 [ 0.          0.          0.          0.         40.02842872]
 [ 0.          0.          0.          0.          0.        ]]

The lower triangle of the distance matrix is left as zeros because the matrix is symmetric (dist[i1,i2] == dist[i2,i1]).
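If the full symmetric matrix is wanted, it can also be computed in one step with NumPy broadcasting rather than row-by-row loops, which may matter with 600 or more points. The following is a sketch under the same assumptions as the code above (haversine formula, R = 6373.0 km); `haversine_matrix` is a hypothetical helper name, not something from the original answers.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_km=6373.0):
    """Return the full matrix of haversine distances (km) between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))

    # Broadcasting a column against a row yields every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against tiny floating-point overshoot
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return radius_km * c

latitudes = [-22.98, -22.97, -22.92, -22.87, -22.89]
longitudes = [-43.19, -43.39, -43.24, -43.28, -43.67]

dist_full = haversine_matrix(latitudes, longitudes)
print(np.round(dist_full, 4))
```

With a DataFrame, the inputs would be `lat_long['LATITUDE']` and `lat_long['LONGITUDE']`. Alternatively, the upper-triangular array from the loop can be mirrored with `dist + dist.T`, since the diagonal is zero.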

Answered By: GZZ

Answer 3

Using GeoPandas:

import pandas as pd
import geopandas as gpd

lat_long = pd.DataFrame({'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})

# Convert Pandas dataframe to GeoPandas dataframe
gdf = gpd.GeoDataFrame(
    lat_long,
    geometry=gpd.points_from_xy(lat_long['LONGITUDE'], lat_long['LATITUDE']),
    crs='EPSG:4326' # Or change to what's appropriate for you.
)

# Calculate distances between points (planar distance in degrees for EPSG:4326,
# multiplied by 100 as a rough conversion to km)
distances = []
for _, row in gdf.iterrows():
    distances.append(gdf['geometry'].distance(row['geometry'])*100)

# Create data frame of distances
distances_df = pd.DataFrame.from_records(distances)
print(distances_df)

Output:

           0          1          2          3          4
0   0.000000  20.024984   7.810250  14.212670  48.836462
1  20.024984   0.000000  15.811388  14.866069  29.120440
2   7.810250  15.811388   0.000000   6.403124  43.104524
3  14.212670  14.866069   6.403124   0.000000  39.051248
4  48.836462  29.120440  43.104524  39.051248   0.000000

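Note that this output differs from the other answers because of the Coordinate Reference System (CRS): with EPSG:4326 the coordinates are in degrees, so distance returns a planar distance in degrees, and multiplying by 100 is only a rough conversion to kilometres. For accurate distances, reproject to a projected CRS that is appropriate for your region.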
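To see why a factor of 100 only approximates kilometres, the length of one degree can be estimated directly. This is a rough sketch, assuming a spherical Earth with the radius used in the question (6373 km) and a representative latitude of -22.9 degrees for these points; both values are assumptions for illustration.

```python
from math import cos, pi, radians

R = 6373.0               # Earth radius in km, as in the question
reference_lat = -22.9    # assumed representative latitude of the sample points

# Length of one degree along a meridian (latitude direction)
km_per_degree_lat = R * pi / 180

# Length of one degree along a parallel shrinks with cos(latitude)
km_per_degree_lon = km_per_degree_lat * cos(radians(reference_lat))

print(round(km_per_degree_lat, 1), round(km_per_degree_lon, 1))
```

Near these points a degree of latitude is about 111 km and a degree of longitude is a little over 100 km, so scaling by 100 somewhat underestimates the haversine distances shown in the other answers.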

Answered By: tfad334
