Matrix of distances from csv file of lat and lon

Question:

I have a CSV file with places and their latitudes and longitudes, and I want to create a matrix based on them. I tried creating the matrix using:

arr = df['latitude'].values - df['latitude'].values[:, None]
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)

but it only creates a matrix of latitude differences, and I want to calculate the distances between places. So the matrix I want is the matrix of distances between all of the hotels.


Asked By: dfouheqoijefoih


Answers:

- Read the CSV input file for hotel name, lat and lon, placing them in a table of three columns.
- LOOP A over the hotels
   - LOOP B over the hotels, starting with the hotel after A
      - Calculate the distance D between A and B
      - Store D in the matrix at column A and row B
      - Store D in the matrix at column B and row A
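The steps above can be sketched in plain Python using only the standard library. This is a minimal sketch, assuming the CSV has a header row with columns `name`, `latitude` and `longitude` (the names used in the question's code) and computing D with the Haversine formula on a sphere of mean radius 6371 km; `haversine_km` and `distance_matrix` are hypothetical helper names.

```python
import csv
import io
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(csv_file):
    # Read the hotels into a table of (name, lat, lon) rows.
    # Column names "name", "latitude", "longitude" are an assumption.
    hotels = [
        (row["name"], float(row["latitude"]), float(row["longitude"]))
        for row in csv.DictReader(csv_file)
    ]
    n = len(hotels)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for a in range(n):                      # LOOP A over the hotels
        for b in range(a + 1, n):           # LOOP B, starting after A
            d = haversine_km(hotels[a][1], hotels[a][2], hotels[b][1], hotels[b][2])
            matrix[b][a] = d                # column A, row B
            matrix[a][b] = d                # column B, row A
    return [h[0] for h in hotels], matrix


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for the real file (hypothetical sample data).
    sample = io.StringIO("name,latitude,longitude\nA,0,0\nB,0,1\n")
    names, m = distance_matrix(sample)
    print(names, m)
```

For a real file, pass an open file object instead, e.g. `with open("hotels.csv", newline="") as f: names, m = distance_matrix(f)` (the filename is hypothetical). Each pair is computed once and written to both cells, exactly as in the two innermost steps.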

If the hotels are scattered over a wide area, you will need to use the Haversine formula to calculate accurate distances.
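For reference, the Haversine formula for the distance D between hotels A and B, with latitudes φ and longitudes λ in radians and R the Earth's radius (about 6371 km on average), is:

```latex
a = \sin^2\left(\frac{\varphi_B - \varphi_A}{2}\right)
  + \cos\varphi_A \, \cos\varphi_B \, \sin^2\left(\frac{\lambda_B - \lambda_A}{2}\right)

D = 2R \arcsin\left(\sqrt{a}\right)
```

It treats the Earth as a sphere; geopy's `geodesic` works on the WGS-84 ellipsoid by default and is slightly more accurate.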

Answered By: ravenspoint

Based on @ravenspoint's answer, here is some simple code to calculate the distances.

>>> import numpy as np
>>> import pandas as pd
>>> import geopy.distance

>>> data = {"hotels": ["1", "2", "3", "4"], "lat": [20.55697, 21.123698, 25.35487, 19.12577], "long": [17.1, 18.45893, 16.78214, 14.75498]}

>>> df = pd.DataFrame(data)
>>> df

  hotels        lat      long
0      1  20.556970  17.10000
1      2  21.123698  18.45893
2      3  25.354870  16.78214
3      4  19.125770  14.75498

Now let's create a matrix to hold the distances between hotels. It should have the shape (number of hotels x number of hotels).

>>> matrix = np.ones((len(df), len(df))) * -1
>>> np.fill_diagonal(matrix, 0)
>>> matrix

array([[ 0., -1., -1., -1.],
       [-1.,  0., -1., -1.],
       [-1., -1.,  0., -1.],
       [-1., -1., -1.,  0.]])

Here -1 marks the cells below the diagonal, which are left uncomputed to avoid calculating the same distance twice, since dist(1,2) = dist(2,1).

Next, loop over the hotels and calculate each distance. Here the geopy package is used.

>>> for i in range(len(df)):
...     coords_i = df.loc[i, ["lat", "long"]].values
...     for j in range(i+1, len(df)):
...         coords_j = df.loc[j, ["lat", "long"]].values
...         matrix[i,j] = geopy.distance.geodesic(coords_i, coords_j).km
...

>>> matrix

array([[  0.        , 154.73003254, 532.33605633, 292.29813424],
       [ -1.        ,   0.        , 499.00500751, 445.97821702],
       [ -1.        ,  -1.        ,   0.        , 720.69054683],
       [ -1.        ,  -1.        ,  -1.        ,   0.        ]])
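The loop above fills only the upper triangle, leaving the -1 placeholders below the diagonal. Since dist(i, j) = dist(j, i), a minimal sketch of filling the rest by mirroring, and of labelling rows and columns with the hotel names as the question asks, could look like this. The values are copied from the output above so the snippet runs without geopy; `upper`, `full` and `dist_df` are just illustrative names.

```python
import numpy as np
import pandas as pd

# Upper-triangular result copied from the output above; -1 marks uncomputed cells.
matrix = np.array([
    [0.0, 154.73003254, 532.33605633, 292.29813424],
    [-1.0, 0.0, 499.00500751, 445.97821702],
    [-1.0, -1.0, 0.0, 720.69054683],
    [-1.0, -1.0, -1.0, 0.0],
])
hotels = ["1", "2", "3", "4"]  # same values as df["hotels"]

# np.triu zeroes everything below the diagonal (the -1 placeholders);
# adding the transpose copies dist(i, j) into dist(j, i). The diagonal is 0, so it stays 0.
upper = np.triu(matrix)
full = upper + upper.T

# Label rows and columns with the hotel names.
dist_df = pd.DataFrame(full, index=hotels, columns=hotels)
print(dist_df.round(1))
```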

Please note that the nested loop is not the most efficient way to do the job (it computes one distance at a time in Python), and the code can be improved.
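One way to avoid the nested loop is to vectorize the Haversine formula with NumPy broadcasting, the same `values - values[:, None]` trick the question already uses for latitudes. This is a sketch under a spherical-Earth assumption (R = 6371 km), so its results will differ slightly from geopy's `geodesic`, which works on an ellipsoid; `haversine_matrix` is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances (km) for arrays of latitudes/longitudes in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a row vector against a column vector gives all pairwise differences,
    # like df['latitude'].values - df['latitude'].values[:, None] in the question.
    dlat = lat[None, :] - lat[:, None]
    dlon = lon[None, :] - lon[:, None]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    # clip guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


data = {"hotels": ["1", "2", "3", "4"],
        "lat": [20.55697, 21.123698, 25.35487, 19.12577],
        "long": [17.1, 18.45893, 16.78214, 14.75498]}
df = pd.DataFrame(data)
dist_df = pd.DataFrame(haversine_matrix(df["lat"], df["long"]),
                       index=df["hotels"], columns=df["hotels"])
print(dist_df.round(1))
```

This fills the whole symmetric matrix in one shot, so no -1 placeholders or mirroring step are needed.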

Answered By: Khaled BENAGGOUNE