Calculate diff of a numpy array using custom function instead of subtraction

Question:

I am working with an array created from a list of geographical coordinates describing a GPS trajectory. The data is like this:

[[-51.203018 -29.996149]
 [-51.203018 -29.99625 ]
 [-51.20266  -29.996229]
 ..., 
 [-51.64315  -29.717896]
 [-51.643112 -29.717737]
 [-51.642937 -29.717709]]

I want to calculate the geographic distances between consecutive rows (with the special condition that the first element is always zero, at the starting point). This would give me either a list of distances with len(distances) == coord_array.shape[0], or maybe a third column in the same array.

It is important to note that I already have a function that returns the distance between two points (two coordinate pairs), but I don’t know how to apply it as a single array operation instead of looping through row pairs.

Currently I do the following to calculate segment distances in one new column and cumulative distances in another (lonlat is the list of coordinate pairs shown above as an array, and distance(p1, p2) is already defined):

    import numpy

    dists = [0.0]
    for n in range(len(lonlat)-1):
        dists.append(distance(lonlat[n+1], lonlat[n]))

    lonlatarray = numpy.array(lonlat).reshape((-1,2))
    distsarray = numpy.array(dists).reshape((-1,1))
    cumdistsarray = numpy.cumsum(distsarray).reshape((-1,1))

    print(numpy.hstack((lonlatarray, distsarray, cumdistsarray)))

[[   -51.203018      -29.996149        0.              0.        ]
 [   -51.203018      -29.99625         7.04461338      7.04461338]
 [   -51.20266       -29.996229       39.87928578     46.92389917]
 ..., 
 [   -51.64315       -29.717896       11.11669769  92529.72742791]
 [   -51.643112      -29.717737       11.77016407  92541.49759198]
 [   -51.642937      -29.717709       19.57670066  92561.07429263]]

My main question is: “How could I perform the distance function (which takes a pair of rows as argument) like an array operation instead of a loop?” (that is, how could I properly vectorize it)

Other on-topic questions would be:

  • If I decide to use Pandas, is there some clever trick to accomplish this?
  • Is there a way to put scipy.spatial.distance to “work for me” using geographic distance (haversine, great-circle distance)?

Also, I would appreciate some tips if I am doing anything unnecessarily complicated.

Thank you all, very much, for your interest.

Asked By: heltonbiker

Answers:

It sounds like you need to have your original data lonlat represented as a pair of numpy arrays, then pass these arrays to a version of the function distance which accepts arrays.

For example, looking up the definition of haversine distance, you can fairly easily turn it into a vectorised formula as follows:

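    import numpy as np

    def haversine_pairwise(phi, lam, r=6371000.0):
        # phi, lam: 1-D arrays of latitude and longitude, in radians
        # r: sphere radius (defaults to the mean Earth radius in metres)
        dphi = phi[1:] - phi[:-1]
        dlam = lam[1:] - lam[:-1]

        # haversine of the central angle between consecutive points
        a = 0.5*(1 - np.cos(dphi)) + np.cos(phi[1:])*np.cos(phi[:-1])*0.5*(1 - np.cos(dlam))
        # convert the haversine into an arc length on the sphere
        return 2*r*np.arcsin(np.sqrt(a))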

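I’m not familiar with these formulas myself, but hopefully this shows you how you can do it for whichever formula you want. Note that the result has one element fewer than the input, so you would prepend the zero for the starting point and then use cumsum as you have already done. The array slicing syntax I have used is described in the NumPy indexing documentation, in case it’s not clear.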
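To tie this back to the arrays in the question, here is a minimal end-to-end sketch. It rests on a few assumptions of mine: column 0 is longitude and column 1 is latitude, both in degrees (as the name lonlat suggests), and the Earth is a sphere with a mean radius of 6371000 m. The name segment_distances and the constant EARTH_RADIUS_M are only illustrative. The numbers it produces need not match the ones printed in the question, since those depend on how your own distance(p1, p2) is defined.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # assumption: mean Earth radius, in metres


def segment_distances(lonlat, r=EARTH_RADIUS_M):
    """Haversine distance between consecutive rows of an (N, 2) array.

    Assumes column 0 is longitude and column 1 is latitude, in degrees.
    Returns N values; the first one is 0.0 for the starting point.
    """
    lonlat = np.asarray(lonlat, dtype=float)
    lam = np.radians(lonlat[:, 0])  # longitude in radians
    phi = np.radians(lonlat[:, 1])  # latitude in radians

    # np.diff(x) is the same as x[1:] - x[:-1]
    dphi = np.diff(phi)
    dlam = np.diff(lam)

    # haversine of the central angle, then the arc length on the sphere
    a = np.sin(dphi / 2) ** 2 + np.cos(phi[:-1]) * np.cos(phi[1:]) * np.sin(dlam / 2) ** 2
    d = 2 * r * np.arcsin(np.sqrt(a))

    # prepend the zero for the starting point
    return np.concatenate(([0.0], d))


# First three rows of the data shown in the question
lonlatarray = np.array([[-51.203018, -29.996149],
                        [-51.203018, -29.99625],
                        [-51.20266, -29.996229]])

dists = segment_distances(lonlatarray)
cumdists = np.cumsum(dists)

# Same four-column layout as in the question: lon, lat, segment, cumulative
table = np.column_stack((lonlatarray, dists, cumdists))
print(table)
```

There is no Python-level loop here: the whole trajectory is handled by a handful of array operations, and np.column_stack saves the reshape((-1,1)) calls because it accepts 1-D arrays as columns.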

Answered By: DaveP