Performing calculations on DataFrames of different lengths
Question:
I have two different DataFrames that look something like this:
Lat | Lon |
---|---|
28.13 | -87.62 |
28.12 | -87.65 |
… | … |

Calculated_Dist_m |
---|
34.5 |
101.7 |
… |
The first DataFrame (named df, consisting of the Lat and Lon columns) has just over 1000 rows. The second DataFrame (named new_calc_dist, consisting of the Calculated_Dist_m column) has over 30000 rows. I want to determine new longitude and latitude coordinates using the Lat, Lon, and Calculated_Dist_m columns. Here is the code I've tried:
r_earth = 6371000
new_lat = df['Lat'] + (new_calc_dist['Calculated_Dist_m'] / r_earth) * (180/np.pi)
new_lon = df['Lon'] + (new_calc_dist['Calculated_Dist_m'] / r_earth) * (180/np.pi) / np.cos(df['Lat'] * np.pi/180)
When I run the code, however, it only gives me new calculations for certain index values, and gives me NaNs for the rest. I’m not entirely sure how I should go about writing the code so that new longitude and latitude points are calculated for each of over 30000 row values based on the initial 1000 longitude and latitude points. Any suggestions?
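The NaNs come from pandas index alignment: arithmetic between two Series pairs rows by index label, not by position, and it never forms every combination. Any label that exists in only one Series (here, labels 1000 and above of new_calc_dist) produces NaN. A minimal sketch, using illustrative lengths of 3 and 5 rows as stand-ins for the real ~1000 and ~30000:

```python
import pandas as pd

# Stand-ins for df['Lat'] (short) and new_calc_dist['Calculated_Dist_m'] (long);
# the values are illustrative only.
lat = pd.Series([28.13, 28.12, 28.12])
dist = pd.Series([34.5, 101.7, 28.6, 30.8, 76.5])

# Addition aligns on index labels 0..4. Row i of `lat` is only ever paired with
# row i of `dist`; labels 3 and 4 exist only in `dist`, so the result is NaN there.
result = lat + dist
print(result)
```

So the original expression computes one value per shared index label rather than one value per (coordinate, distance) pair.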
EDIT
Here are some sample inputs and outputs. Note that these are not exact figures, but they give the idea.
Lat | Lon |
---|---|
28.13 | -87.62 |
28.12 | -87.65 |
28.12 | -87.63 |
… | … |

Calculated_Dist_m |
---|
34.5 |
101.7 |
28.6 |
30.8 |
76.5 |
… |
And so the sample output would be:
Lat | Lon |
---|---|
28.125 | -87.625 |
28.15 | -87.61 |
28.127 | -87.623 |
28.128 | -87.623 |
28.14 | -87.615 |
28.115 | -87.655 |
28.14 | -87.64 |
28.117 | -87.653 |
28.118 | -87.653 |
28.15 | -87.645 |
28.115 | -87.635 |
28.14 | -87.62 |
28.115 | -87.613 |
28.117 | -87.633 |
28.118 | -87.633 |
… | … |
Again, these are just random outputs (I tried getting the exact calculations, but could not get it to work). But overall, this gives an idea of what would be wanted: taking the coordinates from the first dataframe and calculating new coordinates based on each of the calculated distances from the second dataframe.
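For reference, the offset formula in the code above is a small-distance (flat-Earth, equirectangular) approximation. Writing φ for Lat and λ for Lon (both in degrees), d for Calculated_Dist_m in metres, and R = r_earth = 6371000 m, it computes:

```latex
\varphi' = \varphi + \frac{d}{R}\cdot\frac{180}{\pi},
\qquad
\lambda' = \lambda + \frac{d}{R}\cdot\frac{180}{\pi}\cdot\frac{1}{\cos\left(\varphi\,\pi/180\right)}
```

For d = 34.5 m the latitude offset is about 0.00031 degrees. Because both offsets use the full d, each new point is d metres north and d metres east of the start (about d√2 metres to the north-east); if a single displacement of d metres along some bearing is intended, the two offsets would need to be weighted by the cosine and sine of that bearing.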
Answers:
If I understood correctly, and assuming df1 and df2 as input, you can perform a cross merge to get all combinations of df1 and df2 rows, then apply your computation (here as new columns Lat2/Lon2):
import numpy as np
import pandas as pd

# every df1 row paired with every df2 row (how='cross' needs pandas >= 1.2)
df = df1.merge(df2, how='cross')
r_earth = 6371000
df['Lat2'] = df['Lat'] + (df['Calculated_Dist_m'] / r_earth) * (180/np.pi)
df['Lon2'] = df['Lon'] + (df['Calculated_Dist_m'] / r_earth) * (180/np.pi) / np.cos(df['Lat'] * np.pi/180)
output:
Lat Lon Calculated_Dist_m Lat2 Lon2
0 28.13 -87.62 34.5 28.130310 -87.619648
1 28.13 -87.62 101.7 28.130915 -87.618963
2 28.13 -87.62 28.6 28.130257 -87.619708
3 28.13 -87.62 30.8 28.130277 -87.619686
4 28.13 -87.62 76.5 28.130688 -87.619220
5 28.12 -87.65 34.5 28.120310 -87.649648
6 28.12 -87.65 101.7 28.120915 -87.648963
7 28.12 -87.65 28.6 28.120257 -87.649708
8 28.12 -87.65 30.8 28.120277 -87.649686
9 28.12 -87.65 76.5 28.120688 -87.649220
10 28.12 -87.63 34.5 28.120310 -87.629648
11 28.12 -87.63 101.7 28.120915 -87.628963
12 28.12 -87.63 28.6 28.120257 -87.629708
13 28.12 -87.63 30.8 28.120277 -87.629686
14 28.12 -87.63 76.5 28.120688 -87.629220
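A self-contained version of the cross-merge approach, using the three coordinates and five distances from the question's EDIT as sample data. It is a sketch: np.degrees and np.radians stand in for the explicit 180/np.pi factors (they perform the same conversion), and how='cross' assumes pandas 1.2 or later. With the question's real data (about 1000 × 30000 rows) the merged frame would have roughly 30 million rows.

```python
import numpy as np
import pandas as pd

# Sample inputs taken from the question's EDIT (df1 = coordinates, df2 = distances).
df1 = pd.DataFrame({"Lat": [28.13, 28.12, 28.12], "Lon": [-87.62, -87.65, -87.63]})
df2 = pd.DataFrame({"Calculated_Dist_m": [34.5, 101.7, 28.6, 30.8, 76.5]})

r_earth = 6371000  # mean Earth radius in metres

# Cartesian product: len(df1) * len(df2) rows, df1 rows in the outer order.
df = df1.merge(df2, how="cross")

# Angular offset in degrees for each distance.
offset_deg = np.degrees(df["Calculated_Dist_m"] / r_earth)
df["Lat2"] = df["Lat"] + offset_deg
# Longitude degrees shrink with latitude, hence the division by cos(lat).
df["Lon2"] = df["Lon"] + offset_deg / np.cos(np.radians(df["Lat"]))

print(df.round(6))
```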
In case you just want the result as two 2D arrays (without repeats of the input, so still O(m*n) in memory, but only 2/5 of what the cross-join result needs, since it stores 2 of its 5 columns):
import numpy as np
import pandas as pd

r_earth = 6371000
# angular offset in degrees for each distance, shape (n,)
z = 180 / np.pi * new_calc_dist['Calculated_Dist_m'].values / r_earth
lat = df['Lat'].values
lon = df['Lon'].values
# (m, 1) + (n,) broadcasts to (m, n): one row per coordinate, one column per distance
new_lat = lat[:, None] + z
# the longitude offset is divided by cos(latitude), as in the cross-merge version
new_lon = lon[:, None] + z / np.cos(lat * np.pi / 180)[:, None]
Example:
df = pd.DataFrame([[28.13, -87.62], [28.12, -87.65]], columns=['Lat', 'Lon'])
new_calc_dist = pd.DataFrame([[34.5], [101.7], [60.0]], columns=['Calculated_Dist_m'])
# result of above
>>> new_lat
array([[28.13031027, 28.13091461, 28.13053959],
       [28.12031027, 28.12091461, 28.12053959]])
>>> new_lon.round(6)
array([[-87.619648, -87.618963, -87.619388],
       [-87.649648, -87.648963, -87.649388]])
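The 2/5 figure can be checked directly: the cross-join result holds five float64 columns of m·n values, while the two arrays hold two. A sketch on the small example above, assuming float64 data and pandas 1.2 or later for how='cross'. At the question's scale of roughly 1000 × 30000 = 30 million pairs, that is about 30e6 × 5 × 8 bytes ≈ 1.2 GB for the merged frame versus 30e6 × 2 × 8 bytes ≈ 480 MB for the arrays, before any temporaries.

```python
import numpy as np
import pandas as pd

# Small illustrative inputs (same values as the example above).
df = pd.DataFrame([[28.13, -87.62], [28.12, -87.65]], columns=["Lat", "Lon"])
new_calc_dist = pd.DataFrame([[34.5], [101.7], [60.0]], columns=["Calculated_Dist_m"])
r_earth = 6371000

# Cross-join result: m*n rows x 5 float64 columns.
cross = df.merge(new_calc_dist, how="cross")
offset = np.degrees(cross["Calculated_Dist_m"] / r_earth)
cross["Lat2"] = cross["Lat"] + offset
cross["Lon2"] = cross["Lon"] + offset / np.cos(np.radians(cross["Lat"]))
cross_bytes = cross.memory_usage(index=False).sum()

# Broadcast result: two (m, n) float64 arrays.
z = np.degrees(new_calc_dist["Calculated_Dist_m"].to_numpy() / r_earth)
lat = df["Lat"].to_numpy()
lon = df["Lon"].to_numpy()
new_lat = lat[:, None] + z
new_lon = lon[:, None] + z / np.cos(np.radians(lat))[:, None]
array_bytes = new_lat.nbytes + new_lon.nbytes

print(cross_bytes, array_bytes, array_bytes / cross_bytes)
```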
If you do want those results as DataFrames:
kwargs = dict(index=df.index, columns=new_calc_dist.index)
new_lat = pd.DataFrame(new_lat, **kwargs)
new_lon = pd.DataFrame(new_lon, **kwargs)
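The 2D results and the cross-merge hold the same numbers in a different layout: element [i, j] of new_lat corresponds to the pair (df row i, distance j), and a cross merge lists the df rows in the outer order with the distances in the inner order. Flattening in row-major (C) order should therefore line the two up. A sketch that rebuilds the long format from the arrays and checks it against a cross merge on the small example (assuming pandas 1.2 or later):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[28.13, -87.62], [28.12, -87.65]], columns=["Lat", "Lon"])
new_calc_dist = pd.DataFrame([[34.5], [101.7], [60.0]], columns=["Calculated_Dist_m"])
r_earth = 6371000

# Broadcast version: shape (len(df), len(new_calc_dist)).
dist = new_calc_dist["Calculated_Dist_m"].to_numpy()
z = np.degrees(dist / r_earth)
lat = df["Lat"].to_numpy()
lon = df["Lon"].to_numpy()
new_lat = lat[:, None] + z
new_lon = lon[:, None] + z / np.cos(np.radians(lat))[:, None]

# Long format: repeat each coordinate n times, tile the distances m times,
# and flatten the 2D results row by row.
long = pd.DataFrame({
    "Lat": np.repeat(lat, len(dist)),
    "Lon": np.repeat(lon, len(dist)),
    "Calculated_Dist_m": np.tile(dist, len(lat)),
    "Lat2": new_lat.ravel(),
    "Lon2": new_lon.ravel(),
})

# The input columns should match a cross merge row for row.
cross = df.merge(new_calc_dist, how="cross")
matches = np.allclose(long[["Lat", "Lon", "Calculated_Dist_m"]].to_numpy(), cross.to_numpy())
print(matches)
print(long.round(6))
```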