Replace just Filtered rows in Pandas DataFrame for another Dataframe
Question:
I’m working with an Dataframe that have latitude and longitude columns. I found some problems with part of this dataframe. Filtering the columns = latitude and longitude with problem i found:
The Orginal Dataframe filtered by latitude and longitude: df17
input:
df17[['latitude','longitude']].info()
output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 latitude 204395 non-null float64
1 longitude 204395 non-null float64
dtypes: float64(2)
memory usage: 4.7 MB
I’ve been filtered the values latitude and longitude that doesn’t make any sense for me…
input:
df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]
output:
latitude longitude
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
... ... ...
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970
4935 rows × 2 columns
So I’ve been working on those 4935 rows and fixed that, creating a new DataFrame = df_latilon
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4935 entries, 0 to 4934
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Latitude 4935 non-null float64
1 Longitude 4935 non-null float64
dtypes: float64(2)
memory usage: 77.2 KB
Now I would like to replace the rows in main Data Frame df17 with those fixed (from df_latitlon), but I’m not sure how to replace only filtered rows in df_17 main data frame.
I’ve been tried in that way:
Input:
df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]=df_latilon[['latitude','longitude']]
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 latitude 199460 non-null float64
1 longitude 199460 non-null float64
dtypes: float64(2)
memory usage: 4.7 MB
But it not worked, and looks that those 4935 rows were droped from Data Frame ..
Can someone help me with that ?
Thanks so much….
Answers:
Not an answer. Cannot reproduce your problem.
My mre – I added a few rows that do not meet the condition:
import pandas as pd
import io
f = io.StringIO('''index latitude longitude
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
999990 -7.128831 -36.952970
999991 -7.128831 -37.952970
999992 -7.128831 -38.952970
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970''')
df = df17 = pd.read_csv(f,sep='s+',index_col=0)
print(f'df17:n{df17}n****')
df17:
latitude longitude
index
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
999990 -7.128831 -36.952970
999991 -7.128831 -37.952970
999992 -7.128831 -38.952970
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970
****
condition = ((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34))
print(f'condition:n{condition}n****')
condition:
index
431460 True
431461 True
431609 True
431610 True
431620 True
635465 True
635527 True
999990 False
999991 False
999992 False
635528 True
635583 True
635584 True
dtype: bool
****
df_fixed = df17.loc[condition,['latitude','longitude']] * 20
print(f'df_fixed:n{df_fixed}n****')
df_fixed:
latitude longitude
index
431460 -467.39040 6198.70262
431461 -467.39040 6198.70262
431609 -161.15676 -697.65794
431610 -161.15676 -697.65794
431620 -245.49856 -8311.16410
635465 -143.58650 -698.00520
635527 -158.31482 -697.96340
635528 -158.31482 -697.96340
635583 -142.57662 -699.05940
635584 -142.57662 -699.05940
****
df17.loc[condition,['latitude','longitude']] = df_fixed
print(f'df17:n{df17}n****')
df17:
latitude longitude
index
431460 -467.390400 6198.70262
431461 -467.390400 6198.70262
431609 -161.156760 -697.65794
431610 -161.156760 -697.65794
431620 -245.498560 -8311.16410
635465 -143.586500 -698.00520
635527 -158.314820 -697.96340
999990 -7.128831 -36.95297
999991 -7.128831 -37.95297
999992 -7.128831 -38.95297
635528 -158.314820 -697.96340
635583 -142.576620 -699.05940
635584 -142.576620 -699.05940
****
I had the same problem, and I solved it using pd.update
so first you have to rename your filtered dataframe:
Let´s name it like df_wrong.
As I´m not sure what is the index from df17 you might want to reset its index.
df17.reset_index(inplace=True)
df_wrong= df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|
(df17['latitude']>5)|
(df17['latitude']<-34)),['latitude','longitude']]
Then you use the index of df_wrong to identify the rows you want to replace, and set that index in your corrected dataframe (df_latilon)
df_latilon.set_index(df_wrong.index(), inplace=True)
Then you just use the function update of pandas
df17.update(df_latilon)
This function will replace the values of df17 with the values of df_latilon that corresponds to the columns and index.
I’m working with an Dataframe that have latitude and longitude columns. I found some problems with part of this dataframe. Filtering the columns = latitude and longitude with problem i found:
The Orginal Dataframe filtered by latitude and longitude: df17
input:
df17[['latitude','longitude']].info()
output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 latitude 204395 non-null float64
1 longitude 204395 non-null float64
dtypes: float64(2)
memory usage: 4.7 MB
I’ve been filtered the values latitude and longitude that doesn’t make any sense for me…
input:
df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]
output:
latitude longitude
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
... ... ...
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970
4935 rows × 2 columns
So I’ve been working on those 4935 rows and fixed that, creating a new DataFrame = df_latilon
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4935 entries, 0 to 4934
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Latitude 4935 non-null float64
1 Longitude 4935 non-null float64
dtypes: float64(2)
memory usage: 77.2 KB
Now I would like to replace the rows in main Data Frame df17 with those fixed (from df_latitlon), but I’m not sure how to replace only filtered rows in df_17 main data frame.
I’ve been tried in that way:
Input:
df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]=df_latilon[['latitude','longitude']]
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 latitude 199460 non-null float64
1 longitude 199460 non-null float64
dtypes: float64(2)
memory usage: 4.7 MB
But it not worked, and looks that those 4935 rows were droped from Data Frame ..
Can someone help me with that ?
Thanks so much….
Not an answer. Cannot reproduce your problem.
My mre – I added a few rows that do not meet the condition:
import pandas as pd
import io
f = io.StringIO('''index latitude longitude
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
999990 -7.128831 -36.952970
999991 -7.128831 -37.952970
999992 -7.128831 -38.952970
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970''')
df = df17 = pd.read_csv(f,sep='s+',index_col=0)
print(f'df17:n{df17}n****')
df17:
latitude longitude
index
431460 -23.369520 309.935131
431461 -23.369520 309.935131
431609 -8.057838 -34.882897
431610 -8.057838 -34.882897
431620 -12.274928 -415.558205
635465 -7.179325 -34.900260
635527 -7.915741 -34.898170
999990 -7.128831 -36.952970
999991 -7.128831 -37.952970
999992 -7.128831 -38.952970
635528 -7.915741 -34.898170
635583 -7.128831 -34.952970
635584 -7.128831 -34.952970
****
condition = ((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34))
print(f'condition:n{condition}n****')
condition:
index
431460 True
431461 True
431609 True
431610 True
431620 True
635465 True
635527 True
999990 False
999991 False
999992 False
635528 True
635583 True
635584 True
dtype: bool
****
df_fixed = df17.loc[condition,['latitude','longitude']] * 20
print(f'df_fixed:n{df_fixed}n****')
df_fixed:
latitude longitude
index
431460 -467.39040 6198.70262
431461 -467.39040 6198.70262
431609 -161.15676 -697.65794
431610 -161.15676 -697.65794
431620 -245.49856 -8311.16410
635465 -143.58650 -698.00520
635527 -158.31482 -697.96340
635528 -158.31482 -697.96340
635583 -142.57662 -699.05940
635584 -142.57662 -699.05940
****
df17.loc[condition,['latitude','longitude']] = df_fixed
print(f'df17:n{df17}n****')
df17:
latitude longitude
index
431460 -467.390400 6198.70262
431461 -467.390400 6198.70262
431609 -161.156760 -697.65794
431610 -161.156760 -697.65794
431620 -245.498560 -8311.16410
635465 -143.586500 -698.00520
635527 -158.314820 -697.96340
999990 -7.128831 -36.95297
999991 -7.128831 -37.95297
999992 -7.128831 -38.95297
635528 -158.314820 -697.96340
635583 -142.576620 -699.05940
635584 -142.576620 -699.05940
****
I had the same problem, and I solved it using pd.update
so first you have to rename your filtered dataframe:
Let´s name it like df_wrong.
As I´m not sure what is the index from df17 you might want to reset its index.
df17.reset_index(inplace=True)
df_wrong= df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|
(df17['latitude']>5)|
(df17['latitude']<-34)),['latitude','longitude']]
Then you use the index of df_wrong to identify the rows you want to replace, and set that index in your corrected dataframe (df_latilon)
df_latilon.set_index(df_wrong.index(), inplace=True)
Then you just use the function update of pandas
df17.update(df_latilon)
This function will replace the values of df17 with the values of df_latilon that corresponds to the columns and index.