Replace only the filtered rows of a pandas DataFrame with rows from another DataFrame

Question

I’m working with a DataFrame that has latitude and longitude columns, and I found problems in part of it. Here is what I found when filtering on the latitude and longitude columns:

The original DataFrame, df17, with just the latitude and longitude columns:

input:

 df17[['latitude','longitude']].info()

output:

  <class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   latitude   204395 non-null  float64
 1   longitude  204395 non-null  float64
dtypes: float64(2)
memory usage: 4.7 MB

I then selected the latitude and longitude values that don’t make any sense to me:

input:

df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]

output:

          latitude   longitude
431460  -23.369520  309.935131
431461  -23.369520  309.935131
431609   -8.057838  -34.882897
431610   -8.057838  -34.882897
431620  -12.274928 -415.558205
   ...         ...         ...
635465   -7.179325  -34.900260
635527   -7.915741  -34.898170
635528   -7.915741  -34.898170
635583   -7.128831  -34.952970
635584   -7.128831  -34.952970
4935 rows × 2 columns

So I worked on those 4935 rows, fixed them, and stored the result in a new DataFrame, df_latilon:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4935 entries, 0 to 4934
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Latitude   4935 non-null   float64
 1   Longitude  4935 non-null   float64
dtypes: float64(2)
memory usage: 77.2 KB

Now I would like to replace those rows in the main DataFrame, df17, with the fixed ones from df_latilon, but I’m not sure how to replace only the filtered rows of df17.

I tried it this way:

Input:

df17.loc[((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34)),['latitude','longitude']]=df_latilon[['latitude','longitude']]

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 204395 entries, 431458 to 635852
Data columns (total 2 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   latitude   199460 non-null  float64
 1   longitude  199460 non-null  float64
dtypes: float64(2)
memory usage: 4.7 MB

But it didn’t work: it looks as if those 4935 rows were dropped from the DataFrame, since the non-null count of each column fell from 204395 to 199460.

Can someone help me with that?

Thanks so much!

Asked By: Filipe Aguiar Rodrigues

Answer 1

Not an answer: I cannot reproduce your problem.
My MRE (minimal reproducible example), to which I added a few rows that do not meet the condition:

import pandas as pd
import io

f = io.StringIO('''index        latitude    longitude
431460  -23.369520  309.935131
431461  -23.369520  309.935131
431609  -8.057838   -34.882897
431610  -8.057838   -34.882897
431620  -12.274928  -415.558205
635465  -7.179325   -34.900260
635527  -7.915741   -34.898170
999990  -7.128831   -36.952970
999991  -7.128831   -37.952970
999992  -7.128831   -38.952970
635528  -7.915741   -34.898170
635583  -7.128831   -34.952970
635584  -7.128831   -34.952970''')

# sep=r'\s+' splits on runs of whitespace; the first column becomes the index
df17 = pd.read_csv(f, sep=r'\s+', index_col=0)
print(f'df17:\n{df17}\n****')

df17:
         latitude   longitude
index                        
431460 -23.369520  309.935131
431461 -23.369520  309.935131
431609  -8.057838  -34.882897
431610  -8.057838  -34.882897
431620 -12.274928 -415.558205
635465  -7.179325  -34.900260
635527  -7.915741  -34.898170
999990  -7.128831  -36.952970
999991  -7.128831  -37.952970
999992  -7.128831  -38.952970
635528  -7.915741  -34.898170
635583  -7.128831  -34.952970
635584  -7.128831  -34.952970
****

condition = ((df17['longitude']>-35)|(df17['longitude']<-71)|(df17['latitude']>5)|(df17['latitude']<-34))
print(f'condition:\n{condition}\n****')

condition:
index
431460     True
431461     True
431609     True
431610     True
431620     True
635465     True
635527     True
999990    False
999991    False
999992    False
635528     True
635583     True
635584     True
dtype: bool
****

# Arbitrary stand-in for a real correction; df_fixed keeps df17's index labels
df_fixed = df17.loc[condition, ['latitude', 'longitude']] * 20
print(f'df_fixed:\n{df_fixed}\n****')

df_fixed:
         latitude   longitude
index                        
431460 -467.39040  6198.70262
431461 -467.39040  6198.70262
431609 -161.15676  -697.65794
431610 -161.15676  -697.65794
431620 -245.49856 -8311.16410
635465 -143.58650  -698.00520
635527 -158.31482  -697.96340
635528 -158.31482  -697.96340
635583 -142.57662  -699.05940
635584 -142.57662  -699.05940
****

# The right-hand side is aligned on index labels, which match the selected rows here
df17.loc[condition, ['latitude', 'longitude']] = df_fixed
print(f'df17:\n{df17}\n****')

df17:
          latitude   longitude
index                         
431460 -467.390400  6198.70262
431461 -467.390400  6198.70262
431609 -161.156760  -697.65794
431610 -161.156760  -697.65794
431620 -245.498560 -8311.16410
635465 -143.586500  -698.00520
635527 -158.314820  -697.96340
999990   -7.128831   -36.95297
999991   -7.128831   -37.95297
999992   -7.128831   -38.95297
635528 -158.314820  -697.96340
635583 -142.576620  -699.05940
635584 -142.576620  -699.05940
****
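Why this works while the question's attempt did not is most likely down to the index. Here df_fixed is computed from df17.loc[condition, ...], so it keeps df17's index labels. The df_latilon in the question shows a RangeIndex (0 to 4934), and when a DataFrame is assigned through .loc, pandas aligns it to the selected rows and columns by label; labels 0 to 4934 are not among the selected rows, so every selected cell gets NaN, which matches the non-null count falling from 204395 to 199460. The sketch below reproduces that on a few made-up rows (df, mask, fixed and the other names are illustrative, not from the question) and shows two workarounds: assigning a plain NumPy array with .to_numpy(), which writes by position, and relabelling the corrected rows with the index of the rows they replace.

```python
import pandas as pd

# Small stand-in for df17, with a non-default index like 431458..635852 in the question
df = pd.DataFrame(
    {"latitude": [-23.0, -8.0, -7.0], "longitude": [310.0, -34.9, -40.0]},
    index=[431460, 431609, 999990],
)
cols = ["latitude", "longitude"]
mask = (df["longitude"] > -35) | (df["longitude"] < -71) | (df["latitude"] > 5) | (df["latitude"] < -34)

# Hypothetical corrected rows built separately, so they carry a fresh RangeIndex (0, 1)
fixed = pd.DataFrame({"latitude": [-23.3, -8.1], "longitude": [-46.6, -35.2]})

# Reproduces the problem: the right-hand side is aligned on labels 0 and 1,
# which are not among the selected rows, so both selected rows become NaN
broken = df.copy()
broken.loc[mask, cols] = fixed
print(broken.notna().sum().tolist())  # [1, 1]

# Workaround 1: drop the labels so values are written by position
# (assumes fixed has the same length and row order as the selected rows)
positional = df.copy()
positional.loc[mask, cols] = fixed[cols].to_numpy()

# Workaround 2: relabel the corrected rows with the labels of the rows they replace
labelled = df.copy()
relabelled = fixed.set_axis(df.index[mask.to_numpy()], axis=0)
labelled.loc[mask, cols] = relabelled
print(labelled)
```

Both workarounds assume the corrected rows are in the same order as the filtered rows; the simplest way to guarantee that is to build the corrected DataFrame from df17.loc[condition, ...] without resetting its index, as this answer does.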

Answered By: wwii

Answer 2

I had the same problem, and I solved it with DataFrame.update.
First, store your filtered rows in their own DataFrame; let’s call it df_wrong.
Since I’m not sure what df17’s index looks like, you might want to reset it first:

df17.reset_index(inplace=True)
df_wrong = df17.loc[((df17['longitude'] > -35) | (df17['longitude'] < -71) |
                     (df17['latitude'] > 5) | (df17['latitude'] < -34)),
                    ['latitude', 'longitude']]

Then use the index of df_wrong to identify the rows you want to replace, and set it as the index of your corrected DataFrame (df_latilon):

# index is an attribute, not a method, so it is passed without parentheses
df_latilon.set_index(df_wrong.index, inplace=True)

Finally, call DataFrame.update:

df17.update(df_latilon)

update modifies df17 in place, replacing its values with the non-NA values of df_latilon wherever the index and column labels match.
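A self-contained sketch of this approach on a few made-up rows (all names and values are illustrative). It makes two assumptions: that df_latilon's rows are in the same order as the filtered rows of df17, and that its columns may really be named Latitude and Longitude, as its info() output in the question shows. DataFrame.update aligns on column labels as well as on the index, so mismatched column names would leave df17 unchanged without any error; the sketch lowercases them first. It also takes the labels straight from the boolean mask instead of resetting df17's index.

```python
import pandas as pd

# Stand-in for df17, with a non-default index like the question's
df17 = pd.DataFrame(
    {"latitude": [-23.0, -8.0, -7.0], "longitude": [310.0, -34.9, -40.0]},
    index=[431460, 431609, 999990],
)
mask = (
    (df17["longitude"] > -35) | (df17["longitude"] < -71)
    | (df17["latitude"] > 5) | (df17["latitude"] < -34)
)

# Hypothetical corrected rows: default RangeIndex, capitalized column names
df_latilon = pd.DataFrame({"Latitude": [-23.3, -8.1], "Longitude": [-46.6, -35.2]})

# update() matches column labels, so align the names with df17 first
df_latilon = df_latilon.rename(columns=str.lower)
# Label each corrected row with the df17 row it replaces (assumes the same order)
df_latilon.index = df17.index[mask.to_numpy()]

# Modifies df17 in place, writing only the non-NA values of df_latilon
df17.update(df_latilon)
print(df17)
```

Because update only writes non-NA values from the other DataFrame, any NaN left in df_latilon keeps the original value in df17 rather than blanking it.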

Answered By: wadebranch
