Remove values from selected columns when a pandas DataFrame condition is met
Question:
Given the following DataFrame in pandas:
country | ctry | city | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|---|---|
France | France | París | París | blue | 019210 | 0011119 | red |
Spain | Spain | Madrid | Barcelona | blue | 1211 | 0019210 | blue |
Germany | Spain | Barcelona | Barcelona | white | 019210 | 1212 | red |
France | UK | Bourdeux | London | blue | 019210 | 91021 | red |
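A minimal sketch for rebuilding this sample frame, assuming the numeric-looking columns are stored as strings (`dtype=str`) so that leading zeros such as `019210` survive. The outputs printed in the answers show `19210`, which suggests those columns were parsed as integers there.

```python
import pandas as pd

# Sample data from the question; dtype=str is an assumption that keeps
# leading zeros such as "019210" intact.
df = pd.DataFrame(
    {
        "country": ["France", "Spain", "Germany", "France"],
        "ctry": ["France", "Spain", "Spain", "UK"],
        "city": ["París", "Madrid", "Barcelona", "Bourdeux"],
        "cty": ["París", "Barcelona", "Barcelona", "London"],
        "other": ["blue", "blue", "white", "blue"],
        "important": ["019210", "1211", "019210", "019210"],
        "other_important": ["0011119", "0019210", "1212", "91021"],
        "other_1": ["red", "blue", "red", "red"],
    },
    dtype=str,
)
print(df)
```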
I need to fill the unimportant columns (`other`, `other_1`) with NaN in every row where `country != ctry` or `city != cty`. Resulting DataFrame:
country | ctry | city | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|---|---|
France | France | París | París | blue | 019210 | 0011119 | red |
Spain | Spain | Madrid | Barcelona | NaN | 1211 | 0019210 | NaN |
Germany | Spain | Barcelona | Barcelona | NaN | 019210 | 1212 | NaN |
France | UK | Bourdeux | London | NaN | 019210 | 91021 | NaN |
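The row condition can be checked on its own before touching any values. A minimal sketch, keeping only the four comparison columns of the sample for brevity; rows 1 to 3 are exactly the rows filled with NaN in the result above.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "France"],
    "ctry": ["France", "Spain", "Spain", "UK"],
    "city": ["París", "Madrid", "Barcelona", "Bourdeux"],
    "cty": ["París", "Barcelona", "Barcelona", "London"],
})

# Row-wise condition from the question: country != ctry OR city != cty.
# Series.ne compares element by element; | combines the two boolean Series.
mismatch = df["country"].ne(df["ctry"]) | df["city"].ne(df["cty"])
print(mismatch.tolist())  # [False, True, True, True]
```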
Finally, I delete the `country` and `city` columns:

```python
df = df.drop(['country', 'city'], axis=1)
```
ctry | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|
France | París | blue | 019210 | 0011119 | red |
Spain | Barcelona | NaN | 1211 | 0019210 | NaN |
Spain | Barcelona | NaN | 019210 | 1212 | NaN |
UK | London | NaN | 019210 | 91021 | NaN |
I would be grateful if the columns to set to NaN could be given as a list of column names, e.g. `['other', 'other_1']`.
Answers:
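Use `DataFrame.loc` to set missing values in the selected columns where the condition holds:

```python
import numpy as np

cols = ['other', 'other_1']
# rows where country != ctry or city != cty get NaN in the listed columns
df.loc[df.country.ne(df.ctry) | df.city.ne(df.cty), cols] = np.nan
df = df.drop(['country', 'city'], axis=1)
```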
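To reuse the `loc` approach with any list of columns and any number of column pairs, it can be wrapped in a function. `blank_on_mismatch` below is a hypothetical helper, not a pandas API; it is a sketch that works on a copy so the input frame is left unchanged.

```python
import numpy as np
import pandas as pd

def blank_on_mismatch(df, cols, pairs, drop=()):
    """Hypothetical helper (not part of pandas): set `cols` to NaN in every
    row where any (left, right) column pair differs, then drop `drop`."""
    out = df.copy()
    cond = pd.Series(False, index=out.index)
    for left, right in pairs:
        # OR in one element-wise inequality per column pair
        cond = cond | out[left].ne(out[right])
    out.loc[cond, cols] = np.nan
    return out.drop(columns=list(drop))

df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "France"],
    "ctry": ["France", "Spain", "Spain", "UK"],
    "city": ["París", "Madrid", "Barcelona", "Bourdeux"],
    "cty": ["París", "Barcelona", "Barcelona", "London"],
    "other": ["blue", "blue", "white", "blue"],
    "other_1": ["red", "blue", "red", "red"],
})

res = blank_on_mismatch(
    df,
    ["other", "other_1"],
    [("country", "ctry"), ("city", "cty")],
    drop=["country", "city"],
)
print(res)
```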
Alternatively, remove the `country` and `city` columns in the same step with `DataFrame.pop`:

```python
import numpy as np

cols = ['other', 'other_1']
# pop() returns each column as a Series and deletes it from df
df.loc[df.pop('country').ne(df.ctry) | df.pop('city').ne(df.cty), cols] = np.nan
print(df)
```

```
ctry cty other important other_important other_1
0 France París blue 19210 11119 red
1 Spain Barcelona NaN 1211 19210 NaN
2 Spain Barcelona NaN 19210 1212 NaN
3 UK London NaN 19210 91021 NaN
```
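Both columns are removed because Python evaluates both operands of `|` before combining them, and the whole indexer is evaluated before `.loc` assigns. A minimal sketch on a two-row subset of the sample, splitting the one-liner into two statements for readability:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Germany"],
    "ctry": ["France", "Spain"],
    "city": ["París", "Barcelona"],
    "cty": ["París", "Barcelona"],
    "other": ["blue", "white"],
})

# Each pop runs once, returns the column and removes it from df;
# df["ctry"] and df["cty"] are still present when they are compared.
cond = df.pop("country").ne(df["ctry"]) | df.pop("city").ne(df["cty"])
df.loc[cond, ["other"]] = np.nan

print(df.columns.tolist())       # ['ctry', 'cty', 'other']
print(df["other"].isna().tolist())  # [False, True]
```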
Another option is `DataFrame.mask`, which sets values to NaN where the condition is True:

```python
# list of columns
cols = ['other', 'other_1']
# use mask to set NaN where the condition is met
df[cols] = df[cols].mask(df['country'].ne(df['ctry']) | df['city'].ne(df['cty']))
# drop columns
df = df.drop(['country', 'city'], axis=1)
print(df)
```

```
ctry cty other important other_important other_1
0 France París blue 19210 11119 red
1 Spain Barcelona NaN 1211 19210 NaN
2 Spain Barcelona NaN 19210 1212 NaN
3 UK London NaN 19210 91021 NaN
```
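`mask` is the inverse of `DataFrame.where`: `mask(cond)` replaces values where `cond` is True, while `where(~cond)` keeps values where the negated condition is True; both fill with NaN by default. A small sketch on a two-row subset of the sample, checking that the two spellings agree (the boolean Series is broadcast across the selected columns):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Spain"],
    "ctry": ["France", "Spain"],
    "city": ["París", "Madrid"],
    "cty": ["París", "Barcelona"],
    "other": ["blue", "blue"],
    "other_1": ["red", "blue"],
})
cols = ["other", "other_1"]
cond = df["country"].ne(df["ctry"]) | df["city"].ne(df["cty"])

# Same result, expressed as "replace where cond" vs "keep where not cond"
masked = df[cols].mask(cond)
kept = df[cols].where(~cond)
print(masked.equals(kept))  # True
```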