Pandas: replace value in col1 with value in col2 based on filter / boolean test

Question:

I’m working with a small dataframe (9K rows) that has county and city fields (=columns). Some rows have NaN values for the county but have county names in the city column.

I can find these by doing

filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
df[ filter1 & filter2 ]

For rows that match this criteria, I want assign the city value to the county field. This stackoverflow discusses how to do this, but I can’t get any of the examples to work.

I’ve tried a variety of things:

1. df[filter1 & filter2]['county'] = df[filter1 & filter2]['city']

2. df['county'] = df.apply(lambda x: x['city'] if ( x['county'].isna() & 'County' in x['city'] ) else x['county'] , axis=1 )

3. df['county'] = np.where(df['county']=='unknown' & df['city'].str.contains('County'), df['city'], df['county']  )

I get a variety of errors:

  1. A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Helpful feedback, but I couldn’t figure out the syntax for using .loc for this. I got as far as

df.loc[df['county'].isna() & df['city'].str.contains('County),'county'] = ???

I don’t know if the .loc function would work, but even if it does, how do I specify the correct / row city field?

  1. Type Error: TypeError: unsupported operand type(s) for &: ‘str’ and ‘str’

I thought this might be because county is a string field, so the lambda f’n objected to doing a .isna() test on it. I changed the NaNs to "unknown" but I still get the error. (I’ve seen a lot of comments about the slowness of .apply() but on a 9K table, it should be fine.)

  1. Raises no errors, but doesn’t actually change the county field!

This seems like a pretty basic operation, but I can’t figure out the correct syntax to get it to make the change. What am I doing wrong?

Asked By: Feikname

||

Answers:

You can obtain the indexes of the filter you created and then use .loc to make the desired change:

filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
idx = df[ filter1 & filter2 ].index
df.loc[idx, "county"] = df.loc[idx, "city"]
Answered By: Marcelo Paco
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.