Pandas modify column values in place based on boolean array

Question:

I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?

For example, here’s a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the ‘flag’ column (let’s say to ‘Blue’) if the name ends with the letter ‘e’:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name':['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...         'flag':['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)

        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]

I can make a boolean series from my criteria:

>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool

I just need the crucial step to get the following result:

>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN
Asked By: prooffreader

Answers:

df.loc[df.name.str.contains('e$'), 'flag'] = 'Blue'
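
A minimal, self-contained sketch of the boolean-mask assignment, using the sample data from the question. It selects the rows and the column in a single .loc call rather than chaining two indexing steps, since chained assignment can trigger SettingWithCopyWarning and may not modify df in newer pandas versions. The variable name ends_with_e is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

# Boolean Series: True where the name ends with the letter 'e'
ends_with_e = df['name'].str.contains('e$')

# Pick rows by mask and the column by label in one .loc call,
# so the assignment writes directly into df
df.loc[ends_with_e, 'flag'] = 'Blue'

print(df)
```

Rows where the mask is False keep their existing values, so 'Lindsey' still has NaN in the flag column.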
Answered By: U2EF1

DataFrame.mask(cond, other=nan) does exactly what you want.

It replaces values with the value of other where the condition is True.

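df['flag'] = df['flag'].mask(boolean_result, other='Blue')

mask returns a new object, so the result is assigned back to the column. Calling it with inplace=True on df['flag'] is a chained in-place operation, which recent pandas versions warn about and which may leave df unchanged.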

To replace values where the condition is False instead, use DataFrame.where().
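
As a rough illustration of how mask and where relate, the sketch below applies both to the question's sample data. It assumes the same boolean_result Series as in the question; the names masked and via_where are hypothetical. Because where keeps values where the condition is True, the condition has to be inverted to get the same effect as mask.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})
boolean_result = df['name'].str.contains('e$')

# mask: replace values where the condition is True
masked = df['flag'].mask(boolean_result, other='Blue')

# where: keep values where the condition is True, replace the rest,
# so the condition is negated to match the mask call above
via_where = df['flag'].where(~boolean_result, other='Blue')

# Both calls return new Series; assign one back to update the column
df['flag'] = masked

print(df)
```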

Answered By: Ynjxsjmh