How to delete cells from one column in a DataFrame with a condition in Python

Question:

In the example DataFrame below, I want to clear any cell in the first column, var1, whose string contains the letter Z, without removing the entire row. How can I do this? I thought I might need .str.replace(), but I do not know where to start. (Disclaimer: this is a tutorial question.)


import pandas as pd

df = pd.DataFrame({"var1": ["AZZBBAA", "CCDDDED", "DZZZZFD", "CDEEEEFG"],
                  "var2": [1,2,4,5]})

Which gives me:

    var1      var2
0   AZZBBAA     1
1   CCDDDED     2
2   DZZZZFD     4
3   CDEEEEFG    5

My desired output is below:

    var1      var2
0               1
1   CCDDDED     2
2               4
3   CDEEEEFG    5
Asked By: thole

||

Answers:

Use boolean indexing:

df.loc[df['var1'].str.contains('Z'), 'var1'] = '' # or float('nan')

Or mask:

df['var1'] = df['var1'].mask(df['var1'].str.contains('Z'), '') # omit '' to get NaN instead

Output:

       var1  var2
0               1
1   CCDDDED     2
2               4
3  CDEEEEFG     5
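
To make the difference between the two variants concrete, here is a minimal self-contained sketch. The names has_z, blanked and missing are only illustrative, and regex=False and na=False are added on the assumption that "Z" should be treated as plain text and that missing values should count as "no match":

```python
import pandas as pd

df = pd.DataFrame({"var1": ["AZZBBAA", "CCDDDED", "DZZZZFD", "CDEEEEFG"],
                   "var2": [1, 2, 4, 5]})

# Rows whose var1 contains a literal "Z".
# regex=False treats "Z" as plain text; na=False treats missing values as no match.
has_z = df["var1"].str.contains("Z", regex=False, na=False)

# Variant 1: boolean indexing on a copy, writing an empty string into matching cells.
blanked = df.copy()
blanked.loc[has_z, "var1"] = ""

# Variant 2: mask on a copy; with no replacement value the matching cells become NaN.
missing = df.copy()
missing["var1"] = missing["var1"].mask(has_z)

print(blanked)
print(missing)
```

Empty strings keep the column purely textual, while NaN lets isna() and dropna() recognise the cleared cells later, so the choice depends on what happens to the column afterwards.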
Answered By: mozway

Series.str.replace() is also feasible in your case:

df['var1'] = df['var1'].str.replace('.*Z.*', '', regex=True)

This clears the value in the var1 column whenever it contains the character Z, because the pattern .*Z.* matches the whole string:


       var1  var2
0               1
1   CCDDDED     2
2               4
3  CDEEEEFG     5
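
As a rough illustration of why the pattern has to be applied as a regular expression, the sketch below compares a literal replacement with a regex one. The variable names are illustrative only. Since pandas 2.0, Series.str.replace() defaults to regex=False, so the flag is passed explicitly in both calls:

```python
import pandas as pd

df = pd.DataFrame({"var1": ["AZZBBAA", "CCDDDED", "DZZZZFD", "CDEEEEFG"],
                   "var2": [1, 2, 4, 5]})

# Literal replacement: only the "Z" characters themselves are removed.
only_letters_removed = df["var1"].str.replace("Z", "", regex=False)

# Regex replacement: ".*Z.*" matches the whole string when it contains "Z",
# so the entire cell is replaced by an empty string.
whole_cell_cleared = df["var1"].str.replace(r".*Z.*", "", regex=True)

print(only_letters_removed.tolist())  # ['ABBAA', 'CCDDDED', 'DFD', 'CDEEEEFG']
print(whole_cell_cleared.tolist())    # ['', 'CCDDDED', '', 'CDEEEEFG']
```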
Answered By: RomanPerekhrest