How to delete a DataFrame cell based on the value of the cell to its right

Question:

I have a DataFrame (df) where column V01 validates column D01. If the value in column V01 is ‘N’, then the value of column D01 in the same row is invalid and should be deleted.

import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
df

In this example, I would like to replace 551 with a null (NaN).
In my case I have columns D01 to D31 and V01 to V31. How could I approach this cleaning?

I’ve tried

df = df.replace('N',None)
df = df.dropna()
df

But this drops the whole row, including valid data.

Answers:

Use boolean indexing on the DataFrame to filter out the rows where V01 is 'N':

df = df[df.V01 != 'N']
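The same row filter can also be written with DataFrame.query, which takes the condition as a string expression. A minimal sketch on the question's sample data (the name `filtered` is illustrative); like the boolean mask, it removes the entire row rather than blanking only D01:

```python
import pandas as pd

df = pd.DataFrame({'D01': [5, 551, 2, 4], 'V01': ['V', 'N', 'V', 'V']})

# Column names are referenced directly inside the expression string;
# the string literal 'N' uses the inner quotes.
filtered = df.query("V01 != 'N'")
print(filtered)
```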
Answered By: kaispace30098

Input:

import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})

Use pandas.Series.where:

df["D01"] = df["D01"].where(df["V01"] != 'N', None)

or use pandas.DataFrame.loc:

df.loc[df["V01"] == 'N', "D01"] = None

output:

   D01 V01
0  5.0   V
1  NaN   N
2  2.0   V
3  4.0   V
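The question asks about D01 to D31 and V01 to V31. A minimal sketch that applies the same Series.where idea to every pair, assuming each Dxx column has a matching Vxx column with the same two-digit suffix; the extra columns D02/V02 and D03/V03 are made-up sample data:

```python
import pandas as pd

# Hypothetical sample data: three column pairs stand in for the real 31.
df = pd.DataFrame({
    "D01": [5, 551, 2, 4], "V01": ["V", "N", "V", "V"],
    "D02": [7, 8, 9, 10], "V02": ["N", "V", "V", "N"],
    "D03": [1, 2, 3, 4], "V03": ["V", "V", "V", "V"],
})

# Select every column named "D" followed by two digits (D01 ... D31).
for d_col in df.filter(regex=r"^D\d{2}$").columns:
    v_col = "V" + d_col[1:]  # assumes a matching V column exists for each D column
    # Keep D values where the flag is not 'N'; flagged values become NaN.
    df[d_col] = df[d_col].where(df[v_col] != "N")

print(df)
```

Deriving the column names from the data avoids hard-coding the number of pairs.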
Answered By: Chrysophylaxs

Using pandas.DataFrame.loc, as others have said, is probably better because it is vectorized and therefore faster, but here is a loop version.

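import pandas as pd

D = [5, 551, 2, 4]
V = ['V', 'N', 'V', 'V']
df = pd.DataFrame({'D01': D, 'V01': V})
print(df)
# Walk the rows by position: column 0 is D01, column 1 is V01.
for i in range(len(df)):
    if df.iat[i, 1] == 'N':
        df.iat[i, 0] = None  # mark the D01 value as missing
print(df)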
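With 31 column pairs, a Python loop over every cell gets slow. As a vectorized alternative (a sketch, not part of the answer above), DataFrame.mask can blank all flagged D values in one call, assuming the D and V column lists are built in matching order; the D02/V02 data is invented for illustration:

```python
import pandas as pd

# Hypothetical sample data with two column pairs (the question has 31).
df = pd.DataFrame({
    "D01": [5, 551, 2, 4], "V01": ["V", "N", "V", "V"],
    "D02": [7, 8, 9, 10], "V02": ["N", "V", "V", "N"],
})

d_cols = [f"D{k:02d}" for k in range(1, 3)]  # range(1, 32) for D01..D31
v_cols = ["V" + c[1:] for c in d_cols]

# Boolean array: True where the matching V column marks the D value invalid.
invalid = (df[v_cols] == "N").to_numpy()

# mask() replaces values where the condition is True (with NaN by default).
# Passing a NumPy array avoids aligning the V labels against the D labels.
df[d_cols] = df[d_cols].mask(invalid)
print(df)
```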
Answered By: BH10001

You can use DataFrame.drop() to delete the rows that match a condition on a column value.
Hope it helps.
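A minimal sketch of the DataFrame.drop approach on the question's sample data (the names `to_drop` and `cleaned` are illustrative). Unlike Series.where or .loc assignment, it deletes the entire row, including the V01 value, rather than blanking only D01:

```python
import pandas as pd

df = pd.DataFrame({"D01": [5, 551, 2, 4], "V01": ["V", "N", "V", "V"]})

# Collect the index labels of rows whose validation flag is 'N' ...
to_drop = df[df["V01"] == "N"].index
# ... and drop those rows from the frame.
cleaned = df.drop(index=to_drop)
print(cleaned)
```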

Answered By: Thelei

It is probably inefficient computationally, but a workable option is to loop over the D01 column and check the value of V01 in each row.

To pick up your example:

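import pandas as pd

df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
for i in df.index:
    if df.loc[i, "V01"] == "N":
        # Use .loc instead of chained indexing (df["D01"][i] = ...),
        # which may modify a copy rather than df itself.
        df.loc[i, "D01"] = None
new_df = df.dropna()
new_df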

This should solve your problem with minimal effort as long as you don’t have to process large amounts of data.

Answered By: passwortknacker