How to delete a dataframe cell based on the value of the right cell
Question:
I have a dataframe (df) where column V01 validates column D01. If the value in column V01 is ‘N’, then the value of column D01 in the same row is invalid and should be deleted.
import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
df
In this example, I would like to replace 551 with a null.
In my case I have columns from D01 to D31 and V01 to V31. How could I approach this cleaning?
I’ve tried
df = df.replace('N',None)
df = df.dropna()
df
But this drops the whole row, including some valid data.
Answers:
Use boolean indexing on the dataframe to filter out the rows where V01 is 'N' (this removes the whole row, not only the D01 cell):
df = df[df.V01 != 'N']
Input:
import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
Use pandas.Series.where:
df["D01"] = df["D01"].where(df["V01"] != 'N', None)
or use pandas.DataFrame.loc:
df.loc[df["V01"] == 'N', "D01"] = None
Output:
   D01 V01
0  5.0   V
1  NaN   N
2  2.0   V
3  4.0   V
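For the full set of columns in the question (D01 to D31 paired with V01 to V31), the same idea could be applied one pair at a time. This is a minimal sketch, assuming the columns follow the zero-padded naming from the question. It uses a two-pair toy frame whose D02/V02 values are made up for illustration, and `n_pairs` is a hypothetical name.

```python
import pandas as pd

# Toy frame with two D/V pairs; the D02/V02 values are invented for illustration.
df = pd.DataFrame({
    "D01": [5, 551, 2, 4], "V01": ["V", "N", "V", "V"],
    "D02": [7, 8, 9, 10], "V02": ["V", "V", "N", "V"],
})

n_pairs = 2  # assumption: this would be 31 for D01..D31 / V01..V31

for n in range(1, n_pairs + 1):
    d_col, v_col = f"D{n:02d}", f"V{n:02d}"
    # Series.mask replaces values with NaN wherever the condition is True
    df[d_col] = df[d_col].mask(df[v_col] == "N")

print(df)
```

The D columns come back as floats because NaN is a float value. The V columns and all rows are left in place.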
Using pandas.DataFrame.loc, as others have said, is probably better in Python due to speed, but here is a loop version.
import pandas as pd
D=[5,551,2,4]
V=['V','N','V','V']
df = pd.DataFrame({'D01':D,'V01':V})
print(df)
for i in range(len(D)):
    # column 1 is V01, column 0 is D01
    if df.iat[i, 1] == 'N':
        df.iat[i, 0] = None
print(df)
You have to use DataFrame.drop() to delete any row based on a column value condition.
Hope it helps
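As a rough sketch of what that could look like on the example frame (the names `invalid_rows` and `cleaned` are only illustrative). Note that this removes the entire row, which goes further than nulling the single D01 cell the question asks about:

```python
import pandas as pd

df = pd.DataFrame({'D01': [5, 551, 2, 4], 'V01': ['V', 'N', 'V', 'V']})

# Index labels of the rows flagged as invalid
invalid_rows = df.index[df["V01"] == "N"]

# drop returns a new frame without those rows; df itself is unchanged
cleaned = df.drop(invalid_rows)
print(cleaned)
```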
Probably inefficient computationally, but a workable option would be to simply loop over the D01 column with a conditional on the value of V01, right?
To pick up your example:
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
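for i, x in enumerate(df["D01"]):
    if df["V01"][i] == "N":
        # use .loc rather than chained indexing (df["D01"][i] = ...), which may not modify df
        df.loc[i, "D01"] = None
new_df = df.dropna()  # drops every row that now contains a null
new_df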
This should solve your problem with minimal effort as long as you don’t have to process tons of data.