How to check whether any column value in a pandas DataFrame has been updated or deleted?
Question:
I have a pandas DataFrame:
import pandas as pd
_d = {'first_name':['Joe','Sha','Ram','Wes','David'],
'last_name':['Doe','Jhu','Krishna','County','John'],
'middle_name':['R.','M.','Q.','S.','I.']
}
df_A = pd.DataFrame(_d)
First, I change the middle name of the person whose last name is Doe to 'RA.':
df_A.loc[df_A['last_name']=='Doe','middle_name']='RA.'
After that change, I want an additional column, is_changed, to be created in df_A with the value Yes for that row.
A few more changes are made:
df_A.loc[df_A['first_name']=='David','last_name']='Curey'
df_A.loc[df_A['first_name']=='Ram','first_name']='Laxman'
The final expected output should flag each of these changed rows with is_changed = Yes.
Answers:
Whenever you make a change (it seems you are doing it manually?), set is_changed at the same time, for example:
df_A.loc[df_A['last_name']=='Doe','is_changed']='Yes'
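A minimal sketch of this manual approach, assuming every edit goes through one small helper so the flag can never be forgotten. update_and_flag is a hypothetical name used for illustration, not a pandas API.

```python
import pandas as pd

def update_and_flag(df, mask, column, value):
    """Hypothetical helper: assign `value` to `column` where `mask` is True
    and mark those same rows with is_changed = "Yes"."""
    if "is_changed" not in df.columns:
        df["is_changed"] = ""  # start with every row unflagged
    df.loc[mask, column] = value
    df.loc[mask, "is_changed"] = "Yes"
    return df

df_A = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", "S.", "I."],
})

# Each mask is evaluated before the helper runs, against the current data
update_and_flag(df_A, df_A["last_name"] == "Doe", "middle_name", "RA.")
update_and_flag(df_A, df_A["first_name"] == "David", "last_name", "Curey")
update_and_flag(df_A, df_A["first_name"] == "Ram", "first_name", "Laxman")
print(df_A)
```

Note that this flags a row even if the assigned value happens to equal the old one; comparing against a saved copy avoids that.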
If you are not doing it manually, how are you doing it? Also, do you keep a copy of the original to compare against?
Let's assume you do have an original copy, df_orig. Then you could use DataFrame.compare to find out whether each row changed:
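# Cells that are equal in both frames come back as NaN, so a row changed
# if any of its cells is non-NaN
df.loc[df.compare(df_orig, keep_shape=True).notna().any(axis=1), "is_changed"] = "Yes"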
df["is_changed"] = df["is_changed"].fillna("")
print(df)
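A self-contained sketch of the compare-based approach, assuming df_orig is an untouched copy taken before any edits. It uses .notna() on the comparison result so that a change to a falsy value such as an empty string is still detected. compare must run before is_changed is added, because it only accepts identically labeled frames.

```python
import pandas as pd

df_orig = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", "S.", "I."],
})
df = df_orig.copy()  # work on a copy so df_orig keeps the original values

df.loc[df["last_name"] == "Doe", "middle_name"] = "RA."
df.loc[df["first_name"] == "David", "last_name"] = "Curey"
df.loc[df["first_name"] == "Ram", "first_name"] = "Laxman"

# keep_shape=True keeps every row and column; equal cells become NaN,
# so a row changed if any of its cells is non-NaN
diff = df.compare(df_orig, keep_shape=True)
changed = diff.notna().any(axis=1)

df["is_changed"] = changed.map({True: "Yes", False: ""})
print(df)
```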
  first_name last_name middle_name is_changed
0        Joe       Doe         RA.        Yes
1        Sha       Jhu          M.
2     Laxman   Krishna          Q.        Yes
3        Wes    County          S.
4      David     Curey          I.        Yes
Here is the code sample I tried.
import pandas as pd
_d = {
"first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
"last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
"middle_name": ["R.", "M.", "Q.", "S.", "I."],
}
df_A = pd.DataFrame(_d)
# Keep an untouched snapshot of the original data to compare against later
df_B = df_A.copy()
df_A.loc[df_A["last_name"] == "Doe", "middle_name"] = "RA."
df_A.loc[df_A["first_name"] == "David", "last_name"] = "Curey"
df_A.loc[df_A["first_name"] == "Ram", "first_name"] = "Laxman"
# Compare cell by cell against the snapshot; a row is changed if any cell differs
df_A["is_changed"] = (
    df_A[df_B.columns]
    .ne(df_B)
    .any(axis=1)
    .map({True: "Yes", False: ""})
)
print(df_A)
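One caveat with .ne(): NaN is never equal to NaN, so a cell that is missing in both frames is reported as changed. A minimal sketch that treats "both missing" as unchanged, and also lists which columns changed per row. The missing middle name for Wes is an assumption added only to demonstrate the NaN case, and changed_columns is a hypothetical extra column.

```python
import numpy as np
import pandas as pd

df_B = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", np.nan, "I."],  # assumed: Wes has no middle name
})
df_A = df_B.copy()
df_A.loc[df_A["last_name"] == "Doe", "middle_name"] = "RA."
df_A.loc[df_A["first_name"] == "Ram", "first_name"] = "Laxman"

# A cell counts as changed only if the values differ AND they are not both missing
cols = df_B.columns
cell_changed = df_A[cols].ne(df_B) & ~(df_A[cols].isna() & df_B.isna())

df_A["is_changed"] = cell_changed.any(axis=1).map({True: "Yes", False: ""})
# Hypothetical extra column: comma-separated names of the changed columns
df_A["changed_columns"] = cell_changed.apply(
    lambda row: ", ".join(cols[row.to_numpy()]), axis=1
)
print(df_A)
```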