How to check if any column value is updated/deleted in a pandas DataFrame?

Question:

I have a pandas DataFrame like this:

_d = {'first_name':['Joe','Sha','Ram','Wes','David'],
      'last_name':['Doe','Jhu','Krishna','County','John'],
      'middle_name':['R.','M.','Q.','S.','I.']
    }

df_A = pd.DataFrame(_d)


Here, I first change the middle name of the person whose last name is Doe to RA., as below.

df_A.loc[df_A['last_name']=='Doe','middle_name']='RA.'

After this change, I want an additional column is_changed to be created in df_A and filled with the value Yes for the changed row, as below.

[Image: df_A with the added is_changed column]

A few more changes are then made, as below.

df_A.loc[df_A['first_name']=='David','last_name']='Curey'
df_A.loc[df_A['first_name']=='Ram','first_name']='Laxman'

Final expected output would be as below.

[Image: final df_A with the is_changed column]

Asked By: myamulla_ciencia


Answers:

Whenever you change a value (it seems you are doing this manually?), you can also set is_changed for the same rows, e.g.:

df_A.loc[df_A['last_name']=='Doe','is_changed']='Yes'
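
A minimal sketch of that idea, assuming every update goes through a single function. The helper name update_and_flag and its flag_col parameter are hypothetical (they are not part of pandas); the helper writes the new value and sets is_changed in the same step, and only flags rows whose value actually differs.

```python
import pandas as pd


def update_and_flag(df, mask, column, value, flag_col="is_changed"):
    """Set df[column] = value where mask is True and mark those rows as changed.

    Hypothetical helper for illustration; not part of pandas.
    """
    if flag_col not in df.columns:
        df[flag_col] = ""  # start with an empty flag for every row
    # Only touch rows whose current value really differs from the new one.
    changed = mask & df[column].ne(value)
    df.loc[changed, column] = value
    df.loc[changed, flag_col] = "Yes"
    return df


df_A = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", "S.", "I."],
})

update_and_flag(df_A, df_A["last_name"] == "Doe", "middle_name", "RA.")
update_and_flag(df_A, df_A["first_name"] == "David", "last_name", "Curey")
update_and_flag(df_A, df_A["first_name"] == "Ram", "first_name", "Laxman")
print(df_A)
```

This avoids keeping a second copy of the data, but it only works if no code path modifies df_A directly.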

If you are not doing it manually, how are you doing it? And do you keep a copy of the original to compare against?

Let's assume you do have an original copy, df_orig, and that df is the modified frame. Then you could use DataFrame.compare to find out whether a row changed, like this:

df.loc[df.compare(df_orig,keep_shape=True).any(axis=1), "is_changed"] = "Yes"
df["is_changed"] = df["is_changed"].fillna("")

print(df)

  first_name last_name middle_name is_changed
0        Joe       Doe         RA.        Yes
1        Sha       Jhu          M.           
2     Laxman   Krishna          Q.        Yes
3        Wes    County          S.           
4      David     Curey          I.        Yes
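
For reference, a self-contained sketch of the same approach, assuming the original copy is taken with .copy() before any edits (the snippet above does not show how df_orig was created, so that step is an assumption). It tests the comparison result with notna() instead of relying on the truthiness of the cell values.

```python
import pandas as pd

df_orig = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", "S.", "I."],
})
df = df_orig.copy()  # work on a copy so the original stays available

df.loc[df["last_name"] == "Doe", "middle_name"] = "RA."
df.loc[df["first_name"] == "David", "last_name"] = "Curey"
df.loc[df["first_name"] == "Ram", "first_name"] = "Laxman"

# compare() leaves NaN wherever the two frames agree, so a row has changed
# if any cell of the comparison result is not NaN.
diff = df.compare(df_orig, keep_shape=True)
changed = diff.notna().any(axis=1)

df["is_changed"] = ""
df.loc[changed, "is_changed"] = "Yes"
print(df)
```

Note that DataFrame.compare requires both frames to have identical row and column labels, so the comparison has to run before is_changed is added to df.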
Answered By: SomeDude

Here is the code sample I tried.

import pandas as pd

_d = {
    "first_name": ["Joe", "Sha", "Ram", "Wes", "David"],
    "last_name": ["Doe", "Jhu", "Krishna", "County", "John"],
    "middle_name": ["R.", "M.", "Q.", "S.", "I."],
}

df_A = pd.DataFrame(_d)
df_B = df_A.copy()

df_A.loc[df_A["last_name"] == "Doe", "middle_name"] = "RA."
df_A.loc[df_A["first_name"] == "David", "last_name"] = "Curey"
df_A.loc[df_A["first_name"] == "Ram", "first_name"] = "Laxman"

# Flag every row in which at least one cell differs from the untouched copy df_B.
df_A["is_changed"] = (
    df_A[df_B.columns]
    .ne(df_B)
    .any(axis=1)
    .map({True: "Yes", False: ""})
)
print(df_A)
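
One caveat with .ne(): NaN compares unequal to itself, so a row that has a missing value in both frames is reported as changed. A possible sketch that treats two missing values as equal, using a small made-up sample with a NaN (this data is an assumption for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Made-up sample: row 1 has a missing middle name from the start.
df_B = pd.DataFrame({
    "first_name": ["Joe", "Sha", "Ram"],
    "middle_name": ["R.", np.nan, "Q."],
})
df_A = df_B.copy()
df_A.loc[df_A["first_name"] == "Joe", "middle_name"] = np.nan  # value deleted

# Plain ne() also flags row 1, because NaN != NaN.
naive = df_A.ne(df_B).any(axis=1)

# Treat a pair of missing values as equal.
both_missing = df_A.isna() & df_B.isna()
changed = (df_A.ne(df_B) & ~both_missing).any(axis=1)

df_A["is_changed"] = changed.map({True: "Yes", False: ""})
print(df_A)
```

With this variant only row 0, where the value was actually deleted, ends up flagged.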
Answered By: code_adithya