Changing a pandas dataframe column value according to conditions

Question:

I have a pandas dataframe that contains reviews. Each row holds one word (token) of a review together with its score, as shown below:

import pandas as pd
df = pd.DataFrame({
    "review_num": [1,1,1,1,1,2,2,2],
    "review": ["This is the first review","This is the first review","This is the first review","This is the first review","This is the first review",
               "And another one","And another one","And another one"],
    "token_num":[1,2,3,4,5,1,2,3],
    "token":["This","is","the","first","review","And","another","one"],
    "score":[0.3,-0.6,0.5,0.4,0.2,-0.7,0.5,0.4]
})

#The initial dataframe====================================================
#   review_num                    review  token_num    token  score
#0           1  This is the first review          1     This    0.3
#1           1  This is the first review          2       is   -0.6
#2           1  This is the first review          3      the    0.5
#3           1  This is the first review          4    first    0.4
#4           1  This is the first review          5   review    0.2
#5           2           And another one          1      And   -0.7
#6           2           And another one          2  another    0.5
#7           2           And another one          3      one    0.4

I need to change each review following the rules below:
1- for each review, change the word that has the highest score
2- if the word with the highest score contains the character "t", replace "t" with "f"
3- if it doesn't contain the character "t", move on to the word with the next-highest score

The expected result is the following dataframe:


# == the modified df ============================================================
#   review_num            initial_review           Modified_review
#0           1  This is the first review  This is fhe first review
#1           2           And another one           And anofher one

Could someone help me to do this?
Thanks

Asked By: SLA

||

Answers:

You can first keep only the rows whose token contains "t", then find the row with the highest score in each review with groupby.idxmax. Finally, use a list comprehension to perform the substitution and join the result to the selected rows:

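# keep only the tokens that contain a lowercase "t" (the match is case-sensitive)
m = df['token'].str.contains('t')
# index label of the highest-scoring such token in each review
idx = df[m].groupby('review_num')['score'].idxmax()

# replace the selected token inside its review text and attach it as a new column
out = df.loc[idx, ['review_num', 'review']].join(
    pd.DataFrame({'Modified_review': [txt.replace(w, w.replace('t', 'f'))
                                      for w, txt in zip(df.loc[idx, 'token'],
                                                    df.loc[idx, 'review'])]
                  }, index=idx)
)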

Output:

   review_num                    review           Modified_review
2           1  This is the first review  This is fhe first review
6           2           And another one           And anofher one
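
One caveat: txt.replace(w, ...) replaces every occurrence of the token as a substring, so a selected token such as "the" would also alter a word like "other" in the same review. Below is a minimal sketch of a variant that changes only the selected row and rebuilds the review from its tokens. It assumes each review is exactly its tokens joined by single spaces in token_num order (true for the sample data, not guaranteed in general); the names has_t, rows, tokens and new_token are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "review_num": [1, 1, 1, 1, 1, 2, 2, 2],
    "review": ["This is the first review"] * 5 + ["And another one"] * 3,
    "token_num": [1, 2, 3, 4, 5, 1, 2, 3],
    "token": ["This", "is", "the", "first", "review", "And", "another", "one"],
    "score": [0.3, -0.6, 0.5, 0.4, 0.2, -0.7, 0.5, 0.4],
})

# Rows whose token contains a lowercase "t" (literal, case-sensitive match).
has_t = df["token"].str.contains("t", regex=False)

# Index labels of the highest-scoring "t" token in each review.
rows = df[has_t].groupby("review_num")["score"].idxmax().to_list()

# Swap "t" for "f" only in the selected rows; every other token is untouched.
tokens = df["token"].copy()
tokens.loc[rows] = tokens.loc[rows].str.replace("t", "f", regex=False)

# Rebuild each review from its tokens in token_num order
# (assumption: a review is its tokens joined by single spaces).
out = (
    df.assign(new_token=tokens)
      .sort_values(["review_num", "token_num"])
      .groupby("review_num")
      .agg(initial_review=("review", "first"),
           Modified_review=("new_token", lambda s: " ".join(s)))
      .reset_index()
)
print(out)
```

With this approach, a review that has no token containing "t" still appears in the result with its text unchanged, whereas the prefiltered idxmax approach above drops it.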
Answered By: mozway