Changing a pandas dataframe column value according to conditions
Question:
I have a pandas dataframe that contains reviews. For each review, every word (token) has its own score, as shown below:
import pandas as pd
df = pd.DataFrame({
    "review_num": [1,1,1,1,1,2,2,2],
    "review": ["This is the first review","This is the first review","This is the first review","This is the first review","This is the first review",
               "And another one","And another one","And another one"],
    "token_num":[1,2,3,4,5,1,2,3],
    "token":["This","is","the","first","review","And","another","one"],
    "score":[0.3,-0.6,0.5,0.4,0.2,-0.7,0.5,0.4]
})
#The initial dataframe====================================================
#    review_num                    review  token_num    token  score
# 0           1  This is the first review          1     This    0.3
# 1           1  This is the first review          2       is   -0.6
# 2           1  This is the first review          3      the    0.5
# 3           1  This is the first review          4    first    0.4
# 4           1  This is the first review          5   review    0.2
# 5           2           And another one          1      And   -0.7
# 6           2           And another one          2  another    0.5
# 7           2           And another one          3      one    0.4
I need to modify each review according to the following rules:
1- for each review, change the word that has the highest score
2- if the word with the highest score contains the character "t", replace "t" with "f"
3- if it doesn’t contain the character "t", move on to the word with the next-highest score
The expected result is the following dataframe:
# == the modified df ============================================================
#    review_num            initial_review           Modified_review
# 0           1  This is the first review  This is fhe first review
# 1           2           And another one           And anofher one
Could someone help me to do this?
Thanks
Answers:
You can prefilter the rows with "t" in token, then get the row with the max score in each review with groupby.idxmax, and finally use a list comprehension to perform the substitution and join the result back to the original:
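# keep only the tokens that contain "t"
m = df['token'].str.contains('t')
# index label of the highest-scoring such token in each review
idx = df[m].groupby('review_num')['score'].idxmax()
# replace "t" with "f" in that token and attach the modified review text
out = df.loc[idx, ['review_num', 'review']].join(
    pd.DataFrame({'Modified_review': [txt.replace(w, w.replace('t', 'f'))
                                      for w, txt in zip(df.loc[idx, 'token'],
                                                        df.loc[idx, 'review'])]
                 }, index=idx)
)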
Output:
   review_num                    review           Modified_review
2           1  This is the first review  This is fhe first review
6           2           And another one           And anofher one
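One caveat with the approach above: txt.replace(w, ...) works on the whole review string, so it changes every occurrence of the token, including occurrences inside other words (a token "the" would also alter "other"). A possible variant, sketched below, modifies only the selected token and rebuilds each review from its tokens. It assumes that each review is exactly its tokens joined by single spaces in token_num order, which holds for the sample data but is not guaranteed in general. The names has_t, rows and tokens are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "review_num": [1, 1, 1, 1, 1, 2, 2, 2],
    "review": ["This is the first review"] * 5 + ["And another one"] * 3,
    "token_num": [1, 2, 3, 4, 5, 1, 2, 3],
    "token": ["This", "is", "the", "first", "review", "And", "another", "one"],
    "score": [0.3, -0.6, 0.5, 0.4, 0.2, -0.7, 0.5, 0.4],
})

# Row labels of the highest-scoring token containing "t", one per review.
has_t = df["token"].str.contains("t")
rows = df[has_t].groupby("review_num")["score"].idxmax().to_numpy()

# Replace "t" with "f" only in those tokens; all other tokens stay as they are.
tokens = df["token"].copy()
tokens.loc[rows] = tokens.loc[rows].str.replace("t", "f", regex=False)

# Rebuild each review from its tokens
# (assumption: review == tokens joined by single spaces, in token_num order).
out = (
    df.assign(token=tokens)
      .sort_values(["review_num", "token_num"])
      .groupby("review_num")
      .agg(initial_review=("review", "first"),
           Modified_review=("token", " ".join))
      .reset_index()
)
print(out)
```

With this variant, a review that has no token containing "t" still appears in the result, with Modified_review equal to the rebuilt original text, whereas the prefilter-and-join approach leaves such reviews out.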