Rearrange values in dataframe based on condition in Pandas
Question:
I have a dataset, where when the sum of Q1 24 – Q4 24 is between the number 1 – 2.5, I would like to place the number 2 in that row under Q4 24.
Data
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.6 0.6 0.6 0.4
AA ok 0.3 0.4 0.2 0.2
Desired
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.0 0.0 0.0 2.0
AA ok 0.0 0.0 0.0 2.0
Doing
df.loc[df.iloc[:,2:].sum(axis=1)>1<2.5, ['Q1 24','Q2 24','Q3 24','Q4 24']]= 2
A SO member helped with the above script, but how would I only target that row under Q4 24. I am thinking I can utilize iloc again for this. Any suggestion is appreciated.
Answers:
You were on the right track with boolean indexing.
I would use:
df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), 'Q4 24'] = 2
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
adding 2 to last column and setting all the others to 0
sel = df.filter(regex=r'^Qd')
df.loc[sel.sum(axis=1).between(1, 2.5), sel.columns] = [0]*(sel.shape[1]-1)+[2]
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.0 0.0 0.0 2.0
3 AA ok 0.0 0.0 0.0 2.0
As an alternative:
import numpy as np
df['Q4 24']=np.where((df[df.columns[2:]].sum(axis=1)<=2.5) & (df[df.columns[2:]].sum(axis=1)>=1),2,df['Q4 24'])
print(df)
'''
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
'''
This will do what you are after including zeroing the other columns:
df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), ['Q1 24','Q2 24','Q3 24','Q4 24']] = 0, 0, 0, 2
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0 0 0 2.0
3 AA ok 0 0 0 2.0
I have a dataset, where when the sum of Q1 24 – Q4 24 is between the number 1 – 2.5, I would like to place the number 2 in that row under Q4 24.
Data
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.6 0.6 0.6 0.4
AA ok 0.3 0.4 0.2 0.2
Desired
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.0 0.0 0.0 2.0
AA ok 0.0 0.0 0.0 2.0
Doing
df.loc[df.iloc[:,2:].sum(axis=1)>1<2.5, ['Q1 24','Q2 24','Q3 24','Q4 24']]= 2
A SO member helped with the above script, but how would I only target that row under Q4 24. I am thinking I can utilize iloc again for this. Any suggestion is appreciated.
You were on the right track with boolean indexing.
I would use:
df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), 'Q4 24'] = 2
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
adding 2 to last column and setting all the others to 0
sel = df.filter(regex=r'^Qd')
df.loc[sel.sum(axis=1).between(1, 2.5), sel.columns] = [0]*(sel.shape[1]-1)+[2]
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.0 0.0 0.0 2.0
3 AA ok 0.0 0.0 0.0 2.0
As an alternative:
import numpy as np
df['Q4 24']=np.where((df[df.columns[2:]].sum(axis=1)<=2.5) & (df[df.columns[2:]].sum(axis=1)>=1),2,df['Q4 24'])
print(df)
'''
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
'''
This will do what you are after including zeroing the other columns:
df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), ['Q1 24','Q2 24','Q3 24','Q4 24']] = 0, 0, 0, 2
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0 0 0 2.0
3 AA ok 0 0 0 2.0