Rearrange values in dataframe based on condition in Pandas

Question:

I have a dataset, where when the sum of Q1 24 – Q4 24 is between the number 1 – 2.5, I would like to place the number 2 in that row under Q4 24.

Data

ID  type    Q1 24   Q2 24   Q3 24   Q4 24
AA  hi      2.0     1.2     0.5     0.6
AA  hello   0.7     2.0     0.6     0.6
AA  bye     0.6     0.6     0.6     0.4
AA  ok      0.3     0.4     0.2     0.2
                

Desired

ID  type    Q1 24   Q2 24   Q3 24   Q4 24
AA  hi      2.0     1.2     0.5     0.6
AA  hello   0.7     2.0     0.6     0.6
AA  bye     0.0     0.0     0.0     2.0
AA  ok      0.0     0.0     0.0     2.0

Doing

df.loc[df.iloc[:,2:].sum(axis=1)>1<2.5, ['Q1 24','Q2 24','Q3 24','Q4 24']]= 2

A SO member helped with the above script, but how would I only target that row under Q4 24. I am thinking I can utilize iloc again for this. Any suggestion is appreciated.

Asked By: Lynn

||

Answers:

You were on the right track with boolean indexing.

I would use:

df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), 'Q4 24'] = 2

Output:

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.6    0.6    0.6    2.0
3  AA     ok    0.3    0.4    0.2    2.0

adding 2 to last column and setting all the others to 0

sel = df.filter(regex=r'^Qd')

df.loc[sel.sum(axis=1).between(1, 2.5), sel.columns] = [0]*(sel.shape[1]-1)+[2]

Output:

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.0    0.0    0.0    2.0
3  AA     ok    0.0    0.0    0.0    2.0
Answered By: mozway

As an alternative:

import numpy as np
df['Q4 24']=np.where((df[df.columns[2:]].sum(axis=1)<=2.5) & (df[df.columns[2:]].sum(axis=1)>=1),2,df['Q4 24'])
print(df)
'''
   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.6    0.6    0.6    2.0
3  AA     ok    0.3    0.4    0.2    2.0
'''

Answered By: Clegane

This will do what you are after including zeroing the other columns:

df.loc[df.filter(regex=r'^Qd').sum(axis=1).between(1, 2.5), ['Q1 24','Q2 24','Q3 24','Q4 24']] = 0, 0, 0, 2

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye      0      0      0    2.0
3  AA     ok      0      0      0    2.0
Answered By: gtomer
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.