SUM specific column values that have integers where row meets a condition and place in new row
Question:
I wish to SUM the first two column values that have numbers where row == ‘BB’ and place in new row below
Data
ID Q121 Q221 Q321 Q421
AA 8.0 4.8 3.1 5.3
BB 0.6 0.7 0.3 0.9
Desired
ID Q121 Q221 Q321 Q421
AA 8.0 4.8 3.1 5.3
BB 0.6 0.7 0.3 0.9
NEW 1.3 0.0 1.2 0.0
Doing
mask = df['ID'].eq('BB')
df[NEW] = df.iloc[2,1:2].sum()
I am taking the sum row 2 and columns 1 and 2
Any suggestion is appreciated.
Answers:
you can use rolling window to calculate the sum of pair of two columns, then concat the result
col=[1,0,1,0]
df2=pd.concat([df,
df.loc[df['ID'].isin(['AA','BB']) ]
.rolling(2, axis=1).sum() # calc rolling windows sum of two columns
.shift(-1, axis=1) # shift result to left
.fillna(0) # fill null values
.mul(col) # multipley to keep sum in alternate columns
]
)
df2.sort_index(axis=0, inplace=True)
#create a column with a new ID
df2['ID2']=df2['ID']+'_New'
# fill NA values with the row above
df2['ID2'].fillna(method='ffill', inplace=True)
#reset index
df2=df2.reset_index()
# where the ID is null (our new summed row), assign the ID2 to ID
df2.loc[df2['ID'].isna(), 'ID'] = df2['ID2']
#drop unwanted columns
df2.drop(columns=['ID2', 'index'], inplace=True)
df2
ID Q121 Q221 Q321 Q421
0 AA 8.0 4.8 3.1 5.3
1 AA_New 12.8 0.0 8.4 0.0
2 BB 0.6 0.7 0.3 0.9
3 BB_New 1.3 0.0 1.2 0.0```
You could create a list outside of the DF and then add it to the DataFrame
df = pd.DataFrame([['AA', 8.0, 4.8, 3.1, 5.3], ['BB', 0.6, 0.7, 0.3, 0.9]], columns=['ID', 'Q121', 'Q221', 'Q321', 'Q421']).set_index('ID')
new = []
for z in [x+y for x,y in zip(df.loc['BB'][0:-1:2],df.loc['BB'][1::2])]:
new.extend([z,0])
df.loc['New'] = new
I wish to SUM the first two column values that have numbers where row == ‘BB’ and place in new row below
Data
ID Q121 Q221 Q321 Q421
AA 8.0 4.8 3.1 5.3
BB 0.6 0.7 0.3 0.9
Desired
ID Q121 Q221 Q321 Q421
AA 8.0 4.8 3.1 5.3
BB 0.6 0.7 0.3 0.9
NEW 1.3 0.0 1.2 0.0
Doing
mask = df['ID'].eq('BB')
df[NEW] = df.iloc[2,1:2].sum()
I am taking the sum row 2 and columns 1 and 2
Any suggestion is appreciated.
you can use rolling window to calculate the sum of pair of two columns, then concat the result
col=[1,0,1,0]
df2=pd.concat([df,
df.loc[df['ID'].isin(['AA','BB']) ]
.rolling(2, axis=1).sum() # calc rolling windows sum of two columns
.shift(-1, axis=1) # shift result to left
.fillna(0) # fill null values
.mul(col) # multipley to keep sum in alternate columns
]
)
df2.sort_index(axis=0, inplace=True)
#create a column with a new ID
df2['ID2']=df2['ID']+'_New'
# fill NA values with the row above
df2['ID2'].fillna(method='ffill', inplace=True)
#reset index
df2=df2.reset_index()
# where the ID is null (our new summed row), assign the ID2 to ID
df2.loc[df2['ID'].isna(), 'ID'] = df2['ID2']
#drop unwanted columns
df2.drop(columns=['ID2', 'index'], inplace=True)
df2
ID Q121 Q221 Q321 Q421
0 AA 8.0 4.8 3.1 5.3
1 AA_New 12.8 0.0 8.4 0.0
2 BB 0.6 0.7 0.3 0.9
3 BB_New 1.3 0.0 1.2 0.0```
You could create a list outside of the DF and then add it to the DataFrame
df = pd.DataFrame([['AA', 8.0, 4.8, 3.1, 5.3], ['BB', 0.6, 0.7, 0.3, 0.9]], columns=['ID', 'Q121', 'Q221', 'Q321', 'Q421']).set_index('ID')
new = []
for z in [x+y for x,y in zip(df.loc['BB'][0:-1:2],df.loc['BB'][1::2])]:
new.extend([z,0])
df.loc['New'] = new