Conditional calculation python
Question:
A dataframe is given, and a calculation shall be done as stated below:
DF as-is
DF to-be
Column Epsilon shall be calculated:
- Where beta = same values, the two rows belong together. (10 -> 1,1 or 20 -> 2,2)
- Column epsilon of such a pair is the sum of the 2 values (12+5=17; 13+6=19)
- Print the result (17, 19) in the row of the pair (1,1 or 2,2) where delta = yes
- Set epsilon of the row to NULL where delta = no
row 50 is unchanged, since it does not belong to any pair.
I hope i was clear 🙂 if not, please ask me. Tnx for any input!
Answers:
You can use a boolean mask. Sort values by delta
column then keep the track of duplicated beta
values.
m = df.assign(delta=df['delta'].eq('x')).sort_values('delta').duplicated('beta')
epsilon = df.groupby(['beta'])['epsilon'].transform(sum)
df.loc[~m, 'epsilon'] = epsilon
df.loc[m, 'epsilon'] = 0
Output:
>>> df
alpha beta gamma delta epsilon
0 10 1 x 17
1 20 1 x 0
2 30 2 x 19
3 40 2 x 0
4 50 3 18
Input dataframe:
data = {'alpha': [10, 20, 30, 40, 50],
'beta': [1, 1, 2, 2, 3],
'gamma': ['x', '', 'x', '', ''],
'delta': ['', 'x', '', 'x', ''],
'epsilon': [12, 5, 13, 6, 18]}
df = pd.DataFrame(data)
print(df)
# Output
alpha beta gamma delta epsilon
0 10 1 x 12
1 20 1 x 5
2 30 2 x 13
3 40 2 x 6
4 50 3 18
A dataframe is given, and a calculation shall be done as stated below:
DF as-is
DF to-be
Column Epsilon shall be calculated:
- Where beta = same values, the two rows belong together. (10 -> 1,1 or 20 -> 2,2)
- Column epsilon of such a pair is the sum of the 2 values (12+5=17; 13+6=19)
- Print the result (17, 19) in the row of the pair (1,1 or 2,2) where delta = yes
- Set epsilon of the row to NULL where delta = no
row 50 is unchanged, since it does not belong to any pair.
I hope i was clear 🙂 if not, please ask me. Tnx for any input!
You can use a boolean mask. Sort values by delta
column then keep the track of duplicated beta
values.
m = df.assign(delta=df['delta'].eq('x')).sort_values('delta').duplicated('beta')
epsilon = df.groupby(['beta'])['epsilon'].transform(sum)
df.loc[~m, 'epsilon'] = epsilon
df.loc[m, 'epsilon'] = 0
Output:
>>> df
alpha beta gamma delta epsilon
0 10 1 x 17
1 20 1 x 0
2 30 2 x 19
3 40 2 x 0
4 50 3 18
Input dataframe:
data = {'alpha': [10, 20, 30, 40, 50],
'beta': [1, 1, 2, 2, 3],
'gamma': ['x', '', 'x', '', ''],
'delta': ['', 'x', '', 'x', ''],
'epsilon': [12, 5, 13, 6, 18]}
df = pd.DataFrame(data)
print(df)
# Output
alpha beta gamma delta epsilon
0 10 1 x 12
1 20 1 x 5
2 30 2 x 13
3 40 2 x 6
4 50 3 18