# Calculate %-deviation with values from a pandas DataFrame

## Question:

I am fairly new to Python and I have the following DataFrame:

``````
    setting_id  subject_id  seconds  result_id  owner_id  average  duration_id
0            7           1        0     1680.5       2.0   24.000          1.0
1            7           1     3600     1690.5       2.0   46.000          2.0
2            7           1    10800     1700.5       2.0  101.000          4.0
3            7           2        0     1682.5       2.0   12.500          1.0
4            7           2     3600     1692.5       2.0   33.500          2.0
5            7           2    10800     1702.5       2.0   86.500          4.0
6            7           3        0     1684.5       2.0    8.500          1.0
7            7           3     3600     1694.5       2.0   15.000          2.0
8            7           3    10800     1704.5       2.0   34.000          4.0

``````

What I need to do is calculate the deviation (%) of each average with a non-zero `seconds` value from the average with a `seconds` value of 0, where `subject_id` and `setting_id` are the same.

E.g., for `setting_id == 7 & subject_id == 1` this would be:

`(result / baseline) * 100`

-> for 3600 seconds: `(46/24)*100 ≈ 192%`

-> for 10800 seconds: `(101/24)*100 ≈ 421%`

where baseline = the average with a `seconds` value of 0

and result = the average with a `seconds` value other than 0

The resulting df should look like this:

``````
   setting_id  subject_id  seconds  owner_id  average  deviation  duration_id
0           7           1        0         2       24          0            1
1           7           1     3600         2       46        192            2
2           7           1    10800         2      101        421            4
``````

I then want to use these calculations to plot a regression graph (with seaborn) of the deviations from baseline.

I have played around with this df for 2 days now and tried different for loops, but I just can't figure out the correct way.

## Answer:

You can use:

``````
# identify the baseline rows (seconds == 0)
m = df['seconds'].eq(0)
# baseline per (setting_id, subject_id): sum of 'average' over the rows with seconds == 0
s = (df['average'].where(m)
       .groupby([df['setting_id'], df['subject_id']])
       .sum()
     )

# compute the ratio to the group's baseline, in percent
deviation = (
    df[['setting_id', 'subject_id']]
    .merge(s, left_on=['setting_id', 'subject_id'], right_index=True, how='left')
    ['average']
    .rdiv(df['average']).mul(100)
    .mask(m, 0)  # report the baseline rows themselves as 0 (they would otherwise be 100)
    .round().astype(int)  # optional
)

df['deviation'] = deviation
# or
# out = df.assign(deviation=deviation)
``````

Output:

``````
   setting_id  subject_id  seconds  result_id  owner_id  average  duration_id  deviation
0           7           1        0     1680.5       2.0     24.0          1.0          0
1           7           1     3600     1690.5       2.0     46.0          2.0        192
2           7           1    10800     1700.5       2.0    101.0          4.0        421
3           7           2        0     1682.5       2.0     12.5          1.0          0
4           7           2     3600     1692.5       2.0     33.5          2.0        268
5           7           2    10800     1702.5       2.0     86.5          4.0        692
6           7           3        0     1684.5       2.0      8.5          1.0          0
7           7           3     3600     1694.5       2.0     15.0          2.0        176
8           7           3    10800     1704.5       2.0     34.0          4.0        400
``````
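
As an alternative sketch (not part of the code above), the per-group baseline can also be broadcast to every row with `groupby(...).transform('first')`, which avoids the merge. The data is copied from the question; the column names `ratio_pct` and `change_pct` and the helper column `_base` are illustrative assumptions. Note that `(result / baseline) * 100` is a ratio to baseline, so a baseline row evaluates to 100 unless it is masked. If a relative change is what you are after, `(result - baseline) / baseline * 100` yields 0 for the baseline rows without any special-casing.

```python
import pandas as pd

# Sample data copied from the question (only the columns the calculation needs).
df = pd.DataFrame({
    "setting_id": [7] * 9,
    "subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "seconds": [0, 3600, 10800] * 3,
    "average": [24.0, 46.0, 101.0, 12.5, 33.5, 86.5, 8.5, 15.0, 34.0],
})

keys = ["setting_id", "subject_id"]
is_baseline = df["seconds"].eq(0)

# Keep 'average' only on baseline rows (NaN elsewhere), then broadcast the
# first non-null value of each group to all rows of that group.
baseline = (
    df.assign(_base=df["average"].where(is_baseline))
      .groupby(keys)["_base"]
      .transform("first")
)

# Ratio to baseline in percent, as in the question's formula.
df["ratio_pct"] = (df["average"] / baseline * 100).round().astype(int)

# Relative change from baseline in percent; baseline rows are 0 by construction.
df["change_pct"] = ((df["average"] - baseline) / baseline * 100).round().astype(int)

print(df)
```

For subject 1 this gives `ratio_pct` values of 100, 192 and 421, and `change_pct` values of 0, 92 and 321. Either column can then be passed as the `y` variable to a seaborn regression plot against `seconds`.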