Normalize within groups in Pandas
Question:
I have a set of data that has a grouping variable, a position, and a value at that position:
Sample Position Depth
A 1 2
A 2 3
A 3 4
B 1 1
B 2 3
B 3 2
I want to generate a new column that is an internally normalized depth as follows:
Sample Position Depth NormalizedDepth
A 1 2 0
A 2 3 0.5
A 3 4 1
B 1 1 0
B 2 3 1
B 3 2 0.5
This is essentially the formula NormalizedDepth = (x - min(x)) / (max(x) - min(x)), where the minimum and maximum are taken within each group.
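As a quick check of the formula, here is a minimal plain-Python sketch that applies it to each group's Depth values separately and reproduces the NormalizedDepth column above. The helper name min_max_normalize is hypothetical, not from the question.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the group's own min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Depth values per Sample, taken from the table above
depths = {"A": [2, 3, 4], "B": [1, 3, 2]}

normalized = {sample: min_max_normalize(vals) for sample, vals in depths.items()}
print(normalized)  # {'A': [0.0, 0.5, 1.0], 'B': [0.0, 1.0, 0.5]}
```

A group whose values are all equal makes the denominator zero, so the formula is undefined there (this sketch would raise ZeroDivisionError).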
I know how to do this with dplyr in R with the following:
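library(dplyr)
depths %>%
  group_by(Sample) %>%
  # 100 * scales the result to a percentage (0-100); drop it to get the 0-1 values shown above
  mutate(NormalizedDepth = 100 * (Depth - min(Depth)) / (max(Depth) - min(Depth)))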
I cannot figure out how to do this with pandas. I've tried grouping and applying, but nothing I've tried replicates what I am looking for.
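A likely reason plain grouping does not replicate mutate: an aggregation such as min() collapses each group to a single row, while mutate keeps every original row. The minimal sketch below rebuilds the question's sample table as df (an assumption; the question does not show how the data was loaded) and contrasts the two shapes.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "Sample": ["A", "A", "A", "B", "B", "B"],
    "Position": [1, 2, 3, 1, 2, 3],
    "Depth": [2, 3, 4, 1, 3, 2],
})

g = df.groupby("Sample")["Depth"]

# An aggregation returns one value per group (index A, B): 2 rows
per_group_min = g.min()

# transform returns one value per original row (index 0..5): 6 rows,
# so it can be combined element-wise with df["Depth"]
row_min = g.transform("min")

print(per_group_min.tolist())  # [2, 1]
print(row_min.tolist())        # [2, 2, 2, 1, 1, 1]
```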
Answers:
We have transform (it does the same job as mutate in R's dplyr: it returns one value per original row) with np.ptp ("peak to peak", i.e. the difference between the max and the min):
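import numpy as np
g = df.groupby('Sample').Depth  # the Depth column, grouped by Sample
# transform broadcasts each group's min and range (np.ptp = max - min) back to every row
(df.Depth - g.transform('min')) / g.transform(np.ptp)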
0 0.0
1 0.5
2 1.0
3 0.0
4 1.0
5 0.5
Name: Depth, dtype: float64
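A self-contained sketch of the same transform approach that also stores the result as a column. Here np.ptp is spelled out as transform("max") minus transform("min"), which computes the same per-group range; the column name NormalizedDepthPct is hypothetical and only mirrors the 100 * factor in the dplyr snippet.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample": ["A", "A", "A", "B", "B", "B"],
    "Position": [1, 2, 3, 1, 2, 3],
    "Depth": [2, 3, 4, 1, 3, 2],
})

g = df.groupby("Sample")["Depth"]
group_min = g.transform("min")
# max - min per group, broadcast to every row (what np.ptp computes)
group_range = g.transform("max") - group_min

df["NormalizedDepth"] = (df["Depth"] - group_min) / group_range
# Multiply by 100 to mirror the dplyr snippet's percentage scale
df["NormalizedDepthPct"] = 100 * df["NormalizedDepth"]

print(df)
```

If every Depth in a group is equal, group_range is 0 for that group and its rows come out as NaN (0/0) rather than raising an error.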
Group the DataFrame by the values of the Sample column, apply an anonymous function that performs min-max normalisation to each group's slice of the Depth Series, and assign the result to a new NormalizedDepth column of df (note: this is likely less efficient than YOBEN_S' answer above):
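import pandas as pd
# group_keys=False keeps df's original index (pandas >= 2.0 otherwise prepends the Sample key), so the result aligns on assignment
df['NormalizedDepth'] = (df.groupby('Sample', group_keys=False)['Depth']
                         .apply(lambda x: (x - x.min()) / (x.max() - x.min())))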
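For comparison, a sketch showing that the same per-group function passed to transform instead of apply gives identical values, with the result always aligned to df's index regardless of group_keys. The helper name normalize is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample": ["A", "A", "A", "B", "B", "B"],
    "Position": [1, 2, 3, 1, 2, 3],
    "Depth": [2, 3, 4, 1, 3, 2],
})

def normalize(x):
    # x is one group's Depth values as a Series
    return (x - x.min()) / (x.max() - x.min())

# transform always returns a result aligned to df's index
via_transform = df.groupby("Sample")["Depth"].transform(normalize)

# apply needs group_keys=False so the Sample key is not prepended to the index
via_apply = df.groupby("Sample", group_keys=False)["Depth"].apply(normalize)

df["NormalizedDepth"] = via_transform
print(via_transform.equals(via_apply.sort_index()))  # True
```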