Normalize within groups in Pandas

Question:

I have a set of data that has a grouping variable, a position, and a value at that position:

Sample    Position    Depth
A         1           2
A         2           3
A         3           4
B         1           1
B         2           3
B         3           2

I want to generate a new column that is an internally normalized depth as follows:

Sample    Position    Depth    NormalizedDepth
A         1           2        0
A         2           3        0.5
A         3           4        1
B         1           1        0
B         2           3        1
B         3           2        0.5

This is essentially represented by the formula NormalizedDepth = (x - min(x))/(max(x)-min(x)) such that the minimum and maximum are of the group.

I know how to do this with dplyr in R with the following:

depths %>% 
  group_by(Sample) %>%
  mutate(NormalizedDepth = (Depth - min(Depth))/(max(Depth) - min(Depth)))

I cannot figure out how to do this with pandas. I’ve tried doing grouping and applying, but none of it seems to replicate what I am looking for.

Asked By: Dylan Lawrence

||

Answers:

Use transform (the pandas counterpart of mutate in R's dplyr) together with np.ptp ("peak to peak"), which returns the difference between the maximum and the minimum:

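import numpy as np

# Group the Depth column by Sample
g = df.groupby('Sample').Depth
# Subtract each group's minimum, then divide by each group's range (max - min)
(df.Depth - g.transform('min')) / g.transform(np.ptp)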
0    0.0
1    0.5
2    1.0
3    0.0
4    1.0
5    0.5
Name: Depth, dtype: float64
Answered By: BENY
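
For a self-contained version of the same idea, here is a minimal sketch that rebuilds the sample data from the question and writes the result to a new column. It assumes the DataFrame is called df, as in the answer above, and it swaps np.ptp for an explicit transform('max') minus transform('min'), which should give the same per-group range without needing NumPy. The names group_min and group_range are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sample": ["A", "A", "A", "B", "B", "B"],
    "Position": [1, 2, 3, 1, 2, 3],
    "Depth": [2, 3, 4, 1, 3, 2],
})

g = df.groupby("Sample")["Depth"]

# transform returns a Series aligned with df's rows, so plain arithmetic works
group_min = g.transform("min")                 # illustrative name
group_range = g.transform("max") - group_min   # illustrative name

df["NormalizedDepth"] = (df["Depth"] - group_min) / group_range
print(df)
```

Because every intermediate Series shares df's index, the new column lines up row by row without any merge or re-indexing step.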

Group the DataFrame by the values of the Sample column, apply an anonymous function that performs min-max normalisation to each group's Depth Series, and assign the result to a new NormalizedDepth column of df (note that this is unlikely to be as efficient as BENY's answer above):

import pandas as pd

# group_keys=False keeps the original row index, so the result aligns with df on assignment
df['NormalizedDepth'] = df.groupby('Sample', group_keys=False).Depth.apply(lambda x: (x - x.min()) / (x.max() - x.min()))
Answered By: hello_friend
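
One edge case worth noting: the formula from the question divides by max - min, so a group whose values are all equal yields 0/0, which pandas reports as NaN. The sketch below shows one possible way to handle that. The helper min_max is hypothetical, the extra sample C is made-up illustration data, and mapping constant groups to 0.0 is an assumption rather than something the question asks for.

```python
import pandas as pd

def min_max(x: pd.Series) -> pd.Series:
    """Hypothetical helper: min-max scale one group of values.

    Assumption: a constant group (max == min) is mapped to 0.0
    instead of the NaN that a plain 0/0 division would give.
    """
    span = x.max() - x.min()
    if span == 0:
        return pd.Series(0.0, index=x.index)
    return (x - x.min()) / span

# Sample A comes from the question; sample C is invented to show a constant group
df = pd.DataFrame({
    "Sample": ["A", "A", "A", "C", "C"],
    "Depth": [2, 3, 4, 5, 5],
})

# transform calls min_max once per group and stitches the results back in row order
df["NormalizedDepth"] = df.groupby("Sample")["Depth"].transform(min_max)
print(df)
```

If NaN is the preferred marker for "no spread in this group", the guard can simply be dropped.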