How to replace values within each group using per-group quantiles in Python (pandas groupby)

Question:

I have a df as shown below:

>>> df.head()
group_type  value
G1          125.23
G1          107.19
G1          117.37
G1          102.68
G2          185.58
G1          82.31
G2          21.82
G2          168.21
G2          134.17
G1          71.45

I have calculated the quantile values within each group as below:

>>> lowtail = df.groupby('group_type')['value'].quantile(0.25)
>>> lowtail
group_type
G1             103.8075
G2             59.0425
Name: value, dtype: float64


>>> hightail = df.groupby('group_type')['value'].quantile(0.75)
>>> hightail
group_type
G1           123.2650
G2           172.5525
Name: value, dtype: float64

Now I need to replace value in df, within each group_type, with the calculated quantile values lowtail and hightail, under these conditions:

  1. If value < the lowtail value of the row's group_type, replace it with that group's lowtail value.

  2. If value > the hightail value of the row's group_type, replace it with that group's hightail value.

The desired output looks like:

group_type  value   new_value
G1          125.23  123.2650
G1          107.19  107.19
G1          117.37  117.37
G1          102.68  103.8075
G2          185.58  172.5525
G1          82.31   103.8075
G2          21.82   59.0425
G2          168.21  168.21    
G2          134.17  134.17
G1          71.45   103.8075

I am able to do a simple replace with a fixed value:

df.loc[df['value'] < lowtail, 'value'] = lowtail

but I could not work out how to apply the condition and replacement per group with groupby. Can anyone help here?

Asked By: hari

||

Answers:

Is this what you are looking for?

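grp = df.groupby('group_type')['value']
low, high = grp.quantile(0.25), grp.quantile(0.75)

def f(row):
    # Look up the quantiles by the row's group label.
    # Note: row.name is the row's index label, not its group_type.
    g = row['group_type']
    if row['value'] < low[g]:
        return low[g]
    elif row['value'] > high[g]:
        return high[g]
    else:
        return row['value']

# Apply row by row (axis=1) and store the result in a new column.
df['new_value'] = df.apply(f, axis=1)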
Answered By: ArrowRise
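
As a possible alternative to the row-wise apply above, the same replacement can be sketched without a Python-level loop: broadcast each group's quantiles back onto the rows with transform, then clip value between them. This is a sketch, not part of the original answer. The DataFrame below is rebuilt from the ten rows shown in the question, so its quantiles differ from the lowtail and hightail values quoted there, which were presumably computed on the full data. The names low_per_row and high_per_row are made up for illustration.

```python
import pandas as pd

# Sample data: only the ten rows printed by df.head() in the question.
df = pd.DataFrame({
    "group_type": ["G1", "G1", "G1", "G1", "G2", "G1", "G2", "G2", "G2", "G1"],
    "value": [125.23, 107.19, 117.37, 102.68, 185.58,
              82.31, 21.82, 168.21, 134.17, 71.45],
})

grp = df.groupby("group_type")["value"]

# transform returns a Series aligned with df's index, so every row
# carries the quantile of its own group.
low_per_row = grp.transform(lambda s: s.quantile(0.25))
high_per_row = grp.transform(lambda s: s.quantile(0.75))

# clip replaces values below the lower bound with the bound, values above
# the upper bound with that bound, and leaves everything in between unchanged.
df["new_value"] = df["value"].clip(lower=low_per_row, upper=high_per_row)

print(df)
```

On large frames this usually runs faster than apply with axis=1, since the comparison and replacement are done on whole columns at once.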