How to apply scaler module with groupby in pandas dataframe?

Question:

I want to scale one column in my dataframe. However, the dataframe contains different groups, and I want the scaling done within each group. How would I do this? Currently, my code scales the column relative to all rows.

group | price 
A     | 10
A     | 0.1
B     | 1203
B     | 999

I want the scaler to apply separately to rows 1-2 and rows 3-4 in this case. This is my code so far:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()    
df[['price']] = scaler.fit_transform(df[['price']])

Problem with this code: It scales using the entire price column instead of doing this by group.

Please advise!

Asked By: titu84hh


Answers:

You can do something like this, where I have used a sample function foo (instead of scaler.fit_transform) that modifies a pandas Series based on the data it contains:

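import pandas as pd

def foo(ser):
    # Center the values on their own mean; stands in for the scaler.
    x = ser.mean()
    return ser - x

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'price': [10, 0.1, 1203, 999]})
print(df)

# Applied to the entire price column (not what you want).
df[['price']] = foo(df[['price']])
print(df)

# Applied group by group; transform returns a result aligned with df's index.
# (price was already centered globally above, which does not change the per-group result.)
df['price'] = df.groupby('group')['price'].transform(foo)
print(df)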

Input:

  group   price
0     A    10.0
1     A     0.1
2     B  1203.0
3     B   999.0

Output if applied to entire price column (not what you want):

  group    price
0     A -543.025
1     A -552.925
2     B  649.975
3     B  445.975

Output if applied group-by-group:

  group   price
0     A    4.95
1     A   -4.95
2     B  102.00
3     B -102.00

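To use the scaler, note that scaler.fit_transform cannot simply be dropped in for foo unchanged: it expects 2-D input and returns a NumPy array rather than a pandas object that keeps the original index. Wrap it in a small function that takes one group's prices and returns a Series indexed like its input, and it should do what you want.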
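As a minimal sketch of the scaler version, assuming scikit-learn is installed: the helper name scale_group and the output column price_scaled are made up for illustration, and a fresh MinMaxScaler is fitted for each group so that no statistics carry over between groups.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'price': [10, 0.1, 1203, 999],
})

def scale_group(ser):
    # Hypothetical helper: scales one group's prices to the range [0, 1].
    # fit_transform expects 2-D input and returns a 2-D NumPy array, so the
    # Series is turned into a one-column frame and the result is flattened.
    scaler = MinMaxScaler()  # fitted on this group only
    scaled = scaler.fit_transform(ser.to_frame()).ravel()
    # Return a Series with the group's own index so rows line up with df.
    return pd.Series(scaled, index=ser.index)

# transform calls scale_group once per group and keeps df's row order.
df['price_scaled'] = df.groupby('group')['price'].transform(scale_group)
print(df)
```

With only two rows per group, the larger price in each group maps to (approximately) 1 and the smaller to 0. With the default feature_range of (0, 1), the same numbers come from computing (x - x.min()) / (x.max() - x.min()) inside each group, which avoids the scikit-learn dependency if that is all you need.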

Answered By: constantstranger