How to apply scaler module with groupby in pandas dataframe?
Question:
I want to scale one column in my dataframe. However, I have different groups within my dataframe, and I want to scale within each group. How would I do this? Currently, my code scales the column in relation to all rows, but I want this done within a group.
group | price
A | 10
A | 0.1
B | 1203
B | 999
I want the scaler to apply individually to rows 1-2 and rows 3-4 in this case. Here is my current code:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['price']] = scaler.fit_transform(df[['price']])
Problem with this code: It scales using the entire price column instead of doing this by group.
Please advise!
Answers:
You can do something like this, where I have used a sample function foo (instead of scaler.fit_transform) that modifies a pandas Series based on the data it contains:
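import pandas as pd

def foo(ser):
    # Sample transformation: subtract the mean of whatever values are passed in
    x = ser.mean()
    return ser - x

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'price': [10, 0.1, 1203, 999]})
print(df)

# Applied to the entire column: one mean is used for all rows
whole = df.copy()
whole['price'] = foo(whole['price'])
print(whole)

# Applied group by group: transform calls foo once per group and
# returns a result aligned with the original index
per_group = df.copy()
per_group['price'] = per_group.groupby('group')['price'].transform(foo)
print(per_group)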
Input:
group price
0 A 10.0
1 A 0.1
2 B 1203.0
3 B 999.0
Output if applied to the entire price column (not what you want):
group price
0 A -543.025
1 A -552.925
2 B 649.975
3 B 445.975
Output if applied group-by-group:
group price
0 A 4.95
1 A -4.95
2 B 102.00
3 B -102.00
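For mean-centering specifically, the per-group step can also be written without a Python-level function: transform('mean') broadcasts each group's mean back onto every row of that group. A minimal sketch using the same sample data (the column name centered is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'price': [10, 0.1, 1203, 999]})

# transform('mean') returns each row's group mean, aligned to df's index
group_mean = df.groupby('group')['price'].transform('mean')

# Subtracting row-aligned group means centers each group at zero
df['centered'] = df['price'] - group_mean
print(df)
```

Because the result of transform has the same index as df, the subtraction lines up row by row with no reindexing.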
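Replacing foo with scaler.fit_transform applies the same per-group pattern to min-max scaling, with one caveat: fit_transform expects 2D input and returns a NumPy array, so it needs a small wrapper that passes each group as a one-column frame and returns a Series aligned with that group's index.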
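A minimal sketch of that wrapper, using sklearn's MinMaxScaler with groupby(...).transform. The helper name minmax_per_group and the column price_scaled are hypothetical; the last lines compute the same result with the plain min-max formula as a cross-check:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'price': [10, 0.1, 1203, 999]})

def minmax_per_group(ser):
    # MinMaxScaler expects 2D input, so pass a one-column DataFrame;
    # a fresh scaler is fitted on this group's values only
    scaled = MinMaxScaler().fit_transform(ser.to_frame())
    # fit_transform returns a 2D NumPy array: flatten it and restore the index
    return pd.Series(scaled.ravel(), index=ser.index)

df['price_scaled'] = df.groupby('group')['price'].transform(minmax_per_group)
print(df)

# Same result with the min-max formula: (x - group min) / (group max - group min)
g = df.groupby('group')['price']
manual = (df['price'] - g.transform('min')) / (g.transform('max') - g.transform('min'))
```

Fitting a new scaler per group keeps the groups independent, but the fitted scalers are discarded. If you later need inverse_transform, one option is to keep them in a dict keyed by group instead.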