Applying function to multiple columns in pandas dataframe to make 1 new column
Question:
I’m trying to apply a function that:
- takes the value of each cell in a column divided by the mean of its respective column.
- Then create a column called [‘Score’] that has the sum of each cell value in a row computed from step 1.
My code so far:
import pandas as pd
df = pd.DataFrame(pd.read_excel('fruits.xlsx'))
def func(column):
out = df[column].values / df[column].mean()
return out
Im really unsure of how to execute this with pandas properly.
Answers:
Try this one will calculate exactly what you need in one single line:
df['Score'] = df.apply(lambda x: sum([x[i]/df[i].mean() for i in df.columns]),axis=1)
Can you post the output to:
df[0]
or
df.head()
This will help me answer your question properly.
you can make the output as a column in the dataframe
df["Score"] = df[<col_name>] / df[<col_name>].mean()
and you can use
df["Score"] = df[<col_name>].values / df[<col_name>].mean()
I tested both and both gave me the same output in the dataframe
You can do it like this
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['a', 'b', 'c'])
df['score'] = df.div(df.mean()).sum(axis=1)
Output
a b c score
0 1 2 3 1.15
1 4 5 6 3.00
2 7 8 9 4.85
I’m trying to apply a function that:
- takes the value of each cell in a column divided by the mean of its respective column.
- Then create a column called [‘Score’] that has the sum of each cell value in a row computed from step 1.
My code so far:
import pandas as pd
df = pd.DataFrame(pd.read_excel('fruits.xlsx'))
def func(column):
out = df[column].values / df[column].mean()
return out
Im really unsure of how to execute this with pandas properly.
Try this one will calculate exactly what you need in one single line:
df['Score'] = df.apply(lambda x: sum([x[i]/df[i].mean() for i in df.columns]),axis=1)
Can you post the output to:
df[0]
or
df.head()
This will help me answer your question properly.
you can make the output as a column in the dataframe
df["Score"] = df[<col_name>] / df[<col_name>].mean()
and you can use
df["Score"] = df[<col_name>].values / df[<col_name>].mean()
I tested both and both gave me the same output in the dataframe
You can do it like this
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['a', 'b', 'c'])
df['score'] = df.div(df.mean()).sum(axis=1)
Output
a b c score
0 1 2 3 1.15
1 4 5 6 3.00
2 7 8 9 4.85