Calculate new column as the mean of other columns in pandas
Question:
I have this data frame and I would like to calculate a new column as the mean of salary_1, salary_2 and salary_3:
import pandas as pd

df = pd.DataFrame({
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
    'salary_3': [210, 385, 260]
})
   salary_1  salary_2  salary_3
0       230       235       210
1       345       375       385
2       222       292       260
How can I do it in pandas in the most efficient way? Actually I have many more columns and I don’t want to write this one by one.
Something like this:
   salary_1  salary_2  salary_3  salary_mean
0       230       235       210  (230+235+210)/3
1       345       375       385  ...
2       222       292       260  ...
Answers:
Use .mean(). The axis argument controls the direction: axis=1 averages across each row, while axis=0 (the default) averages down each column.
df['average'] = df.mean(axis=1)
df
returns
   salary_1  salary_2  salary_3     average
0       230       235       210  225.000000
1       345       375       385  368.333333
2       222       292       260  258.000000
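In a real table there are often non-numeric columns next to the salaries. The following is a minimal sketch of the same row-wise mean under that assumption; the `name` column is hypothetical and not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],  # hypothetical non-numeric column, added for illustration
    "salary_1": [230, 345, 222],
    "salary_2": [235, 375, 292],
    "salary_3": [210, 385, 260],
})

# axis=1 averages across each row; numeric_only=True skips the string column
df["average"] = df.mean(axis=1, numeric_only=True)
print(df)
```

Note that calling df.mean(axis=1) again once the average column exists would fold that column into the new mean, so it is safer to select the input columns explicitly when the code may be re-run.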
If you only want the mean of a few columns, select just those columns first. For example:
df['average_1_3'] = df[['salary_1', 'salary_3']].mean(axis=1)
df
returns
   salary_1  salary_2  salary_3  average_1_3
0       230       235       210        220.0
1       345       375       385        365.0
2       222       292       260        241.0
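When there are too many columns to list by hand, as the question describes, the columns can be picked by name pattern instead. This is a sketch that assumes every column of interest is named salary_ followed by digits; the regular expression is an illustrative choice, not something taken from the answers here.

```python
import pandas as pd

df = pd.DataFrame({
    "salary_1": [230, 345, 222],
    "salary_2": [235, 375, 292],
    "salary_3": [210, 385, 260],
})

# Keep only columns whose name is "salary_" followed by digits
salary_cols = df.filter(regex=r"^salary_\d+$")

# Row-wise mean over the matched columns
df["salary_mean"] = salary_cols.mean(axis=1)
print(df)
```

Because the pattern requires digits after the underscore, the new salary_mean column is not matched if the code is run a second time.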
An easy way to solve this problem is shown below:
col = df.loc[:, "salary_1":"salary_3"]
where "salary_1" is the first column name and "salary_3" is the last column name of the slice (label-based slices with .loc include both ends).
df['salary_mean'] = col.mean(axis=1)
df
This adds a new column to the data frame holding the row-wise mean of the selected columns.
This approach is helpful when you have a large number of adjacent columns, or when you need the mean of only some of the columns rather than all of them.
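To illustrate how the label slice behaves when other columns surround the salaries, here is a small sketch. The `employee_id` and `bonus` columns are hypothetical additions; the slice relies on the salary columns being adjacent in the data frame's column order.

```python
import pandas as pd

df = pd.DataFrame({
    "employee_id": [1, 2, 3],     # hypothetical column before the slice
    "salary_1": [230, 345, 222],
    "salary_2": [235, 375, 292],
    "salary_3": [210, 385, 260],
    "bonus": [10, 20, 30],        # hypothetical column after the slice
})

# Label slices with .loc include both endpoints and follow column order
cols = df.loc[:, "salary_1":"salary_3"]
print(list(cols.columns))

# Row-wise mean of the sliced columns only
df["salary_mean"] = cols.mean(axis=1)
print(df)
```

If the salary columns are not contiguous, the slice would also pick up whatever lies between them, so a list of names or a name pattern is the safer choice in that case.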