Operation between columns according to the values they contain
Question:
I have a DataFrame that looks like this:
df_1:
Phase_1 Phase_2 Phase_3
0 8 4 2
1 4 6 3
2 8 8 3
3 10 5 8
...
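I’d like to add a column called "Coeff" that computes (Phase_max - Phase_min) / Phase_max, where Phase_max and Phase_min are the largest and smallest of the three phase values in each row.
For the first row: Coeff = (Phase_1 - Phase_3) / Phase_1 = (8 - 2) / 8 = 0.75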
Expected OUTPUT:
df_1
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.75
1 4 6 3 0.5
2 8 8 3 0.625
3 10 5 8 0.5
What is the best way to compute this without using a loop? I want to apply it to a large dataset.
Answers:
Here is one way to do it:
# list the columns you want to use in the calculation
cols = ['Phase_1', 'Phase_2', 'Phase_3']
# row-wise max minus row-wise min, divided by the row-wise max, over the listed columns only
df['coeff'] = (df[cols].max(axis=1).sub(df[cols].min(axis=1))).div(df[cols].max(axis=1))
df
A small performance optimization (credit: Yevhen Kuzmovych): since (max - min) / max = 1 - min / max, the row-wise max only needs to be computed once:
df['coeff'] = 1 - (df[cols].min(axis=1).div(df[cols].max(axis=1)))
df
Phase_1 Phase_2 Phase_3 coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500
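For anyone who wants to run the two expressions above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the intermediate names `row_max`, `row_min`, `coeff_full` and `coeff_short` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Phase_1": [8, 4, 8, 10],
    "Phase_2": [4, 6, 8, 5],
    "Phase_3": [2, 3, 3, 8],
})
cols = ["Phase_1", "Phase_2", "Phase_3"]

# Row-wise extremes, each computed once (illustrative names)
row_max = df[cols].max(axis=1)
row_min = df[cols].min(axis=1)

# First form: (max - min) / max
coeff_full = (row_max - row_min) / row_max
# Shorter form: 1 - min / max (same value whenever max is nonzero)
coeff_short = 1 - row_min / row_max

df["Coeff"] = coeff_short
print(df)
```

On this sample both forms give 0.75, 0.5, 0.625 and 0.5, matching the expected output in the question.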
As per the OP's specification:
"I only want the max or the min between Phase_1, Phase_2 and Phase_3, and not other columns"
the following will do the work:
df['Coeff'] = (df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1) - df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)) / df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)
[Out]:
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500
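One edge case the question does not mention: if every phase in a row is 0, the row-wise max is 0 and the division is 0/0. The sketch below is one possible way to make that explicit, under the assumption that NaN is the desired result for such rows. The second sample row and the names `phases`, `row_max`, `row_min` and `safe_max` are made up for illustration.

```python
import pandas as pd

# First row is from the question; the all-zero second row is a hypothetical edge case
df = pd.DataFrame({
    "Phase_1": [8, 0],
    "Phase_2": [4, 0],
    "Phase_3": [2, 0],
})
cols = ["Phase_1", "Phase_2", "Phase_3"]

# Select the phase columns once instead of three times
phases = df[cols]
row_max = phases.max(axis=1)
row_min = phases.min(axis=1)

# Replace zero maxima with NaN so those rows come out as NaN
safe_max = row_max.where(row_max != 0)
df["Coeff"] = (row_max - row_min) / safe_max
print(df)
```

Whether NaN, 0 or something else is the right value for such rows depends on what the coefficient is used for.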
Another alternative is to use NumPy's built-in functions, as follows:
import numpy as np
df['Coeff'] = (np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1) - np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)) / np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)
[Out]:
Phase_1 Phase_2 Phase_3 Coeff
0 8 4 2 0.750
1 4 6 3 0.500
2 8 8 3 0.625
3 10 5 8 0.500
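A variation on the NumPy idea is to pull the three columns out as a plain array once and reduce over that. This is only a sketch: it assumes the phase columns are all numeric, and whether it is actually faster on a large dataset has not been measured here. The names `values`, `row_max` and `row_min` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Phase_1": [8, 4, 8, 10],
    "Phase_2": [4, 6, 8, 5],
    "Phase_3": [2, 3, 3, 8],
})
cols = ["Phase_1", "Phase_2", "Phase_3"]

# Extract the selected columns as a 2-D NumPy array (rows x 3)
values = df[cols].to_numpy()

# Reduce along axis 1 to get one max and one min per row
row_max = values.max(axis=1)
row_min = values.min(axis=1)

# True division on integer arrays yields floats
df["Coeff"] = (row_max - row_min) / row_max
print(df)
```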