operation between columns according to the value it contains

Question

I have a Dataframe that look like this:

df_1:

    Phase_1  Phase_2   Phase_3
0      8       4        2
1      4       6        3
2      8       8        3
3      10      5        8
...

I’d like to add a column called "Coeff" that compute (Phase_max - Phase_min) / Phase_max

For the first row: Coeff= (Phase_1 – Phase_3)/ Phase_1 = (8-2)/8 = 0.75

Expected OUTPUT:

df_1

    Phase_1  Phase_2   Phase_3   Coeff
0      8       4        2       0.75
1      4       6        3       0.5
2      8       8        3       0.625
3      10      5        8       0.5

What is the best way to compute this without using loop? I want to apply it on large dataset.

Asked By: QQ1821

||

Source

Answer 1

here is one way to do it

# list the columns, you like to use in calculations
cols=['Phase_1', 'Phase_2', 'Phase_3']

# using max and min across the axis to calculate, for the defined columns

df['coeff']=(df[cols].max(axis=1).sub(df[cols].min(axis=1))).div(df[cols].max(axis=1))
df

little performance optimization (credit Yevhen Kuzmovych)

df['coeff']= 1 - (df[cols].min(axis=1).div(df[cols].max(axis=1)))

df

    Phase_1     Phase_2     Phase_3     coeff
0         8           4           2     0.750
1         4           6           3     0.500
2         8           8           3     0.625
3         10          5           8     0.500

Answered By: Naveed

Answer 2

As per OP specification

I only want the max or the min between Phase_1 Phase_2 and Phase_3 and not other columns

The following will do the work

df['Coeff'] = (df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1) - df[['Phase_1', 'Phase_2', 'Phase_3']].min(axis = 1)) / df[['Phase_1', 'Phase_2', 'Phase_3']].max(axis = 1)

[Out]:

   Phase_1  Phase_2  Phase_3  Coeff
0        8        4        2  0.750
1        4        6        3  0.500
2        8        8        3  0.625
3       10        5        8  0.500

Another alternative would be to use numpy built-in modules, as follows

df['Coeff'] = (np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1) - np.min(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)) / np.max(df[['Phase_1', 'Phase_2', 'Phase_3']], axis = 1)

[Out]:

   Phase_1  Phase_2  Phase_3  Coeff
0        8        4        2  0.750
1        4        6        3  0.500
2        8        8        3  0.625
3       10        5        8  0.500

Answered By: Gonçalo Peres

operation between columns according to the value it contains

Question:

Answers: