Column name of maximum of each row in a dataframe

Question:

I have a dataframe and i wanted a column filled with maximum value of each row so i used this :

 df_1['Highest_Rew_patch'] = df_1.max(axis=1)

output:

    Patch_0  Patch_1  Patch_2  ...  Patch_7  exp_patch  Highest_Rew_patch
0      0.0     70.0     70.0  ...      0.0          3               70.0
1      0.0     74.0     74.0  ...      0.0          4               74.0
2      0.0     78.0     78.0  ...      0.0          4               78.0
3      0.0     82.0     82.0  ...      0.0          4               82.0
4      0.0     82.0     82.0  ...      0.0          5               82.0
5      0.0     86.0     86.0  ...      0.0          6               86.0
6      0.0     90.0     90.0  ...      0.0          6               90.0
7      0.0     94.0     94.0  ...      0.0          6               94.0
8      0.0     98.0     98.0  ...      0.0          6               98.0
9      0.0     98.0     98.0  ...      0.0          7               98.0

However i want a little different result:

Patch_0  Patch_1  Patch_2   Patch_7  exp_patch  Highest_Rew_patch
0        0.0      70.0      70.0       3            Patch_2,Patch_7...
1        0.0      74.0      74.0       4            Patch_2,Patch_7...

So instead of the highest value of the row , i want the header of that column for that particular row which has highest value(s).

Asked By: Kshtj

||

Answers:

Try this:

df['Highest_Rew_patch']=df.filter(like='Patch').apply(lambda x: ', '.join(x[x.eq(x.max())].index), axis=1)

apply on axis=1, then join all the indices of x on , which is equal to the maximum value.

OUTPUT:

   Patch_0  Patch_1  ...  exp_patch                   Highest_Rew_patch
0      0.0     70.0  ...          3  Patch_1, Patch_2, Patch_4, Patch_6
1      0.0     74.0  ...          4           Patch_1, Patch_2, Patch_6
2      0.0     78.0  ...          4           Patch_1, Patch_2, Patch_6
3      0.0     82.0  ...          4           Patch_1, Patch_2, Patch_6
4      0.0     82.0  ...          5           Patch_1, Patch_2, Patch_6
Answered By: ThePyGuy

You can select the columns starting by Patch and then just keep the column names where the value is equal to the max:

>> s = df.iloc[:, df.columns.str.startswith('Patch')].apply(
   lambda s: s.index[s.eq(s.max())].tolist(), axis=1)

>> s 

0    [Patch_1, Patch_2]
1    [Patch_1, Patch_2]
2    [Patch_1, Patch_2]
3    [Patch_1, Patch_2]
4    [Patch_1, Patch_2]
5    [Patch_1, Patch_2]
6    [Patch_1, Patch_2]
7    [Patch_1, Patch_2]
8    [Patch_1, Patch_2]
9    [Patch_1, Patch_2]

Or joining as string:

>> s = s.apply(lambda s: ','.join(s))

>> s
0    Patch_1,Patch_2
1    Patch_1,Patch_2
2    Patch_1,Patch_2
3    Patch_1,Patch_2
4    Patch_1,Patch_2
5    Patch_1,Patch_2
6    Patch_1,Patch_2
7    Patch_1,Patch_2
8    Patch_1,Patch_2
9    Patch_1,Patch_2
dtype: object

Then just assign the new column:

df['Highest_Rew_patch'] = s

Update

To select other order than the max, we can use nlargest, take advantage that it remove duplicates and then select the last element:

>> n_order = 2
>> s = df.iloc[:, df.columns.str.startswith('Patch')].apply(
      lambda s: s.index[s.eq(s.nlargest(n_order)[-1])].tolist(), axis=1)

>> s
0    [Patch_1, Patch_2]
1    [Patch_1, Patch_2]
2    [Patch_1, Patch_2]
3    [Patch_1, Patch_2]
4    [Patch_1, Patch_2]
5    [Patch_1, Patch_2]
6    [Patch_1, Patch_2]
7    [Patch_1, Patch_2]
8    [Patch_1, Patch_2]
9    [Patch_1, Patch_2]
dtype: object
Answered By: Mabel Villalba