Find highest two numbers on every row in pandas dataframe and extract the column names

Question:

I have a code with multiple columns and I would like to add two more, one for the highest number on the row, and another one for the second highest. However, instead of the number, I would like to show the column name where they are found.

Assume the following data frame:

import pandas as pd

df = pd.DataFrame({'A': [1, 5, 10], 'B': [2, 6, 11], 'C': [3, 7, 12], 'D': [4, 8, 13], 'E': [5, 9, 14]})

To extract the highest number on every row, I can just apply max(axis=1) like this:

df['max1'] = df[['A', 'B', 'C', 'D', 'E']].max(axis = 1)

This gets me the max number, but not the column name itself.

How can this be applied to the second max number as well?

Asked By: DarknessPlusPlus

||

Answers:

You can sorting values and assign top2 values:

cols = ['A', 'B', 'C', 'D', 'E']
df[['max2','max1']] = np.sort(df[cols].to_numpy(), axis=1)[:, -2:]
print (df)
    A   B   C   D   E  max2  max1
0   1   2   3   4   5     4     5
1   5   6   7   8   9     8     9
2  10  11  12  13  14    13    14

df[['max1','max2']] = np.sort(df[cols].to_numpy(), axis=1)[:, -2:][:, ::-1]

EDIT: For get top2 columns names and top2 values use:

df = pd.DataFrame({'A': [1, 50, 10], 'B': [2, 6, 11], 
                   'C': [3, 7, 12], 'D': [40, 8, 13], 'E': [5, 9, 14]})

cols = ['A', 'B', 'C', 'D', 'E']

#values in numpy array
vals = df[cols].to_numpy()

#columns names in array
cols = np.array(cols)

#get indices that would sort an array in descending order
arr = np.argsort(-vals, axis=1)

#top 2 columns names
df[['top1','top2']] = cols[arr[:, :2]]

#top 2 values
df[['max2','max1']] = vals[np.arange(arr.shape[0])[:, None], arr[:, :2]]
print (df)
    A   B   C   D   E top1 top2  max2  max1
0   1   2   3  40   5    D    E    40     5
1  50   6   7   8   9    A    E    50     9
2  10  11  12  13  14    E    D    14    13
Answered By: jezrael

Another approaches to you can get first max then remove it and get max again to get the second max

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 15, 10], 'B': [2, 89, 11], 'C': [80, 7, 12], 'D': [4, 8, 13], 'E': [5, 9, 14]})
        
max1=df.max(axis=1)

maxcolum1=df.idxmax(axis=1)

max2 = df.replace(np.array(df.max(axis=1)),0).max(axis=1)

maxcolum2=df.replace(np.array(df.max(axis=1)),0).idxmax(axis=1)

df2 =pd.DataFrame({ 'max1': max1, 'max2': max2 ,'maxcol1':maxcolum1,'maxcol2':maxcolum2 })

df.join(df2)
Answered By: Mohamed Fathallah
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.