Get Rankings of Column Names in Pandas Dataframe

Question

I have pivoted the Customer ID against their most frequently purchased genres of performances:

Genre            Jazz     Dance     Music  Theatre
Customer                                        
100000000001           0      3         1        2
100000000002           0      1         6        2
100000000003           0      3        13        4
100000000004           0      5         4        1
100000000005           1     10        16       14

My desired result is to append rank columns that list the genre names in descending order of purchases:

Genre            Jazz     Dance     Music  Theatre          Rank1          Rank2          Rank3          Rank4
Customer                                         
100000000001           0      3         1        2          Dance        Theatre          Music           Jazz
100000000002           0      1         6        2          Music        Theatre          Dance           Jazz
100000000003           0      3        13        4          Music        Theatre          Dance           Jazz
100000000004           0      5         4        1          Dance          Music        Theatre           Jazz
100000000005           1     10        16       14          Music        Theatre          Dance           Jazz

I have looked through some threads, but the closest thing I can find is idxmax, which only gives me Rank1.

Could anyone help me to get the result I need?

Thanks a lot!

Dennis

Asked By: dendoniseden


Answer 1

Use:

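import numpy as np
import pandas as pd

# Column positions that sort each row in descending order
i = np.argsort(df.to_numpy() * -1, axis=1)
# Note: indexing df.columns with a 2-D array is deprecated in newer pandas (see Answer 4)
r = pd.DataFrame(df.columns[i], index=df.index, columns=range(1, i.shape[1] + 1))
df = df.join(r.add_prefix('Rank'))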

Details:

Use np.argsort along axis=1 to get the indices i that would sort the genres in descending order.

print(i)
array([[1, 3, 2, 0],
       [2, 3, 1, 0],
       [2, 3, 1, 0],
       [1, 2, 3, 0],
       [2, 3, 1, 0]])

Create a new dataframe r from the columns of dataframe df taken along the indices i (i.e. df.columns[i]), then use DataFrame.join to join the dataframe r with df:

print(df)
              Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz
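
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's table and applies the same argsort idea. Two things are assumptions added here rather than part of the answer: kind="stable", so that tied counts keep their original column order, and df.columns.to_numpy(), so that the 2-D index is applied to a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Rebuild the pivot table from the question.
df = pd.DataFrame(
    {
        "Jazz": [0, 0, 0, 0, 1],
        "Dance": [3, 1, 3, 5, 10],
        "Music": [1, 6, 13, 4, 16],
        "Theatre": [2, 2, 4, 1, 14],
    },
    index=pd.Index(
        [100000000001, 100000000002, 100000000003, 100000000004, 100000000005],
        name="Customer",
    ),
)
df.columns.name = "Genre"

# Column positions that sort each row in descending order.
# kind="stable" (an assumption, not in the original answer) keeps tied
# counts in their original column order.
order = np.argsort(-df.to_numpy(), axis=1, kind="stable")

# Index a plain NumPy array of labels with the 2-D position array.
labels = df.columns.to_numpy()[order]
ranks = pd.DataFrame(
    labels,
    index=df.index,
    columns=[f"Rank{k}" for k in range(1, order.shape[1] + 1)],
)

result = df.join(ranks)
print(result)
```

The rank columns are built separately and joined at the end, so the original counts stay untouched.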

Answered By: Shubham Sharma

Answer 2

Let’s try stack, cumcount and sort_values:

s = df.stack().sort_values(ascending=False).groupby(level=0).cumcount() + 1
s1 = (s.reset_index(1)
       .set_index(0, append=True)
       .unstack(1)
       .add_prefix("Rank"))
s1.columns = s1.columns.get_level_values(1)

Then join the result back to the original dataframe on the customer index:

df.join(s1)

                 Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer_Genre                                                            
100000000001       0      3      1        2  Dance  Theatre    Music  Jazz
100000000002       0      1      6        2  Music  Theatre    Dance  Jazz
100000000003       0      3     13        4  Music  Theatre    Dance  Jazz
100000000004       0      5      4        1  Dance    Music  Theatre  Jazz
100000000005       1     10     16       14  Music  Theatre    Dance  Jazz
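
The same long-format idea (sort within each customer, number the rows, reshape back to wide) can also be sketched with melt and pivot instead of stack and unstack. This is a variant written for illustration, not the answer's exact code; the names tidy, wide, and the "count" and "rank" column labels are assumptions introduced here.

```python
import pandas as pd

# Rebuild the pivot table from the question.
df = pd.DataFrame(
    {
        "Jazz": [0, 0, 0, 0, 1],
        "Dance": [3, 1, 3, 5, 10],
        "Music": [1, 6, 13, 4, 16],
        "Theatre": [2, 2, 4, 1, 14],
    },
    index=pd.Index(
        [100000000001, 100000000002, 100000000003, 100000000004, 100000000005],
        name="Customer",
    ),
)

# Long format: one row per (Customer, Genre) pair.
tidy = df.reset_index().melt(id_vars="Customer", var_name="Genre", value_name="count")

# Largest counts first within each customer, then number them 1..n.
tidy = tidy.sort_values(["Customer", "count"], ascending=[True, False])
tidy["rank"] = tidy.groupby("Customer").cumcount() + 1

# Back to wide format: one column of genre names per rank.
wide = tidy.pivot(index="Customer", columns="rank", values="Genre").add_prefix("Rank")

result = df.join(wide)
print(result)
```

Because the rank is stored as an ordinary column before pivoting, there is no MultiIndex on the columns to flatten afterwards.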

Answered By: Umar.H

Answer 3

Try this:

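# method='first' gives tied values distinct ranks, so unstack sees no duplicate entries
dfp = (df.rank(ascending=False, axis=1, method='first').stack()
         .astype(int).rename('rank').reset_index(level=1))
df.assign(**dfp.set_index('rank', append=True)['Genre'].unstack().add_prefix('Rank'))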

Output:

Genre         Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz

This ranks each row with rank, reshapes the ranks into one column per rank, and adds those columns to the original dataframe with assign.
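
One thing to watch with a rank-based approach is ties. By default DataFrame.rank gives tied values their average rank, so two genres with equal counts can both end up at, say, 2.5, and truncating that to an integer would put two genres in the same rank slot. A small sketch of the difference, using a made-up one-row table (the data here is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical row where Jazz and Dance are tied.
row = pd.DataFrame({"Jazz": [2], "Dance": [2], "Music": [5]})

# Default method="average": the two tied genres share rank 2.5.
avg = row.rank(ascending=False, axis=1)

# method="first": tied genres get distinct consecutive ranks.
first = row.rank(ascending=False, axis=1, method="first")

print(avg)
print(first)
```

Passing method="first" is one way to keep the ranks distinct, so that every rank column gets exactly one genre per row.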

Answered By: Scott Boston

Answer 4

The solution in Answer 1 works, but this line now raises a deprecation warning:

r = pd.DataFrame(df.columns[i], index=df.index, columns=range(1, i.shape[1] + 1))

FutureWarning: Support for multi-dimensional indexing (e.g. obj[:, None]) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.

Revised:

r = pd.DataFrame(np.array(df.columns)[i], index=df.index, columns=range(1, i.shape[1] + 1))

Answered By: Wally Cheung

Answer 5

Here is a function that builds on the previous answers with the following improvements:

It avoids the deprecation warning mentioned by Wally by converting df.columns into a NumPy array before indexing.
It accepts NaN values and excludes those columns from the rank columns (their entries are left as NaN too); see the example.
It also returns the value corresponding to each rank, so names and values can be mapped easily.
It has an additional parameter, ascending, to rank in ascending or descending order.
It adds a column listing which columns had NaN values in each row and were therefore left out of the rank columns.

# Example DataFrame
import numpy as np
import pandas as pd

dic = {'A': [0, np.nan, 2, np.nan],
      'B': [3, 0, 1, 5],
      'C': [1, 2, 0, np.nan]}
df = pd.DataFrame(dic)
print(df)

     A  B    C
0  0.0  3  1.0
1  NaN  0  2.0
2  2.0  1  0.0
3  NaN  5  NaN

# Function
def fun_rank_columns(df, ascending=False):
    factor = 1 if ascending else -1
    # Rank columns showing ranking of column names
    np_sort = np.argsort(df.to_numpy() * factor, axis=1)
    df_rank = pd.DataFrame(np.array(df.columns)[np_sort], index=df.index, columns=range(1, np_sort.shape[1] + 1))
    
    # Corresponding values for each rank column
    np_sort_value = np.sort(df.to_numpy() * factor, axis=1)
    df_rank_value = pd.DataFrame(np_sort_value, index=df.index, columns=range(1, np_sort_value.shape[1] + 1)) * factor
    
    # Columns with nan values to be replaced
    num_col_rank = df_rank.shape[1]
    df_rank['nan_value'] = df.apply(lambda row: [i for i in df.columns if np.isnan(row[i])], axis=1)
    for col in range(1, num_col_rank + 1):
        condition = df_rank.apply(lambda x: x[col] in x['nan_value'], axis=1)
        df_rank.loc[condition, col] = np.nan
        df_rank_value.loc[condition, col] = np.nan

    # Join Results
    df_rank = df_rank.add_prefix('rank_')
    df_rank_value = df_rank_value.add_prefix('rank_value_')
    df_res = df_rank.join(df_rank_value)
    return df_res

# Apply the function
df_res = fun_rank_columns(df, ascending=True)
print(df_res)

  rank_1 rank_2 rank_3 rank_nan_value  rank_value_1  rank_value_2  rank_value_3
0      A      C      B             []           0.0           1.0           3.0
1      B      C    NaN            [A]           0.0           2.0           NaN
2      C      B      A             []           0.0           1.0           2.0
3      B    NaN    NaN         [A, C]           5.0           NaN           NaN
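
If the row-wise apply calls become slow on a large dataframe, the NaN masking can probably be done with array operations instead. The sketch below is an illustrative variant, not the function above: rank_columns_vectorized is a hypothetical name, it relies on np.argsort placing NaN last, and it leaves out the nan_value list column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, np.nan, 2, np.nan],
                   "B": [3, 0, 1, 5],
                   "C": [1, 2, 0, np.nan]})

def rank_columns_vectorized(df, ascending=False):  # hypothetical helper name
    factor = 1 if ascending else -1
    values = df.to_numpy(dtype=float)

    # NaN stays NaN after the sign flip, and argsort places NaN last.
    order = np.argsort(values * factor, axis=1, kind="stable")

    # Reorder the original values and the column labels with the same indices.
    sorted_values = np.take_along_axis(values, order, axis=1)
    names = df.columns.to_numpy().astype(object)[order]

    # Blank out the names wherever the ranked value is NaN.
    names[np.isnan(sorted_values)] = np.nan

    cols = range(1, values.shape[1] + 1)
    ranks = pd.DataFrame(names, index=df.index, columns=cols).add_prefix("rank_")
    rank_values = pd.DataFrame(sorted_values, index=df.index, columns=cols).add_prefix("rank_value_")
    return ranks.join(rank_values)

res = rank_columns_vectorized(df, ascending=True)
print(res)
```

The mask comes from the sorted values themselves, so no per-row list of NaN columns is needed to decide which rank slots to blank out.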

Answered By: Juan Esteban Fonseca
