Adding rank column for every numerical column in pandas
Question:
Here is an example of my dataframe (my actual dataframe has 20+ columns and 100+ rows)
import pandas as pd

df = pd.DataFrame([['Jim', 93, 87, 66],
                   ['Bob', 88, 90, 65],
                   ['Joe', 72, 100, 70]],
                  columns=['Name', 'Score1', 'Score2', 'Score3'])
Name Score1 Score2 Score3
Jim 93 87 66
Bob 88 90 65
Joe 72 100 70
I want to create a new table that shows the rank of each score within its column (highest score gets rank 1). For example, the desired output would be:
Name Score1 Score2 Score3
Jim 1 3 2
Bob 2 2 3
Joe 3 1 1
Is it possible to achieve this in pandas by looping through every column?
Answers:
You can use filter to get the Score columns, then rank with the dense method in descending order, and finally combine_first to bring back the other columns:
out = (df.filter(like='Score')
         .rank(method='dense', ascending=False)
         .convert_dtypes()  # optional, to have integers
         .combine_first(df)
       )
Output:
Name Score1 Score2 Score3
0 Jim 1 3 2
1 Bob 2 2 3
2 Joe 3 1 1
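Since the title asks about every numerical column rather than columns that share a name prefix, here is a minimal sketch of the same idea using select_dtypes. The names num_cols and ranked are illustrative only, and the astype(int) step assumes the numeric columns contain no missing values (rank leaves NaN in place, which cannot be cast to int).

```python
import pandas as pd

df = pd.DataFrame([['Jim', 93, 87, 66],
                   ['Bob', 88, 90, 65],
                   ['Joe', 72, 100, 70]],
                  columns=['Name', 'Score1', 'Score2', 'Score3'])

# Pick every numeric column, whatever it is called.
num_cols = df.select_dtypes(include='number').columns

# Work on a copy so the original scores are kept.
ranked = df.copy()

# rank() operates column-wise by default, so no explicit loop is needed.
# astype(int) assumes there are no NaN values in these columns.
ranked[num_cols] = (df[num_cols]
                    .rank(method='dense', ascending=False)
                    .astype(int))

print(ranked)
```

Non-numeric columns such as Name are left untouched, so this should scale to a dataframe with 20+ columns without listing them by hand.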
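Yes, this is possible in pandas, and looping through the columns works. One approach is to call the rank method on each score column separately and collect the results in a new dataframe. Note that rank returns floats and, by default, gives tied values their average rank; pass method='dense' or method='min' and convert to integers if you need whole-number ranks.
# create a new dataframe to store the rankings
rank_df = pd.DataFrame(df['Name'])
# loop through each score column (every column after 'Name') and rank the values
for col in df.columns[1:]:
    rank_df[col] = df[col].rank(ascending=False)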
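The answers here differ in how ties would be handled: one passes method='dense', the other relies on the default, method='average'. The example data has no ties, so both give the same ranks there. The sketch below uses a small made-up series (scores is a hypothetical name, not part of the question) to show how the choice matters once two scores are equal.

```python
import pandas as pd

# Made-up scores with a tie for first place.
scores = pd.Series([90, 90, 80])

# Default: tied values share the average of the ranks they span (1 and 2).
avg_rank = scores.rank(ascending=False)

# 'min': tied values take the lowest rank of the group; the next rank is skipped.
min_rank = scores.rank(method='min', ascending=False)

# 'dense': like 'min', but ranks always increase by 1 between groups.
dense_rank = scores.rank(method='dense', ascending=False)

print(avg_rank.tolist())    # [1.5, 1.5, 3.0]
print(min_rank.tolist())    # [1.0, 1.0, 3.0]
print(dense_rank.tolist())  # [1.0, 1.0, 2.0]
```

If integer ranks are wanted, 'min' or 'dense' are probably the safer choices, because 'average' can produce fractional ranks that do not convert cleanly.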