pandas displaying one column's value in the order of top score
Question:
I have a data like
deviceType
category
date
score
A
soccer
07012022
10
A
baseball
07012022
8
B
basketball
07012022
9
A
soccer
07022022
9
and I would like to convert it to
category
A
B
C…(device types so on)
soccer
basketball
baseball
which displays category values under each device type in descending order of average of that category scores (soccer is above baseball for device type A as its average score, 9.5, is higher than average score of baseball, 8)
I was trying this with using groupby with two columns (deviceType and category), but I couldn’t figure out how to put one column’s value under another.
I will appreciate any help.
Answers:
You can gropby('device')
later work with every group separatelly to create Series
with devide as name
, and concatenate with other columns.
import pandas as pd
data = {
'device': ['A', 'A','B','A','C'],
'category': ['soccer','basketball','basketball','soccer','tenis'],
'score': [10,8,9,9,0],
}
df = pd.DataFrame(data)
print(df)
new = pd.DataFrame()
for key, val in df.groupby('device'):
data = val.groupby('category').mean().sort_values(by='score', ascending=False)
print(data)
s = pd.Series(data.index)
s.name = key
new = pd.concat([new, s], axis=1)
print(new)
Result:
device category score
0 A soccer 10
1 A basketball 8
2 B basketball 9
3 A soccer 9
4 C tenis 0
score
category
soccer 9.5
basketball 8.0
score
category
basketball 9.0
score
category
tenis 0.0
A B C
0 soccer basketball tenis
1 basketball NaN NaN
Using numpy:
# Calculate the mean score per category and device type
tmp = df.pivot_table(
index="category",
columns="deviceType",
values="score",
aggfunc="mean",
)
# Extract the score and category matrix into numpy
score = tmp.to_numpy()
category = np.tile(tmp.index, (len(tmp.columns), 1)).T
# Sort the score descending. numpy always sort ascending so to do descending
# sort, we need to negate the `score` array
index = np.argsort(-score, axis=0)
# Sort score and category
ordered_score = np.take_along_axis(score, index, axis=0)
ordered_category = np.take_along_axis(category, index, axis=0)
# Result
pd.DataFrame(
np.where(np.isnan(ordered_score), "", ordered_category), columns=tmp.columns
)
Another solution with pandas:
(
# Calculate the mean score
df.groupby(["deviceType", "category"], as_index=False)["score"]
.mean()
# Sort by the mean score despondingly and assign an order
.sort_values("score", ascending=False)
.assign(order=lambda x: x.groupby("deviceType").cumcount())
# Pivot for final result
.pivot_table(index="order", columns="deviceType", values="category", aggfunc="first")
)
I have a data like
deviceType | category | date | score |
---|---|---|---|
A | soccer | 07012022 | 10 |
A | baseball | 07012022 | 8 |
B | basketball | 07012022 | 9 |
A | soccer | 07022022 | 9 |
and I would like to convert it to
category | A | B | C…(device types so on) |
---|---|---|---|
soccer | basketball | ||
baseball |
which displays category values under each device type in descending order of average of that category scores (soccer is above baseball for device type A as its average score, 9.5, is higher than average score of baseball, 8)
I was trying this with using groupby with two columns (deviceType and category), but I couldn’t figure out how to put one column’s value under another.
I will appreciate any help.
You can gropby('device')
later work with every group separatelly to create Series
with devide as name
, and concatenate with other columns.
import pandas as pd
data = {
'device': ['A', 'A','B','A','C'],
'category': ['soccer','basketball','basketball','soccer','tenis'],
'score': [10,8,9,9,0],
}
df = pd.DataFrame(data)
print(df)
new = pd.DataFrame()
for key, val in df.groupby('device'):
data = val.groupby('category').mean().sort_values(by='score', ascending=False)
print(data)
s = pd.Series(data.index)
s.name = key
new = pd.concat([new, s], axis=1)
print(new)
Result:
device category score
0 A soccer 10
1 A basketball 8
2 B basketball 9
3 A soccer 9
4 C tenis 0
score
category
soccer 9.5
basketball 8.0
score
category
basketball 9.0
score
category
tenis 0.0
A B C
0 soccer basketball tenis
1 basketball NaN NaN
Using numpy:
# Calculate the mean score per category and device type
tmp = df.pivot_table(
index="category",
columns="deviceType",
values="score",
aggfunc="mean",
)
# Extract the score and category matrix into numpy
score = tmp.to_numpy()
category = np.tile(tmp.index, (len(tmp.columns), 1)).T
# Sort the score descending. numpy always sort ascending so to do descending
# sort, we need to negate the `score` array
index = np.argsort(-score, axis=0)
# Sort score and category
ordered_score = np.take_along_axis(score, index, axis=0)
ordered_category = np.take_along_axis(category, index, axis=0)
# Result
pd.DataFrame(
np.where(np.isnan(ordered_score), "", ordered_category), columns=tmp.columns
)
Another solution with pandas:
(
# Calculate the mean score
df.groupby(["deviceType", "category"], as_index=False)["score"]
.mean()
# Sort by the mean score despondingly and assign an order
.sort_values("score", ascending=False)
.assign(order=lambda x: x.groupby("deviceType").cumcount())
# Pivot for final result
.pivot_table(index="order", columns="deviceType", values="category", aggfunc="first")
)