pandas displaying one column's value in the order of top score

Question

I have a data like

deviceType	category	date	score
A	soccer	07012022	10
A	baseball	07012022	8
B	basketball	07012022	9
A	soccer	07022022	9

and I would like to convert it to

category	A	B	C…(device types so on)
	soccer	basketball
	baseball

which displays category values under each device type in descending order of average of that category scores (soccer is above baseball for device type A as its average score, 9.5, is higher than average score of baseball, 8)

I was trying this with using groupby with two columns (deviceType and category), but I couldn’t figure out how to put one column’s value under another.

I will appreciate any help.

Asked By: Daniel Kim

||

Source

Answer 1

You can gropby('device') later work with every group separatelly to create Series with devide as name, and concatenate with other columns.

import pandas as pd

data = {
    'device': ['A', 'A','B','A','C'], 
    'category': ['soccer','basketball','basketball','soccer','tenis'],
    'score': [10,8,9,9,0],
}

df = pd.DataFrame(data)
print(df)

new = pd.DataFrame()

for key, val in df.groupby('device'):
    data = val.groupby('category').mean().sort_values(by='score', ascending=False)
    print(data)
    s = pd.Series(data.index)
    s.name = key
    new = pd.concat([new, s], axis=1)
    
print(new)

Result:

  device    category  score
0      A      soccer     10
1      A  basketball      8
2      B  basketball      9
3      A      soccer      9
4      C       tenis      0

            score
category         
soccer        9.5
basketball    8.0

            score
category         
basketball    9.0

          score
category       
tenis       0.0

            A           B      C
0      soccer  basketball  tenis
1  basketball         NaN    NaN

Answered By: furas

Answer 2

Using numpy:

# Calculate the mean score per category and device type
tmp = df.pivot_table(
    index="category",
    columns="deviceType",
    values="score",
    aggfunc="mean",
)

# Extract the score and category matrix into numpy
score = tmp.to_numpy()
category = np.tile(tmp.index, (len(tmp.columns), 1)).T

# Sort the score descending. numpy always sort ascending so to do descending
# sort, we need to negate the `score` array
index = np.argsort(-score, axis=0)

# Sort score and category
ordered_score = np.take_along_axis(score, index, axis=0)
ordered_category = np.take_along_axis(category, index, axis=0)

# Result
pd.DataFrame(
    np.where(np.isnan(ordered_score), "", ordered_category), columns=tmp.columns
)

Another solution with pandas:

(
    # Calculate the mean score
    df.groupby(["deviceType", "category"], as_index=False)["score"]
    .mean()
    # Sort by the mean score despondingly and assign an order
    .sort_values("score", ascending=False)
    .assign(order=lambda x: x.groupby("deviceType").cumcount())
    # Pivot for final result
    .pivot_table(index="order", columns="deviceType", values="category", aggfunc="first")
)

Answered By: Code Different

pandas displaying one column's value in the order of top score

Question:

Answers: