How to plot average value lines and not every single value in Plotly

Question:

First of all, sorry if what I am writing here is not up to Stack Overflow standards; I am trying my best.

I have a dataframe with around 18k rows and 89 columns with information about football players.

For example, I need to plot a line graph to visualize the connection between a player's age and overall rating.

But when I plot a line for that with:

fig = px.line(df, x="Age", y="Overall")
fig.show()

This is the result:

[Image: Bad Result]

This is obviously not a good visualization.

I want to plot the average rating for every age, so it's a single line that shows the connection between age and overall rating. Is there an easy function for plotting this, or do I have to create the right data myself?

Asked By: Funbold


Answers:

It sounds like what you want to do here is groupby() on "age" and then take the mean of "overall" to create a final dataframe before passing it to the plotting function (in your data the columns are named "Age" and "Overall").

Roughly,

import pandas as pd

data = {
    "age": [1, 1, 2, 2, 3, 3],
    "overall": [50, 100, 1, 1, 600, 700],
    # clarifies how to select the correct column to average
    "irrelevant": [1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)
new_df = df.groupby('age')['overall'].mean()
new_df

# age
# 1     75.0
# 2      1.0
# 3    650.0
# Name: overall, dtype: float64
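To connect this back to the question's px.line call, here is a minimal sketch using the question's column names "Age" and "Overall". The helper names average_by and plot_average_line and the sample values are made up for illustration. It assumes that keeping the group key as a regular column (as_index=False) is the most convenient shape for px.line, since the plain groupby result above is a Series indexed by age.

```python
import pandas as pd


def average_by(df: pd.DataFrame, x: str, y: str) -> pd.DataFrame:
    """Return one row per distinct value of `x` with the mean of `y`."""
    # as_index=False keeps `x` as a regular column instead of the index,
    # so the result can be passed to px.line with x=... and y=... as usual.
    return df.groupby(x, as_index=False)[y].mean()


def plot_average_line(df: pd.DataFrame, x: str, y: str):
    """Build a Plotly line figure of the mean of `y` for each value of `x`."""
    # Imported here so average_by() can be used without plotly installed.
    import plotly.express as px

    averaged = average_by(df, x, y)
    return px.line(averaged, x=x, y=y)


# Tiny stand-in for the real 18k-row dataframe (values are made up).
players = pd.DataFrame({
    "Age": [18, 18, 25, 25, 33, 33],
    "Overall": [60, 64, 75, 79, 70, 72],
    "Irrelevant": [1, 1, 1, 1, 1, 1],
})

averaged = average_by(players, "Age", "Overall")
print(averaged)
```

Calling plot_average_line(players, "Age", "Overall").show() should then draw a single line with one point per age rather than one per player.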

Alternatively, you can use a scatter plot if you're comfortable having the individual points show the trend instead. Scatter plots are sometimes useful in this situation, since a line of averages can be based on very different sample sizes at each point on the x-axis, so you can lose information by reducing the data to a line.
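One way to check the sample-size concern before committing to a line of averages is to compute the count alongside the mean for each age. This is only a sketch: the column names mean_overall and n_players and the sample values are made up for illustration. For the scatter alternative itself, px.scatter(df, x="Age", y="Overall") takes the same arguments as the px.line call in the question.

```python
import pandas as pd

# Made-up sample in which the ages have very different numbers of players.
players = pd.DataFrame({
    "Age": [18, 18, 18, 25, 25, 40],
    "Overall": [60, 62, 64, 75, 79, 68],
})

# Named aggregation: one row per age with the mean rating and the
# number of players that the mean is based on.
summary = players.groupby("Age", as_index=False).agg(
    mean_overall=("Overall", "mean"),
    n_players=("Overall", "count"),
)
print(summary)
```

Ages with a very small n_players (such as the single 40-year-old here) are the points where the average line is least reliable.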

Answered By: Mike L