Python plotly express line chart with cumulative sum

Question:

In essence, I would like to plot a line chart of my data y ~ x | g; that is, I would like to plot the cumulative sums of y separately for each group, colored by group, without having to add them to the data. Why? Because there are many such columns that I would like to plot, and I do not want to add a cumulative column for each one. Consider the following example.

import pandas as pd
df = pd.DataFrame({
    "y" : [1,1,1,2,2,2],
    "x" : [1,2,3,1,2,3],
    "g" : ["a","a","a","b","b","b"]
})

import plotly.express as px
px.line(df, y="y", x="x", color="g")

I am looking for a way to add an argument of sorts to tell plotly to plot the cumulative sum by groups. Is there such a feature or workaround?

Asked By: user2974951


Answers:

  • a simple pandas join() of the dataframe with its own groupby().cumsum() result is enough
  • the dataframe passed to plotly express then exposes both the original columns and their cumulative sums (suffixed with _cumsum)
import pandas as pd
df = pd.DataFrame({
    "y" : [1,1,1,2,2,2],
    "x" : [1,2,3,1,2,3],
    "g" : ["a","a","a","b","b","b"]
})

import plotly.express as px
px.line(df.join(df.groupby("g", as_index=False).cumsum(), rsuffix="_cumsum"), y="y_cumsum", x="x", color="g")
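
When many value columns need the same treatment, the groupby/cumsum idea can be wrapped in a small helper that builds a temporary frame and leaves the original untouched. This is a minimal sketch: `with_group_cumsum` is a hypothetical helper name (not part of pandas or plotly), and it assumes the rows are already sorted by x within each group.

```python
import pandas as pd

def with_group_cumsum(df, y, group):
    """Return a copy of df with one extra column '<y>_cumsum' (hypothetical helper)."""
    # groupby(...)[y].cumsum() keeps the original row index,
    # so assign() lines the new column up row by row.
    return df.assign(**{f"{y}_cumsum": df.groupby(group)[y].cumsum()})

df = pd.DataFrame({
    "y": [1, 1, 1, 2, 2, 2],
    "x": [1, 2, 3, 1, 2, 3],
    "g": ["a", "a", "a", "b", "b", "b"],
})

plot_df = with_group_cumsum(df, "y", "g")
print(plot_df["y_cumsum"].tolist())  # [1, 2, 3, 2, 4, 6]

# The temporary frame can be passed straight to plotly express, e.g.:
# import plotly.express as px
# px.line(plot_df, x="x", y="y_cumsum", color="g")
```

Because `assign()` returns a new frame, `df` itself never gains the cumulative column.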


Answered By: Rob Raymond

I thought @rob-raymond's answer would save me, but it still took me days to crack this.

Best case: all "categories" have the same 'x' values.
Then the proposed solution works. Problem solved.

However, in real life, the categories often have different x values, and the generated cumulative area charts quickly look very odd.

Each time a category is missing an 'x' value, it shows a cumulative value of zero at that 'x', even though the points before and after it accumulate correctly.
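
The gap is easy to see in the data itself. A small sketch with made-up values (the column names x, g, y follow the question; category "b" deliberately has no row at x == 2):

```python
import pandas as pd

# Made-up data: category "b" has no row at x == 2
df = pd.DataFrame({
    "x": [1, 2, 3, 1, 3],
    "g": ["a", "a", "a", "b", "b"],
    "y": [1, 1, 1, 2, 2],
})
df["y_cumsum"] = df.groupby("g")["y"].cumsum()

# One row per x, one column per category: the missing point shows up as NaN
wide = df.pivot(index="x", columns="g", values="y_cumsum")
print(wide)
```

The NaN in column "b" at x == 2 is the hole in the data; the running totals on either side of it are still correct.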

It was a real headache to figure out, and it took some more intensive trial and error to fix.

Here is the complete solution that handles both data with and without gaps:

import pandas as pd
import plotly.express as px

df = pd.DataFrame(df_data)
# my 3 columns: ('type', 'nb_objects', 'dt')

# Create the cumulative sum per type, indexed by dt:
df.set_index('type', inplace=True)
cumsum = df.groupby(level=0).apply(lambda x: pd.Series(x['nb_objects'].cumsum().values, index=x['dt']))

# Fill the gaps so that every type has a value for every x:
try:
    # ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas
    no_gap_df_wide_format = cumsum.unstack(level=1).ffill(axis=1)

    # Back to long (column) format:
    no_gap_df = no_gap_df_wide_format.stack().rename_axis(index={None: 'type', 'dt': 'dt'}).rename('nb_objects_cumsum').reset_index()

except ValueError:
    # There is no gap to fill
    no_gap_df = pd.DataFrame(df_data).join(pd.DataFrame(df_data).groupby("type", as_index=False).cumsum(), rsuffix="_cumsum")

fig = px.area(no_gap_df, x='dt',
              y='nb_objects_cumsum',
              color="type",
)
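
The same gap filling can also be sketched without the try/except branch, by pivoting to wide format, forward-filling, and melting back. This is an alternative take rather than the code above: the data is made up (only the three column names are taken from the answer), and it assumes at most one row per (type, dt) pair, which `pivot` requires.

```python
import pandas as pd

# Made-up stand-in data: type "b" has no row at dt == 2
df = pd.DataFrame({
    "type": ["a", "a", "a", "b", "b"],
    "dt": [1, 2, 3, 1, 3],
    "nb_objects": [1, 1, 1, 2, 2],
})

# Cumulative sum per type, in dt order
df = df.sort_values(["type", "dt"])
df["nb_objects_cumsum"] = df.groupby("type")["nb_objects"].cumsum()

# Wide format (one column per type); carry the last known total forward over gaps
wide = df.pivot(index="dt", columns="type", values="nb_objects_cumsum").ffill()

# Back to long format; rows before a type's first data point stay NaN and are dropped
no_gap_df = (
    wide.reset_index()
        .melt(id_vars=["dt"], var_name="type", value_name="nb_objects_cumsum")
        .dropna()
)
print(no_gap_df)

# no_gap_df can then be plotted as before, e.g.:
# import plotly.express as px
# px.area(no_gap_df, x="dt", y="nb_objects_cumsum", color="type")
```

When no category has a gap, `ffill()` simply has nothing to fill, so the same code path covers both cases.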
Answered By: Skratt