Row-wise Cumulative Mean Across Grouped Columns using Pandas

Question:

I would like to create multiple columns which show the row-wise cumulative mean for grouped columns. Here is some sample data:

import pandas as pd

data = [[1, 4, 6, 10, 15, 40, 90, 100], [2, 5, 3, 11, 25, 50, 90, 120], [3, 7, 9, 14, 35, 55, 100, 120]]
df = pd.DataFrame(data, columns=['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4'])

   a1  a2  a3  a4  b1  b2   b3   b4
0   1   4   6  10  15  40   90  100
1   2   5   3  11  25  50   90  120
2   3   7   9  14  35  55  100  120

What I want is to generate new columns like this:

  • New column a1_2 is calculated by the mean of columns a1 and a2 row-wise.
  • New column a1_3 is calculated by the mean of columns a1, a2 and a3 row-wise.
  • New column a1_4 is calculated by the mean of columns a1, a2, a3 and a4 row-wise.

The same should happen for the grouped columns with b. Of course you can do this manually, but this is not ideal when you have too many variables. Here is the expected output:

df['a1_2'] = df[['a1', 'a2']].mean(axis=1)
df['a1_3'] = df[['a1', 'a2', 'a3']].mean(axis=1)
df['a1_4'] = df[['a1', 'a2', 'a3', 'a4']].mean(axis=1)
df['b1_2'] = df[['b1', 'b2']].mean(axis=1)
df['b1_3'] = df[['b1', 'b2', 'b3']].mean(axis=1)
df['b1_4'] = df[['b1', 'b2', 'b3', 'b4']].mean(axis=1)

   a1  a2  a3  a4  b1  b2   b3   b4  a1_2      a1_3  a1_4  b1_2       b1_3   b1_4
0   1   4   6  10  15  40   90  100   2.5  3.666667  5.25  27.5  48.333333  61.25 
1   2   5   3  11  25  50   90  120   3.5  3.333333  5.25  37.5  55.000000  71.25 
2   3   7   9  14  35  55  100  120   5.0  6.333333  8.25  45.0  63.333333  77.50  

So I was wondering if there is some automatic way of doing this?

Asked By: Quinten

||

Answers:

IIUC, you can reshape with wide_to_long, perform a groupby.rolling, then unstack and flatten the multiindex:

df.join(pd
   .wide_to_long(df.reset_index(), stubnames=['a', 'b'], i='index', j='n')
   .groupby(level='index').expanding().mean()
   .loc[lambda d: d.index.get_level_values('n')>1]
   .droplevel(0).unstack('n')
   .pipe(lambda d: d.set_axis(d.columns.map(lambda x: f'{x[0]}1_{x[1]}'), axis=1))
)

output:

   a1  a2  a3  a4  b1  b2   b3   b4  a1_2      a1_3  a1_4  b1_2       b1_3   b1_4
0   1   4   6  10  15  40   90  100   2.5  3.666667  5.25  27.5  48.333333  61.25
1   2   5   3  11  25  50   90  120   3.5  3.333333  5.25  37.5  55.000000  71.25
2   3   7   9  14  35  55  100  120   5.0  6.333333  8.25  45.0  63.333333  77.50
Answered By: mozway

expanding.mean

for c in ('a', 'b'):
    m = df.filter(like=c).expanding(axis=1).mean().iloc[:, 1:]
    df[m.columns.str.replace(r'(d+)$', r'1_1', regex=True)] = m

Result

   a1  a2  a3  a4  b1  b2   b3   b4  a1_2      a1_3  a1_4  b1_2       b1_3   b1_4
0   1   4   6  10  15  40   90  100   2.5  3.666667  5.25  27.5  48.333333  61.25
1   2   5   3  11  25  50   90  120   3.5  3.333333  5.25  37.5  55.000000  71.25
2   3   7   9  14  35  55  100  120   5.0  6.333333  8.25  45.0  63.333333  77.50

Another option:

out = [value.expanding(axis=1).mean()
            .rename(columns = lambda col: f"{col[0]}1_{col[1]}") 
       for _, value in df.groupby(df.columns.str[0], axis = 1)]

pd.concat([df]+out, axis = 1)
   a1  a2  a3  a4  b1  b2   b3   b4  a1_1  a1_2      a1_3  a1_4  b1_1  b1_2       b1_3   b1_4
0   1   4   6  10  15  40   90  100   1.0   2.5  3.666667  5.25  15.0  27.5  48.333333  61.25
1   2   5   3  11  25  50   90  120   2.0   3.5  3.333333  5.25  25.0  37.5  55.000000  71.25
2   3   7   9  14  35  55  100  120   3.0   5.0  6.333333  8.25  35.0  45.0  63.333333  77.50
Answered By: Shubham Sharma
groups   = df.groupby(lambda col: col[0], axis=1)
cummeans = (groups.cumsum(axis=1).div(groups.cumcount().add(1))
                  .filter(regex="[^1]$")
                  .rename(lambda col: re.sub(r"(d+)$", r"1_1", col), axis=1))
result   = df.join(cummeans)
  • get the groups determined by 1st character of columns (or some other way depending on pattern)

  • get the cumulative means = cumsum / cumcount+1

  • filter out the very first cummeans, e.g., to-be a1_1 etc.

  • insert "1_" into the cummean column names

  • join with the original df

In [19]: groups = df.groupby(lambda col: col[0], axis=1)

In [20]: cummeans = groups.cumsum(axis=1).div(groups.cumcount().add(1))

In [21]: cummeans
Out[21]:
    a1   a2        a3    a4    b1    b2         b3     b4
0  1.0  2.5  3.666667  5.25  15.0  27.5  48.333333  61.25
1  2.0  3.5  3.333333  5.25  25.0  37.5  55.000000  71.25
2  3.0  5.0  6.333333  8.25  35.0  45.0  63.333333  77.50

In [22]: _.filter(regex="[^1]$")
Out[22]:
    a2        a3    a4    b2         b3     b4
0  2.5  3.666667  5.25  27.5  48.333333  61.25
1  3.5  3.333333  5.25  37.5  55.000000  71.25
2  5.0  6.333333  8.25  45.0  63.333333  77.50

In [23]: _.rename(lambda col: re.sub(r"(d+)$", r"1_1", col), axis=1)
Out[23]:
   a1_2      a1_3  a1_4  b1_2       b1_3   b1_4
0   2.5  3.666667  5.25  27.5  48.333333  61.25
1   3.5  3.333333  5.25  37.5  55.000000  71.25
2   5.0  6.333333  8.25  45.0  63.333333  77.50

In [24]: df.join(_)
Out[24]:
   a1  a2  a3  a4  b1  b2   b3   b4  a1_2      a1_3  a1_4  b1_2       b1_3   b1_4
0   1   4   6  10  15  40   90  100   2.5  3.666667  5.25  27.5  48.333333  61.25
1   2   5   3  11  25  50   90  120   3.5  3.333333  5.25  37.5  55.000000  71.25
2   3   7   9  14  35  55  100  120   5.0  6.333333  8.25  45.0  63.333333  77.50

as a "one" line:

df.join(df.groupby(lambda col: col[0], axis=1)
          .pipe(lambda gr: gr.cumsum(axis=1).div(gr.cumcount().add(1))
          .filter(regex="[^1]$")
          .rename(lambda col: re.sub(r"(d+)$", r"1_1", col), axis=1)))
Answered By: Mustafa Aydın
def function1(ss:pd.Series):
    return ss.to_frame('col1').assign(col2=lambda dd:dd.index.str[1:].astype(int)).assign(col3=lambda dd:dd.index.str[0])
    .groupby('col3').apply(lambda dd:dd.col1.cumsum()/dd.col2.iloc[1:]).dropna().droplevel(0)

df1.merge(df1.apply(function1,axis=1),left_index=True,right_index=True,suffixes=['','_1']).pipe(print)


   a1  a2  a3  a4  b1  b2   b3   b4  a2_1      a3_1  a4_1  b2_1       b3_1  
0   1   4   6  10  15  40   90  100   2.5  3.666667  5.25  27.5  48.333333   
1   2   5   3  11  25  50   90  120   3.5  3.333333  5.25  37.5  55.000000   
2   3   7   9  14  35  55  100  120   5.0  6.333333  8.25  45.0  63.333333   

    b4_1  
0  61.25  
1  71.25  
2  77.50
Answered By: G.G
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.