How to add interleaving rows as result of sort / groups?

Question:

I have the following sample input data:

import pandas as pd
df = pd.DataFrame({'col1': ['x', 'y', 'z'], 'col2': [1, 2, 3], 'col3': ['a', 'a', 'b']})

I would like to sort and group by col3 while interleaving the summaries on top of the corresponding group in col1 and get the following output:

    col1  col2
0     a      3
1     x      1
2     y      2
3     b      3
4     z      3  

I can of course do the part:

df.sort_values(by=['col3']).groupby(by=['col3']).sum()

      col2
col3      
  a      3
  b      3

but I am not sure how to interleave the group labels on top of col1.

Asked By: SkyWalker

Answers:

Use a custom function that prepends a summary row to each group:

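def f(x):
    # Summary row: group label in col1, group total in col2
    top = pd.DataFrame({'col1': x.name, 'col2': x['col2'].sum()}, index=[0])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([top, x])

df = (df.sort_values(by='col3')
        .groupby(by='col3', group_keys=False)
        .apply(f)
        # errors='ignore' is harmless if apply has already excluded col3
        .drop(columns='col3', errors='ignore')
        .reset_index(drop=True))
print(df)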
  col1  col2
0    a     3
1    x     1
2    y     2
3    b     3
4    z     3
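
For comparison, the same idea can be written as a plain loop over the groups, which avoids GroupBy.apply entirely. This is only a minimal sketch on the sample data from the question; the names pieces, header and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'y', 'z'],
                   'col2': [1, 2, 3],
                   'col3': ['a', 'a', 'b']})

pieces = []
# groupby sorts the group keys by default, so no separate sort_values is needed
for key, group in df.groupby('col3'):
    # one-row header frame: group label in col1, group total in col2
    header = pd.DataFrame({'col1': [key], 'col2': [group['col2'].sum()]})
    pieces.append(header)
    # the group's own rows, without the grouping column
    pieces.append(group[['col1', 'col2']])

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The loop is likely slower than a vectorised approach on many groups, but each step is easy to inspect.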

A faster solution uses GroupBy.ngroup to index each row by its group number, aggregates the sums, joins both frames with concat, and finally sorts the index with a stable algorithm (mergesort) so that each summary row stays above its group:

df = df.sort_values(by=['col3'])

df1 = df.groupby(by=['col3'])['col2'].sum().rename_axis('col1').reset_index()
df2 = df.set_index(df.groupby(by=['col3']).ngroup())

# Positional axis arguments to drop were removed in pandas 2.0; use columns=
df = pd.concat([df1, df2]).sort_index(kind='mergesort', ignore_index=True).drop(columns='col3')
print(df)
  col1  col2
0    a     3
1    x     1
2    y     2
3    b     3
4    z     3
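
To see why the stable sort matters, it may help to look at the index right after concat. The sketch below rebuilds the two frames on the sample data (summary, detail and stacked are illustrative names, not part of the answer): the summary rows come first in the stacked frame and share index values with their groups, so a stable sort keeps each of them ahead of its detail rows.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'y', 'z'],
                   'col2': [1, 2, 3],
                   'col3': ['a', 'a', 'b']}).sort_values('col3')

g = df.groupby('col3')
# summary rows: default index 0..n_groups-1, group label moved into col1
summary = g['col2'].sum().rename_axis('col1').reset_index()
# detail rows: index replaced by the number of the group each row belongs to
detail = df.set_index(g.ngroup())[['col1', 'col2']]

stacked = pd.concat([summary, detail])
print(stacked.index.tolist())   # [0, 1, 0, 0, 1]

# a stable sort keeps ties in their stacked order: summary row, then its group
result = stacked.sort_index(kind='mergesort', ignore_index=True)
print(result)
```

With an unstable sort kind, rows sharing an index value could in principle be reordered, which would let a detail row end up above its summary row.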
Answered By: jezrael

What about:

(df.melt(id_vars='col2')
   .rename(columns={'value': 'col1'})
   # select col2 so the string column 'variable' is not summed as well
   .groupby('col1')['col2'].sum()
   .reset_index()
)

output:

  col1  col2
0    a     3
1    b     3
2    x     1
3    y     2
4    z     3
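
A short sketch of what melt does here, on the sample data (melted and totals are illustrative names): the values of col1 and col3 are stacked into a single column, so summing col2 per label gives row values and group totals in one pass. Note that the rows come out ordered by label, not grouped under their summary row.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'y', 'z'],
                   'col2': [1, 2, 3],
                   'col3': ['a', 'a', 'b']})

# long format: each col2 value appears once with its col1 label
# and once with its col3 label
melted = df.melt(id_vars='col2')
print(melted)

totals = (melted.rename(columns={'value': 'col1'})
                .groupby('col1', as_index=False)['col2'].sum())
print(totals['col1'].tolist())   # ['a', 'b', 'x', 'y', 'z']
```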
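Answered By: mozway

def function1(dd: pd.DataFrame):
    # Side effect on the outer df: add a summary row labelled 0.5 below
    # the group's first index, so that sorting places it on top of the group
    df.loc[dd.index.min() - 0.5, ['col1', 'col2']] = [dd.name, dd.col2.sum()]

df.groupby('col3').apply(function1).pipe(lambda dd: df.sort_index(ignore_index=True)).drop('col3', axis=1)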
 

output

  col1  col2
0    a     3
1    x     1
2    y     2
3    b     3
4    z     3
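
The fractional-index trick can also be sketched without mutating df from inside apply. This assumes the frame is first sorted by col3 and given a fresh 0..n-1 index; helper, headers and the temporary pos column are names invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'y', 'z'],
                   'col2': [1, 2, 3],
                   'col3': ['a', 'a', 'b']})

# make groups contiguous and relabel rows 0..n-1
df = df.sort_values('col3', kind='mergesort').reset_index(drop=True)

# pos (hypothetical helper column) records each row's index label
helper = df.assign(pos=df.index)
headers = (helper.groupby('col3')
                 .agg(col2=('col2', 'sum'), pos=('pos', 'min'))
                 .rename_axis('col1')
                 .reset_index())
# label each header half a step before its group's first row: -0.5, 1.5
headers.index = headers.pop('pos') - 0.5

result = (pd.concat([headers, df[['col1', 'col2']]])
            .sort_index(ignore_index=True))
print(result)
```

Because every index label is unique after the shift, the sort does not need to be stable here.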

Or use the pandasql library:

def function1(dd:pd.DataFrame):
    return dd.sql("select '{}' as col1,{} as col2 union select col1,col2 from self".format(dd.name,dd.col2.sum()))

df.groupby('col3').apply(function1).reset_index(drop=True)


      col1  col2
    0    a     3
    1    x     1
    2    y     2
    3    b     3
    4    z     3
Answered By: G.G