Group pandas DataFrame on a column and sum it while retaining the number of summed observations

Question:

I have a pandas DataFrame that looks like this:

import pandas as pd
df = pd.DataFrame({'id':[1, 1, 2, 2], 'comp': [-0.10,0.20,-0.10, 0.4], 'word': ['boy','girl','man', 'woman']})

I would like to group the DataFrame on id, calculate the sum of the corresponding comp values, and add a new column called n_obs that tracks how many rows per id were summed.

I tried using df.groupby('id').sum(), but this does not quite produce the result I want.

I’d like the output to have the form below:

id   comp   n_obs
1    0.1    2
2    0.3    2

Any suggestions on how I can do this?

Asked By: OLGJ

||

Answers:

You can use .groupby() with .agg():

df.groupby("id").agg(comp=("comp", "sum"), n_obs=("id", "count"))

This outputs:

    comp  n_obs
id
1    0.1      2
2    0.3      2
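
In the output above, id is the index rather than a regular column. If it should be a column, as in the desired output in the question, one possible variation is to call .reset_index() on the result. The sketch below is not part of the original answer. It also swaps "count" for "size", on the assumption that n_obs should count every row in the group: "count" skips NaN values in the selected column, while "size" does not.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "comp": [-0.10, 0.20, -0.10, 0.4],
    "word": ["boy", "girl", "man", "woman"],
})

result = (
    df.groupby("id")
      # Named aggregation: new_column=(source_column, aggregation)
      .agg(comp=("comp", "sum"), n_obs=("comp", "size"))
      # Move "id" from the index back into a regular column
      .reset_index()
)

print(result)
```

For this sample data there are no missing values, so "count" and "size" give the same n_obs.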
Answered By: BrokenBenchmark