Pandas groupby agg – how to get counts?
Question:
I am trying to get sum, mean and count of a metric
df.groupby(['id', 'pushid']).agg({"sess_length": [ np.sum, np.mean, np.count]})
But I get "module 'numpy' has no attribute 'count'". I have tried different ways of expressing the count function but can't get it to work. How do I get an aggregate record count together with the other metrics?
Answers:
I think you mean:
df.groupby(['id', 'pushid']).agg({"sess_length": ['sum', 'count', 'mean']})
NumPy has no count function, which is why np.count fails. As the pandas documentation mentions, you can pass aggregation names as strings, such as 'sum' and 'count'. TBH, that is the preferred way of doing these aggregations.
You can use strings instead of the functions, like so:
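import pandas as pd

# Small sample frame: ids "c" and pushid "a" share two rows
df = pd.DataFrame(
    {"id": list("ccdef"), "pushid": list("aabbc"),
     "sess_length": [10, 20, 30, 40, 50]}
)
# Aggregations given by name as strings
df.groupby(["id", "pushid"]).agg({"sess_length": ["sum", "mean", "count"]})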
Which outputs:
          sess_length
                  sum mean count
id pushid
c  a               30   15     2
d  b               30   30     1
e  b               40   40     1
f  c               50   50     1
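If the two-level column header in that output is awkward to work with, named aggregation (available in pandas 0.25 and later) is one alternative. This is only a sketch on the same sample data; the output column names total, average and n_rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": list("ccdef"),
        "pushid": list("aabbc"),
        "sess_length": [10, 20, 30, 40, 50],
    }
)

# Named aggregation: each keyword becomes an output column, given as a
# (source column, aggregation name) pair. The keyword names are
# hypothetical and can be anything you like.
summary = df.groupby(["id", "pushid"]).agg(
    total=("sess_length", "sum"),
    average=("sess_length", "mean"),
    n_rows=("sess_length", "count"),
)
print(summary)
```

The result has flat column names, so summary["n_rows"] works directly without indexing into a MultiIndex.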
This might work:
df.groupby(['id', 'pushid']).agg({"sess_length": [np.sum, np.mean, np.size]})
Just use np.size.
Not sure why the answer needs to be 30 chars long, when the answer is straightforward
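One caveat worth checking before picking size over count: size reports the number of rows in each group, while count skips missing values. The sketch below uses the string names 'count' and 'size' and a sample frame with one NaN added on purpose (that NaN is an assumption for illustration, not part of the question's data).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": list("ccdef"),
        "pushid": list("aabbc"),
        # One missing value, introduced here only to show the difference
        "sess_length": [10, np.nan, 30, 40, 50],
    }
)

# "count" ignores NaN values; "size" counts every row in the group
result = df.groupby(["id", "pushid"])["sess_length"].agg(["count", "size"])
print(result)
```

For the group (c, a), count is 1 and size is 2, so the two only agree when the column has no missing values.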