How to add a WHERE filter when using a pandas groupby aggregation function

Question:

I have a question about how to use the pandas groupby() function with a filter applied. Consider the following example df:

id c1 c2 c3 
1  1   2  3  
1  2   3  4 
1  3   4  5

If I use this code:

grouped_df = df.groupby(['id']).agg(
    Min_1=('c1', np.min),
    Min_2=('c2', np.min),
    Min_3=('c3', np.min))
grouped_df.columns = ['gc1','gc2','gc3']

I get the following result:

id gc1 gc2 gc3
1  1   2   3

But I want to add a filter when I do this group by. If I were using SQL, I would write it like this:

select id,
min(c1) as gc1,
min(c2) as gc2,
min(c3) as gc3
from df_view
where (c1 between 2 and 3) and (c2 between 2 and 3) and (c3 between 2 and 3)
group by id

The result I am after is:

id gc1 gc2 gc3
1  2   2   3

How can this be done using pandas?

I searched the documentation, but nowhere does it mention adding a filter when doing an aggregation in pandas.

Asked By: kiritowow


Answers:

This does not use any additional pandas functions, but one way to do it is to index the input to np.min directly inside the aggregation. Each column is filtered separately, which plays a role similar to your where clause.

grouped_df = df.groupby(['id']).agg(
    Min_1=('c1', lambda x : np.min(x[(x>=2) & (x<=3)])),
    Min_2=('c2', lambda x : np.min(x[(x>=2) & (x<=3)])),
    Min_3=('c3', lambda x : np.min(x[(x>=2) & (x<=3)])))
grouped_df.columns = ['gc1','gc2','gc3']
grouped_df
Answered By: Matthew Hielsberg
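
A note on the two possible readings of the filter, as a minimal sketch rather than a definitive solution. The lambda approach above filters each column on its own, which is what produces the 2, 2, 3 result the question asks for. A SQL WHERE clause instead filters whole rows, and in the sample data no single row has c1, c2 and c3 all between 2 and 3, so a row-wise filter leaves nothing to aggregate. The variable names (cols, masked, per_column, row_mask, row_wise) are made up for illustration, and the sketch assumes the three-row frame from the question.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "c1": [1, 2, 3],
    "c2": [2, 3, 4],
    "c3": [3, 4, 5],
})

cols = ["c1", "c2", "c3"]

# Per-column filter: values outside [2, 3] are replaced by NaN,
# and min() skips NaN, so each column is filtered independently.
masked = df[cols].where(df[cols].ge(2) & df[cols].le(3))
per_column = masked.groupby(df["id"]).min()
per_column.columns = ["gc1", "gc2", "gc3"]

# Row-wise filter (the literal reading of the SQL WHERE clause):
# a row is kept only if all three columns fall within [2, 3].
row_mask = (df[cols].ge(2) & df[cols].le(3)).all(axis=1)
row_wise = df[row_mask].groupby("id")[cols].min()

print(per_column)
print(row_wise)
```

Because the masked values become NaN, the per-column result comes back as floats. If a group could end up with no matching values in some column, that cell would be NaN, so it may be worth deciding how to handle that case before casting back to integers.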