How to get normalized values from groupby and size

Question:

I know that we can get normalized values from value_counts() of a pandas series but when we do a group by on a dataframe, the only way to get counts is through size(). Is there any way to get normalized values with size()?

Example:

import pandas as pd

df = pd.DataFrame({'subset_product':['A','A','A','B','B','C','C'],
                   'subset_close':[1,1,0,1,1,1,0]})
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')

df.subset_product.value_counts()
A    3
B    2
C    2

df2


Looking to get:

subset_product subset_close prod_count norm
A              0            1          1/3
A              1            2          2/3
B              1            2          2/2
C              1            1          1/2
C              0            1          1/2

Besides manually calculating the normalized values as prod_count/total, is there any way to get normalized values?

Asked By: jxn


Answers:

I don't think this is possible with a single groupby + size, because the grouping is by two columns (subset_product and subset_close), while normalizing needs the size per subset_product only.
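To see the mismatch concretely, here is a minimal sketch using the question's data. The names pair_counts and product_counts are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

# Counts per (subset_product, subset_close) pair: these are the numerators.
pair_counts = df.groupby(['subset_product', 'subset_close']).size()

# Counts per subset_product alone: these are the denominators.
product_counts = df.groupby('subset_product').size()

# The two results have different lengths and index levels,
# so one of them has to be aligned to the other before dividing.
print(pair_counts)
print(product_counts)
```

The first result has one row per product/close pair and the second has one row per product, which is why a second step is needed to line them up.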

Possible solutions use map or transform to build a Series with the same length as df2, then divide with div:

df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
s = df.subset_product.value_counts()
df2['prod_count'] = df2['prod_count'].div(df2['subset_product'].map(s))

Or:

df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
a = df2.groupby('subset_product')['prod_count'].transform('sum')
df2['prod_count'] = df2['prod_count'].div(a)

print(df2)
  subset_product  subset_close  prod_count
0              A             0    0.333333
1              A             1    0.666667
2              B             1    1.000000
3              C             0    0.500000
4              C             1    0.500000
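Both snippets above overwrite prod_count with the fraction. If the raw counts should stay next to the normalized values, as in the question's desired table, a variant along these lines might work. This is a sketch, not part of the original answer: the column name norm comes from the question, while totals and check are illustrative names. The cross-check at the end swaps in a different technique, groupby followed by value_counts(normalize=True), and assumes a pandas version where that method accepts normalize.

```python
import pandas as pd

df = pd.DataFrame({'subset_product': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   'subset_close': [1, 1, 0, 1, 1, 1, 0]})

df2 = (df.groupby(['subset_product', 'subset_close'])
         .size()
         .reset_index(name='prod_count'))

# Per-product totals, repeated on every row of df2 so they align for division.
totals = df2.groupby('subset_product')['prod_count'].transform('sum')

# Write the fraction to a new column so prod_count keeps the raw counts.
df2['norm'] = df2['prod_count'] / totals
print(df2)

# Cross-check with a different technique: value_counts within each product.
# The result is indexed by (subset_product, subset_close).
check = df.groupby('subset_product')['subset_close'].value_counts(normalize=True)
print(check)
```

The value_counts variant gives the fractions directly but drops the raw counts, so it fits best when only the normalized values are needed.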
Answered By: jezrael