Group dataframe by two columns and then find average count based on one of the groups

Question

I'm really struggling to find a solution to this. Suppose I have the dataframe below:

SEX	ITEM	Some other column
M	Socks	233
M	Socks	1
M	Hat	2
F	Socks	3
F	Hat	3
F	Hat	6
F	Hat	2

I would like to find, within each SEX group, the share of rows that belong to each ITEM. For example, 2 of the 3 M rows are Socks, so M/Socks is 0.6666:

SEX	ITEM	Average
M	Socks	0.6666
M	Hat	0.3333
F	Socks	0.25
F	Hat	0.75

Can anyone help me with this?

Asked By: ML learner

Answer 1

Let us try crosstab with normalize='index', which divides each count by its row (SEX) total:

import pandas as pd

out = pd.crosstab(df['SEX'], df['ITEM'], normalize='index').stack().reset_index(name='average')
out
Out[36]: 
  SEX   ITEM   average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667
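
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample frame and splits the one-liner into two steps. The variable names `table` and `out` are just for illustration; the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SEX": ["M", "M", "M", "F", "F", "F", "F"],
    "ITEM": ["Socks", "Socks", "Hat", "Socks", "Hat", "Hat", "Hat"],
    "Some other column": [233, 1, 2, 3, 3, 6, 2],
})

# Count each (SEX, ITEM) pair. normalize="index" divides every row of the
# table by its row total, so the values for each SEX sum to 1.
table = pd.crosstab(df["SEX"], df["ITEM"], normalize="index")

# Reshape from wide (one column per ITEM) to long (one row per SEX/ITEM pair).
out = table.stack().reset_index(name="average")
print(out)
```

Note that "Some other column" plays no part here: crosstab only counts rows.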

Answered By: BENY

Answer 2

You could try groupby() with the count() aggregation, and then divide the per-pair counts by the per-SEX counts to get the ratio:

# Rows per SEX, and rows per (SEX, ITEM) pair
df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
# Look up each row's SEX total in df2
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
# After count(), 'Some other column' holds the per-pair row count
df1['Average'] = df1['Some other column'] / df1['SEX_count']
df1 = df1.drop(columns=['Some other column', 'SEX_count'])
print(df1)

Output:

  SEX   ITEM   Average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667
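
The same count-then-divide idea can probably be written without the apply() lookup. The following is only a sketch of one way to do it, using size() and a grouped transform; the names `pair_counts`, `sex_totals` and `result` are made up for this example, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SEX": ["M", "M", "M", "F", "F", "F", "F"],
    "ITEM": ["Socks", "Socks", "Hat", "Socks", "Hat", "Hat", "Hat"],
    "Some other column": [233, 1, 2, 3, 3, 6, 2],
})

# Number of rows for each (SEX, ITEM) pair, indexed by both columns.
pair_counts = df.groupby(["SEX", "ITEM"]).size()

# For every pair, the total number of rows in its SEX group.
# transform("sum") keeps the original index, so the two Series line up.
sex_totals = pair_counts.groupby(level="SEX").transform("sum")

# Share of each ITEM within its SEX group.
result = (pair_counts / sex_totals).reset_index(name="Average")
print(result)
```

Using size() instead of count() means the result does not depend on which other columns exist or whether they contain missing values.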

Another great answer, suggested by @user3483203 (I like this one!):

print(df.groupby(['SEX'], as_index=False)['ITEM'].value_counts(normalize=True))

  SEX   ITEM  proportion
0   F    Hat    0.750000
1   F  Socks    0.250000
2   M  Socks    0.666667
3   M    Hat    0.333333
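
The name of the value column that value_counts() produces can differ between pandas versions, so a slightly more defensive sketch names it explicitly with reset_index(). This is an assumption-laden variant rather than the original suggestion; `props` is a made-up name and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SEX": ["M", "M", "M", "F", "F", "F", "F"],
    "ITEM": ["Socks", "Socks", "Hat", "Socks", "Hat", "Hat", "Hat"],
    "Some other column": [233, 1, 2, 3, 3, 6, 2],
})

# Relative frequency of each ITEM within each SEX group.
# reset_index(name=...) sets the value column's name explicitly instead of
# relying on the default chosen by the installed pandas version.
props = (
    df.groupby("SEX")["ITEM"]
      .value_counts(normalize=True)
      .reset_index(name="proportion")
)
print(props)
```

Within each SEX group the rows come out ordered by descending proportion, which is why the M rows list Socks before Hat.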

Answered By: perpetualstudent
