Group dataframe by two columns and then find average count based on one of the groups

Question:

Really struggling to get this solution. Suppose I have the dataframe below:

SEX ITEM Some other column
M Socks 233
M Socks 1
M Hat 2
F Socks 3
F Hat 3
F Hat 6
F Hat 2

I would like to find the average number of occurrences of each ITEM based on the SEX group

SEX ITEM Average
M Socks 0.6666
M Hat 0.3333
F Socks 0.25
F Hat 0.75

Can anyone help me with this?

Asked By: ML learner

||

Answers:

Let us try crosstab

out = pd.crosstab(df['SEX'], df['ITEM'],normalize='index').stack().reset_index(name='average')
Out[36]: 
  SEX   ITEM   average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667
Answered By: BENY

You could try the function groupby() with the count() aggregation, and then divide the relevant columns to get the ratio

df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
df1['Average'] = df1['Someothercolumn'] / df1['SEX_count']
df1 = df1.drop(columns=['Someothercolumn', 'SEX_count'])
print(df1)

Output:

  SEX   ITEM   Average
0   F    Hat  0.750000
1   F  Socks  0.250000
2   M    Hat  0.333333
3   M  Socks  0.666667

Another great answer suggested by @user3483203 (I like this one!)

print(df.groupby(['SEX'], as_index=False)['ITEM'].value_counts(normalize=1))

  SEX   ITEM  proportion
0   F    Hat    0.750000
1   F  Socks    0.250000
2   M  Socks    0.666667
3   M    Hat    0.333333
Answered By: perpetualstudent