Group dataframe by two columns and then find average count based on one of the groups
Question:
Really struggling to get this solution. Suppose I have the dataframe below:
SEX
ITEM
Some other column
M
Socks
233
M
Socks
1
M
Hat
2
F
Socks
3
F
Hat
3
F
Hat
6
F
Hat
2
I would like to find the average number of occurrences of each ITEM based on the SEX group
SEX
ITEM
Average
M
Socks
0.6666
M
Hat
0.3333
F
Socks
0.25
F
Hat
0.75
Can anyone help me with this?
Answers:
Let us try crosstab
out = pd.crosstab(df['SEX'], df['ITEM'],normalize='index').stack().reset_index(name='average')
Out[36]:
SEX ITEM average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667
You could try the function groupby()
with the count()
aggregation, and then divide the relevant columns to get the ratio
df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
df1['Average'] = df1['Someothercolumn'] / df1['SEX_count']
df1 = df1.drop(columns=['Someothercolumn', 'SEX_count'])
print(df1)
Output:
SEX ITEM Average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667
Another great answer suggested by @user3483203 (I like this one!)
print(df.groupby(['SEX'], as_index=False)['ITEM'].value_counts(normalize=1))
SEX ITEM proportion
0 F Hat 0.750000
1 F Socks 0.250000
2 M Socks 0.666667
3 M Hat 0.333333
Really struggling to get this solution. Suppose I have the dataframe below:
SEX | ITEM | Some other column |
---|---|---|
M | Socks | 233 |
M | Socks | 1 |
M | Hat | 2 |
F | Socks | 3 |
F | Hat | 3 |
F | Hat | 6 |
F | Hat | 2 |
I would like to find the average number of occurrences of each ITEM based on the SEX group
SEX | ITEM | Average |
---|---|---|
M | Socks | 0.6666 |
M | Hat | 0.3333 |
F | Socks | 0.25 |
F | Hat | 0.75 |
Can anyone help me with this?
Let us try crosstab
out = pd.crosstab(df['SEX'], df['ITEM'],normalize='index').stack().reset_index(name='average')
Out[36]:
SEX ITEM average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667
You could try the function groupby()
with the count()
aggregation, and then divide the relevant columns to get the ratio
df2 = df.groupby(['SEX']).count().reset_index()
df1 = df.groupby(['SEX', 'ITEM']).count().reset_index()
df1['SEX_count'] = df1['SEX'].apply(lambda x: df2['ITEM'][df2['SEX']==x].values[0])
df1['Average'] = df1['Someothercolumn'] / df1['SEX_count']
df1 = df1.drop(columns=['Someothercolumn', 'SEX_count'])
print(df1)
Output:
SEX ITEM Average
0 F Hat 0.750000
1 F Socks 0.250000
2 M Hat 0.333333
3 M Socks 0.666667
Another great answer suggested by @user3483203 (I like this one!)
print(df.groupby(['SEX'], as_index=False)['ITEM'].value_counts(normalize=1))
SEX ITEM proportion
0 F Hat 0.750000
1 F Socks 0.250000
2 M Socks 0.666667
3 M Hat 0.333333