Finding mean/SD of a group of population and mean/SD of remaining population within a data frame

Question:

I have a pandas data frame that looks like this:

id  age  weight  group
1    12    45    [10-20]
1    18    110   [10-20]
1    25    25    [20-30]
1    29    85    [20-30]
1    32    49    [30-40]
1    31    70    [30-40]
1    37    39    [30-40] 

I am looking for a data frame that would look like this: (sd=standard deviation)

  group   group_mean_weight  group_sd_weight  rest_mean_weight  rest_sd_weight
 [10-20]                       
 [20-30] 
 [30-40] 

Here the second/third columns are mean and SD for that group. columns third and fourth are mean and SD for the rest of the groups combined.

Asked By: Kathan Vyas

||

Answers:

Here’s a way to do it:

res = df.group.to_frame().groupby('group').count()
for group in res.index:
    mask = df.group==group
    srGroup, srOther = df.loc[mask, 'weight'], df.loc[~mask, 'weight']
    res.loc[group, ['group_mean_weight','group_sd_weight','rest_mean_weight','rest_sd_weight']] = [
        srGroup.mean(), srGroup.std(), srOther.mean(), srOther.std()]
res = res.reset_index()

Output:

     group  group_mean_weight  group_sd_weight  rest_mean_weight  rest_sd_weight
0  [10-20]          77.500000        45.961941             53.60       24.016661
1  [20-30]          55.000000        42.426407             62.60       28.953411
2  [30-40]          52.666667        15.821926             66.25       38.378596

An alternative way to get the same result is:

res = ( pd.DataFrame(
    df.group.drop_duplicates().to_frame()
        .apply(lambda x: [
            df.loc[df.group==x.group,'weight'].mean(), 
            df.loc[df.group==x.group,'weight'].std(), 
            df.loc[df.group!=x.group,'weight'].mean(), 
            df.loc[df.group!=x.group,'weight'].std()], axis=1, result_type='expand')
        .to_numpy(),
    index=list(df.group.drop_duplicates()),
    columns=['group_mean_weight','group_sd_weight','rest_mean_weight','rest_sd_weight'])
    .reset_index().rename(columns={'index':'group'}) )

Output:

     group  group_mean_weight  group_sd_weight  rest_mean_weight  rest_sd_weight
0  [10-20]          77.500000        45.961941             53.60       24.016661
1  [20-30]          55.000000        42.426407             62.60       28.953411
2  [30-40]          52.666667        15.821926             66.25       38.378596

UPDATE:
OP asked in a comment: "what if I have more than one weight column? what if I have around 10 different weight columns and I want sd for all weight columns?"

To illustrate below, I have created two weight columns (weight and weight2) and have simply provided all 4 aggregates (mean, sd, mean of other, sd of other) for each weight column.

wgtCols = ['weight','weight2']
res = ( pd.concat([ pd.DataFrame(
    df.group.drop_duplicates().to_frame()
        .apply(lambda x: [
            df.loc[df.group==x.group,wgtCol].mean(), 
            df.loc[df.group==x.group,wgtCol].std(), 
            df.loc[df.group!=x.group,wgtCol].mean(), 
            df.loc[df.group!=x.group,wgtCol].std()], axis=1, result_type='expand')
        .to_numpy(),
    index=list(df.group.drop_duplicates()),
    columns=[f'group_mean_{wgtCol}',f'group_sd_{wgtCol}',f'rest_mean_{wgtCol}',f'rest_sd_{wgtCol}'])
    for wgtCol in wgtCols], axis=1)
    .reset_index().rename(columns={'index':'group'}) )

Input:

   id  age  weight  weight2    group
0   1   12      45       55  [10-20]
1   1   18     110      120  [10-20]
2   1   25      25       35  [20-30]
3   1   29      85       95  [20-30]
4   1   32      49       59  [30-40]
5   1   31      70       80  [30-40]
6   1   37      39       49  [30-40]

Output:

     group  group_mean_weight  group_sd_weight  rest_mean_weight  rest_sd_weight  group_mean_weight2  group_sd_weight2  rest_mean_weight2  rest_sd_weight2
0  [10-20]          77.500000        45.961941             53.60       24.016661           87.500000         45.961941              63.60        24.016661
1  [20-30]          55.000000        42.426407             62.60       28.953411           65.000000         42.426407              72.60        28.953411
2  [30-40]          52.666667        15.821926             66.25       38.378596           62.666667         15.821926              76.25        38.378596
Answered By: constantstranger