python dataframe: create a column, with dynamic calculation/percentage based on day and month

Question:

I am struggling with the following.

This is my data frame. The last two columns, are calculations which I am able to create in my query.

    Day Month   Result  Vol     FirstCol    Second Col
0    26    5      Good   123    1%          0%
1    26    5      Bad    716    0%          2%
2    26    5      Other   36    0%          0%
3    26    6      Good  4721    26%         11%
4    26    6      Bad   7148    0%          16%
5    26    6      Other 1387    0%          3%
6    27    5      Good   196    1%          0%
7    27    5      Bad    627    0%          1%
8    27    5      Other   60    0%          0%
9    27    6      Good  6188    34%         14%
10   27    6      Bad   6688    0%          15%
11   27    6      Other 1068    0%          2%
12   28    5      Good   339    2%          1%
13   28    5      Bad   1114    0%          3%
14   28    5      Other   72    0%          0%
15   28    6      Good  6524    36%         15%
16   28    6      Bad   6103    0%          14%
17   28    6      Other  820    0%          2%

First column to calculate the percentage for all Good Result

df['FirstCol'] = np.where(df['Result'].isin(['Good']),df['Vol']/df[df['Result']=='Good']['Vol'].sum(),0) 

Second column to calculate the percentage for all Result

df['SecondCol'] = df['Vol']/df['Vol'].sum()

For the other the code has to be more dynamic, which I am struggling with.
The Third Column should get the % based on each Month. So the percentage for row 0-8 should be 100% and the same for row 9-17.
The Forth Column should get the % based on each Day and Month. So the percentage for row 0-2 should be 100% and the same for row 3-5 and so on. I want to have a dynamic query. Because I don’t want to change each month.

Desired Output

    Day Month   Result  Vol FirstCol    Second Col 0-17 Third Col 1-9 (Month)   Forth Col 1-3 (Day)
0    26 5   Good    123     1%          0%                  1%                  14%
1    26 5   Bad     716     0%          2%                  5%                  82%
2    26 5   Other    36     0%          0%                  0%                  4%
3    26 6   Good    4721    26%         11%                 31%                 36%
4    26 6   Bad     7148    0%          16%                 48%                 54%
5    26 6   Other   1387    0%          3%                  9%                  10%
6    27 5   Good     196    1%          0%                  1%                  22%
7    27 5   Bad      627    0%          1%                  4%                  71%
8    27 5   Other     60    0%          0%                  0%                  7%
9    27 6   Good    6188    34%         14%                 21%                 44%
10   27 6   Bad     6688    0%          15%                 23%                 48%
11   27 6   Other   1068    0%          2%                  4%                  8%
12   28 5   Good     339    2%          1%                  1%                  22%
13   28 5   Bad     1114    0%          3%                  4%                  73%
14   28 5   Other     72    0%          0%                  0%                  5%
15   28 6   Good    6524    36%         15%                 23%                 49%
16   28 6   Bad     6103    0%          14%                 21%                 45%
17   28 6   Other    820    0%          2%                  3%                  6%

Asked By: Rav86

||

Answers:

If I understand you correctly, you want to groupby by multiple columns:

# sort the dataframe to have nicer output:
df = df.sort_values(by=['Month', 'Day'])

df['Third Col'] = df.groupby('Month')['Vol'].transform(lambda x: (x / x.sum()) *100)
df['Fourth Col'] = df.groupby(['Day', 'Month'])['Vol'].transform(lambda x: (x / x.sum())*100)

print(df)

Prints:

    Day  Month Result   Vol FirstCol Second Col  Third Col  Fourth Col
0    26      5   Good   123       1%         0%   3.746573   14.057143
1    26      5    Bad   716       0%         2%  21.809321   81.828571
2    26      5  Other    36       0%         0%   1.096558    4.114286
6    27      5   Good   196       1%         0%   5.970149   22.197055
7    27      5    Bad   627       0%         1%  19.098386   71.007928
8    27      5  Other    60       0%         0%   1.827597    6.795017
12   28      5   Good   339       2%         1%  10.325921   22.229508
13   28      5    Bad  1114       0%         3%  33.932379   73.049180
14   28      5  Other    72       0%         0%   2.193116    4.721311
3    26      6   Good  4721      26%        11%  11.614633   35.614062
4    26      6    Bad  7148       0%        16%  17.585554   53.922752
5    26      6  Other  1387       0%         3%   3.412306   10.463186
9    27      6   Good  6188      34%        14%  15.223756   44.377510
10   27      6    Bad  6688       0%        15%  16.453859   47.963282
11   27      6  Other  1068       0%         2%   2.627500    7.659208
15   28      6   Good  6524      36%        15%  16.050385   48.516398
16   28      6    Bad  6103       0%        14%  15.014638   45.385588
17   28      6  Other   820       0%         2%   2.017369    6.098014
Answered By: Andrej Kesely