How to set a column's values from a grouped sum with pandas?

Question

The request

My current solution is clumsy and inefficient because it relies on nested loops. There must be a proper way to get the same result.
Can all of this be done in a single step, or is the initial grouping step necessary?

My solution:

Original data:

Important: This is only the head of the data. Real data contains all the countries. Note this to avoid confusion about the results.

   year  rank country continent           gdp  gdp_percent
0  1960     1     USA   America  543300000000     0.468483
1  1960     2      UK    Europe   73233967692     0.063149
2  1960     3  France    Europe   62225478000     0.053656
3  1960     4   China      Asia   59716467625     0.051493
4  1960     5   Japan      Asia   44307342950     0.038206
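
For anyone who wants to try the snippets in this thread, the five rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes only the head shown here, so anything computed from it will differ from results based on the full data.

```python
import pandas as pd

# Only the five head rows shown in the question; the real data has every country.
df = pd.DataFrame({
    "year": [1960] * 5,
    "rank": [1, 2, 3, 4, 5],
    "country": ["USA", "UK", "France", "China", "Japan"],
    "continent": ["America", "Europe", "Europe", "Asia", "Asia"],
    "gdp": [543300000000, 73233967692, 62225478000, 59716467625, 44307342950],
    "gdp_percent": [0.468483, 0.063149, 0.053656, 0.051493, 0.038206],
})
print(df)
```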

The problem

I want to add a column holding each country's share of its continent's gdp_percent for the same year. It's a simple operation:

gdp_percent / SUM(gdp_percent) (grouped by year + continent)

Code:

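## Get the sum per (year, continent); reset_index turns the group keys back into columns
df_grouped = df.groupby(['year', 'continent'])['gdp_percent'].sum().reset_index()

## Nested loops: find each row's group and store the row's share of the group sum
for grouped_df_index, grouped_df_row in df_grouped.iterrows():
    for df_index, df_row in df.iterrows():
        if df_row['year'] == grouped_df_row['year'] and df_row['continent'] == grouped_df_row['continent']:
            df.loc[df_index, 'gdp_percent_of_continent'] = df_row['gdp_percent'] / grouped_df_row['gdp_percent']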

Final result (the last column):

   year  rank country continent           gdp  gdp_percent  gdp_percent_of_continent
0  1960     1     USA   America  543300000000     0.468483                  0.848412
1  1960     2      UK    Europe   73233967692     0.063149                  0.257104
2  1960     3  France    Europe   62225478000     0.053656                  0.218456
3  1960     4   China      Asia   59716467625     0.051493                  0.333881
4  1960     5   Japan      Asia   44307342950     0.038206                  0.247727

Asked By: Łukasz M

Answer 1

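Try computing each group's sum with transform, then dividing:

# transform('sum') broadcasts the (year, continent) total back onto every row
df['gdp_percent_sum'] = df.groupby(['year', 'continent'])['gdp_percent'].transform('sum')
df['gdp_percent_of_continent'] = (1/df['gdp_percent_sum'])*df['gdp_percent']
# The helper column is no longer needed
df = df.drop(columns="gdp_percent_sum")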

Output on the five sample rows from the question:

   year  rank country continent           gdp  gdp_percent  gdp_percent_of_continent
0  1960     1     USA   America  543300000000     0.468483                  1.000000
1  1960     2      UK    Europe   73233967692     0.063149                  0.540636
2  1960     3  France    Europe   62225478000     0.053656                  0.459364
3  1960     4   China      Asia   59716467625     0.051493                  0.574064
4  1960     5   Japan      Asia   44307342950     0.038206                  0.425936

Answered By: khaled koubaa
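
As a side note on the "one step" part of the question: the helper column in the answer above can probably be skipped by dividing directly by the transformed group sum. The sketch below assumes only the five sample rows from the question (trimmed to the columns that matter), and `group_total` is just an illustrative variable name.

```python
import pandas as pd

# Sample rows from the question, reduced to the columns used here.
df = pd.DataFrame({
    "year": [1960] * 5,
    "country": ["USA", "UK", "France", "China", "Japan"],
    "continent": ["America", "Europe", "Europe", "Asia", "Asia"],
    "gdp_percent": [0.468483, 0.063149, 0.053656, 0.051493, 0.038206],
})

# transform("sum") returns one group total per original row, aligned on df's index,
# so it can be used directly as the divisor.
group_total = df.groupby(["year", "continent"])["gdp_percent"].transform("sum")
df["gdp_percent_of_continent"] = df["gdp_percent"] / group_total
print(df)
```

Within each (year, continent) group the new column sums to 1, which is a quick sanity check on the full data as well.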

Answer 2

You can group the DataFrame by year and continent, then call transform on the gdp_percent column and pass a lambda function that applies the formula:

df['gdp_percent_of_continent'] = (
    df
    .groupby(['year', 'continent'])['gdp_percent']
    .transform(lambda x: (1 / x.sum()) * x)
)

   year  rank country continent           gdp  gdp_percent  gdp_percent_of_continent
0  1960     1     USA   America  543300000000     0.468483                  1.000000
1  1960     2      UK    Europe   73233967692     0.063149                  0.540636
2  1960     3  France    Europe   62225478000     0.053656                  0.459364
3  1960     4   China      Asia   59716467625     0.051493                  0.574064
4  1960     5   Japan      Asia   44307342950     0.038206                  0.425936

Answered By: ThePyGuy
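
On whether the separate grouping step is necessary: with transform it is not, but if the grouped sums are wanted as their own table (as in the question's first step), a merge can replace the nested loops. This is a sketch on the five sample rows only; `totals` and `continent_total` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample rows from the question, reduced to the columns used here.
df = pd.DataFrame({
    "year": [1960] * 5,
    "country": ["USA", "UK", "France", "China", "Japan"],
    "continent": ["America", "Europe", "Europe", "Asia", "Asia"],
    "gdp_percent": [0.468483, 0.063149, 0.053656, 0.051493, 0.038206],
})

# Step 1: one row per (year, continent) with the summed gdp_percent.
totals = (
    df.groupby(["year", "continent"], as_index=False)["gdp_percent"]
    .sum()
    .rename(columns={"gdp_percent": "continent_total"})
)

# Step 2: attach each row's group total via a left merge instead of looping.
merged = df.merge(totals, on=["year", "continent"], how="left")
merged["gdp_percent_of_continent"] = merged["gdp_percent"] / merged["continent_total"]
merged = merged.drop(columns="continent_total")
print(merged)
```

The transform approach is shorter; the merge version is mainly useful when the per-group totals are needed elsewhere too.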