Incremental group by from a specific year onwards in Pandas

Question:

I have a dataframe that looks like this:

import pandas as pd

df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
           'year': [1970, 1982, 1999, 2014, 2017, 2018, 2021, 1981, 1987, 2002, 2015, 2018, 2019, 2021],
           'value': [320, 416, 172, 652, 390, 570, 803, 144, 273, 129, 477, 831, 664, 117]}

df = pd.DataFrame(df_dict)

    country year    value
0   Japan   1970    320
1   Japan   1982    416
2   Japan   1999    172
3   Japan   2014    652
4   Japan   2017    390
5   Japan   2018    570
6   Japan   2021    803
7   Greece  1981    144
8   Greece  1987    273
9   Greece  2002    129
10  Greece  2015    477
11  Greece  2018    831
12  Greece  2019    664
13  Greece  2021    117

I am trying to group the data by year from 2014 onwards, but I can’t seem to get it right using groupby(['country','year'])['value'].

Practically I want to sum up the values for each country for each year greater than or equal to 2014. So my expected output should look something like this:

    country year    value
0   Japan   2014    1560
1   Japan   2015    1560
2   Japan   2016    1560
3   Japan   2017    1950
4   Japan   2018    2520
5   Japan   2019    2520
6   Japan   2020    2520
7   Japan   2021    3323
8   Greece  2014    546
9   Greece  2015    1023
10  Greece  2016    1023
11  Greece  2017    1023
12  Greece  2018    1854
13  Greece  2019    2518
14  Greece  2020    2518
15  Greece  2021    2635

Here the value for Japan in 2014 is the sum of all values where year <= 2014, the value for Japan in 2015 is the sum of all values where year <= 2015, and so on. The last year I would like to sum is 2021 for all countries in the dataframe.
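
As a quick sanity check of the arithmetic behind the expected output, here is a minimal plain-Python sketch (no pandas) for Japan only. The names japan_values and running_total are illustrative and not part of the original code.

```python
# Japan's rows from the dataframe above, as year -> value
japan_values = {1970: 320, 1982: 416, 1999: 172, 2014: 652,
                2017: 390, 2018: 570, 2021: 803}

def running_total(values_by_year, up_to_year):
    """Sum every value whose year is <= up_to_year."""
    return sum(v for y, v in values_by_year.items() if y <= up_to_year)

total_2014 = running_total(japan_values, 2014)  # 320 + 416 + 172 + 652
total_2015 = running_total(japan_values, 2015)  # no 2015 row, so same as 2014
total_2017 = running_total(japan_values, 2017)  # adds the 2017 value of 390

print(total_2014, total_2015, total_2017)
```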

Asked By: A.N.

||

Answers:

First create a MultiIndex of all country/year combinations with MultiIndex.from_product. Then map every year before 2014 to 2014 with Series.clip and aggregate with sum, add the missing years with Series.reindex, and take the cumulative sum per country with GroupBy.cumsum:

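# All country/year combinations from 2014 to the last year in the data
mux = pd.MultiIndex.from_product([df['country'].unique(), range(2014, df['year'].max()+1)],
                                 names=['country','year'])

df = (df.groupby(['country', df['year'].clip(lower=2014)])['value']  # years before 2014 count as 2014
        .sum()
        .reindex(mux, fill_value=0)   # add the missing years with value 0
        .groupby(level=0)
        .cumsum()                     # running total per country
        .reset_index())
print(df)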
   country  year  value
0    Japan  2014   1560
1    Japan  2015   1560
2    Japan  2016   1560
3    Japan  2017   1950
4    Japan  2018   2520
5    Japan  2019   2520
6    Japan  2020   2520
7    Japan  2021   3323
8   Greece  2014    546
9   Greece  2015   1023
10  Greece  2016   1023
11  Greece  2017   1023
12  Greece  2018   1854
13  Greece  2019   2518
14  Greece  2020   2518
15  Greece  2021   2635
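
To see what each step of the chain contributes, the following sketch splits it into named intermediates. It rebuilds the sample dataframe so that it runs on its own. The variable names clipped_year, per_year, filled and running are illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': [1970, 1982, 1999, 2014, 2017, 2018, 2021,
             1981, 1987, 2002, 2015, 2018, 2019, 2021],
    'value': [320, 416, 172, 652, 390, 570, 803,
              144, 273, 129, 477, 831, 664, 117],
})

# Step 1: years before 2014 are mapped to 2014, so their values land in the 2014 bucket
clipped_year = df['year'].clip(lower=2014)

# Step 2: totals per country and (clipped) year, only for years that occur in the data
per_year = df.groupby(['country', clipped_year])['value'].sum()

# Step 3: full grid of countries x years 2014..max; years without data get 0
mux = pd.MultiIndex.from_product(
    [df['country'].unique(), range(2014, df['year'].max() + 1)],
    names=['country', 'year'])
filled = per_year.reindex(mux, fill_value=0)

# Step 4: running total within each country
running = filled.groupby(level=0).cumsum()

print(per_year.loc['Japan'])
print(filled.loc['Japan'])
print(running.loc['Japan'])
```

The fill_value=0 in step 3 is what lets years without data (2015 and 2016 for Japan, for instance) carry the previous total forward in step 4.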
Answered By: jezrael

If you don’t mind creating a new dataframe, you can consider the code below as an alternative.

Iterate over the list of countries and years, and for each combination, calculate the cumulative sum of the value column up to and including that year. You can do this by filtering the dataframe to include only rows with the current country and year <= the current year, and then summing the value column. Lastly, collect the resulting rows into a new dataframe.

years = list(range(2014, 2022))  # 2014 through 2021 inclusive
countries = df['country'].unique()

rows = []
for country in countries:
    for year in years:
        # Rows for this country up to and including the current year
        df_filtered = df[(df['country'] == country) & (df['year'] <= year)]
        # sum() gives 0 for an empty selection, where cumsum().iloc[-1] would raise IndexError
        rows.append({'country': country, 'year': year, 'value': df_filtered['value'].sum()})

# Build the result once instead of concatenating inside the loop
result_df = pd.DataFrame(rows, columns=['country', 'year', 'value'])

output:

> result_df

   country  year value
0    Japan  2014  1560
1    Japan  2015  1560
2    Japan  2016  1560
3    Japan  2017  1950
4    Japan  2018  2520
5    Japan  2019  2520
6    Japan  2020  2520
7    Japan  2021  3323
8   Greece  2014   546
9   Greece  2015  1023
10  Greece  2016  1023
11  Greece  2017  1023
12  Greece  2018  1854
13  Greece  2019  2518
14  Greece  2020  2518
15  Greece  2021  2635
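
The same filter-then-sum idea can also be written without an explicit loop. The sketch below is one possible variant, not part of the answer above: it pairs every country with every target year, joins the original rows on country, keeps the rows with year <= target year, and sums them. The names grid, target_year, merged and result are illustrative. Note that a country/year pair with no earlier rows at all would be dropped from the result, which does not happen with the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': [1970, 1982, 1999, 2014, 2017, 2018, 2021,
             1981, 1987, 2002, 2015, 2018, 2019, 2021],
    'value': [320, 416, 172, 652, 390, 570, 803,
              144, 273, 129, 477, 831, 664, 117],
})

# One row per (country, target_year) combination, 2014 through 2021
grid = pd.MultiIndex.from_product(
    [df['country'].unique(), range(2014, 2022)],
    names=['country', 'target_year']).to_frame(index=False)

# Attach every original row of the same country to each target year
merged = grid.merge(df, on='country')

# Keep only rows up to and including the target year, then sum per combination
merged = merged[merged['year'] <= merged['target_year']]
result = (merged.groupby(['country', 'target_year'], sort=False)['value']
                .sum()
                .reset_index()
                .rename(columns={'target_year': 'year'}))

print(result)
```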
Answered By: JayPeerachai