Grouping monthly data in a dataframe to show row sub-totals

Question:

I have the following table where Date is the index column:

Date ColA ColB
2021-01-01 10 40
2021-01-02 20 15
2022-01-01 10 40
2022-01-02 20 15

I want to add row-subtotals to show yearly summation:

Date ColA ColB
2021-01-01 10 40
2021-01-02 20 15
2021 Subtotal 30 55
2022-01-02 20 15
2021-01-02 20 15
2022 Subtotal 40 30

I tried the following code:

df.groupby(level='Date').transform("sum")

However I am getting the following error:
DateFormatError: invalid date ‘2022 Subtotal’

Any alternatives to achieve similar results?

Asked By: Saanchi

||

Answers:

You can use:

# grouper for years
# convert to datetime and extract year
# convert to string and add ' Subtotal'
group = (pd.to_datetime(df['Date']).dt.year
         .astype(str)
         .add(' Subtotal')
         )

# perform groupby.sum using the above grouper
# concatenation with original data
# sorting by year to move the subtotals below their respective data
out = (pd.concat(
       [df,
        df.groupby(group)
          .sum().reset_index()])
         .sort_values(by='Date',
                      key=lambda x: x.str.extract('(d+)', expand=False),
                      kind='stable'
                      )
)

Output:

            Date  ColA  ColB
0     2021-01-01    10    40
1     2021-01-02    20    15
0  2021 Subtotal    30    55
2     2022-01-01    10    40
3     2022-01-02    20    15
1  2022 Subtotal    30    55
Answered By: mozway
Categories: questions Tags:
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.