Aggregate daily data by month and an additional column

Question:

I have a DataFrame of daily data, shown below:

   Date        Product Number  Description        Revenue
2010-01-04       4219-057       Product A        39.299999    
2010-01-04       4219-056       Product A        39.520000
2010-01-04       4219-100       Product B        39.520000
2010-01-04       4219-056       Product A        39.520000
2010-01-05       4219-059       Product A        39.520000
2010-01-05       4219-056       Product A        39.520000
2010-01-05       4219-056       Product B        39.520000
2010-02-08       4219-123       Product A        39.520000
2010-02-08       4219-345       Product A        39.520000
2010-02-08       4219-456       Product B        39.520000
2010-02-08       4219-567       Product C        39.520000
2010-02-08       4219-789       Product D        39.520000

(The product numbers are only illustrative.)
I want to aggregate this into monthly data, something like:

Date        Description        Revenue
2010-01-01    Product A        197.379999 (sum of all Product A rows in month 01)
              Product B         79.040000
              Product C          0.000000
              Product D          0.000000
2010-02-01    Product A         79.040000 (sum of all Product A rows in month 02)
              Product B         39.520000
              Product C         39.520000
              Product D         39.520000

The problem is that I have 500+ products every month.

I am new to Python and don’t know how to implement this. Currently I am using:

import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline

# My attempt: groups by date and revenue, not by month and product
data.groupby(['Date','Revenue']).sum().unstack()

but it doesn’t group by product.

How can I implement this?

Asked By: user11505060


Answers:

Convert “Date” to datetime, then use groupby and sum:

# Do this first, if necessary.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

(df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
   .sum()
   .reset_index())

        Date Description     Revenue
0 2010-01-01   Product A  197.379999
1 2010-01-01   Product B   79.040000
2 2010-02-01   Product A   79.040000
3 2010-02-01   Product B   39.520000
4 2010-02-01   Product C   39.520000
5 2010-02-01   Product D   39.520000

The frequency “MS” (month start) groups the dates by calendar month and labels each group with the first day of that month.
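A self-contained sketch of this approach on the question's sample rows (Product Number is dropped because it is not aggregated). The second result, `filled`, is an addition that is not part of the original answer: it reindexes the sums onto every month/product combination with `fill_value=0`, so products with no sales in a month appear with 0, as in the question's expected output. Whether those zero rows are actually wanted with 500+ products is an assumption.

```python
import pandas as pd

# Sample rows copied from the question (Product Number omitted).
df = pd.DataFrame({
    "Date": ["2010-01-04", "2010-01-04", "2010-01-04", "2010-01-04",
             "2010-01-05", "2010-01-05", "2010-01-05",
             "2010-02-08", "2010-02-08", "2010-02-08", "2010-02-08", "2010-02-08"],
    "Description": ["Product A", "Product A", "Product B", "Product A",
                    "Product A", "Product A", "Product B",
                    "Product A", "Product A", "Product B", "Product C", "Product D"],
    "Revenue": [39.299999, 39.52, 39.52, 39.52,
                39.52, 39.52, 39.52,
                39.52, 39.52, 39.52, 39.52, 39.52],
})
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Sum revenue per (month start, product); only combinations with sales appear.
sums = df.groupby([pd.Grouper(key="Date", freq="MS"), "Description"])["Revenue"].sum()
monthly = sums.reset_index()

# Variant: build every (month, product) pair and fill the missing ones with 0.
months = sums.index.get_level_values("Date").unique()
products = sums.index.get_level_values("Description").unique()
full_index = pd.MultiIndex.from_product([months, products], names=["Date", "Description"])
filled = sums.reindex(full_index, fill_value=0).reset_index()

print(monthly)
print(filled)
```

`monthly` reproduces the six rows shown above; `filled` has eight rows (two months times four products), with Product C and Product D at 0 in January.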

Answered By: cs95

Use the following code:

data.groupby(['Date','Description'])['Revenue'].sum()
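As written, this groups on the exact `Date` value, so it sums revenue per day rather than per month. A minimal sketch, assuming `Date` is already a datetime column, that first normalizes each date to the start of its month; the helper column name `Month` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-08"]),
    "Description": ["Product A", "Product A", "Product A"],
    "Revenue": [39.299999, 39.52, 39.52],
})

# Grouping on the raw dates yields one group per day.
daily = df.groupby(["Date", "Description"])["Revenue"].sum()

# Map each date to the first day of its month, then group on that.
df["Month"] = df["Date"].dt.to_period("M").dt.to_timestamp()
monthly = df.groupby(["Month", "Description"])["Revenue"].sum()

print(daily)
print(monthly)
```

Here `daily` has three rows, while `monthly` collapses the two January days into a single row.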

Answered By: oreopot

This is a bit of a workaround, but you can create a ‘Month_Year’ column from the datetime ‘Date’ column:

df['Month_Year'] = df['Date'].dt.to_period('M')

You can then group by that column and aggregate as needed:

df_agg = df.groupby(["Month_Year", "Description"])['Revenue'].sum().reset_index()
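A runnable sketch of this approach on a few of the question's rows. The `.dt` accessor only works on datetime values, so the string dates are parsed first. Note that `Month_Year` holds pandas `Period` values (for example `2010-01`) rather than timestamps; `.dt.to_timestamp()` can convert them back if a date is needed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2010-01-04", "2010-01-05", "2010-01-05", "2010-02-08"],
    "Description": ["Product A", "Product A", "Product B", "Product B"],
    "Revenue": [39.299999, 39.52, 39.52, 39.52],
})

# Parse strings to datetime64 so the .dt accessor is available.
df["Date"] = pd.to_datetime(df["Date"])

# Monthly Period label, e.g. Period('2010-01', 'M').
df["Month_Year"] = df["Date"].dt.to_period("M")

# One row per (month, product) with the summed revenue.
df_agg = df.groupby(["Month_Year", "Description"])["Revenue"].sum().reset_index()
print(df_agg)
```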

Answered By: John Conor