Aggregate daily data by month and an additional column
Question:
I have a DataFrame of daily data that looks like this:
Date Product Number Description Revenue
2010-01-04 4219-057 Product A 39.299999
2010-01-04 4219-056 Product A 39.520000
2010-01-04 4219-100 Product B 39.520000
2010-01-04 4219-056 Product A 39.520000
2010-01-05 4219-059 Product A 39.520000
2010-01-05 4219-056 Product A 39.520000
2010-01-05 4219-056 Product B 39.520000
2010-02-08 4219-123 Product A 39.520000
2010-02-08 4219-345 Product A 39.520000
2010-02-08 4219-456 Product B 39.520000
2010-02-08 4219-567 Product C 39.520000
2010-02-08 4219-789 Product D 39.520000
(Product number is just to give an idea)
What I want is to aggregate it into monthly data, something like:
Date Description Revenue
2010-01-01 Product A 157.85000 (Sum of all Product A in Month 01)
Product B 79.040000
Product C 00.000000
Product D 00.000000
2010-02-01 Product A 39.299999 (Sum of all Product A in Month 02)
Product B 39.520000
Product C 39.520000
Product D 39.520000
The problem is that I have 500+ products for every month.
I am new to Python and don’t know how to implement this. Currently, I am using
import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline
data.groupby(['DATE','REVENUE']).sum().unstack()
but this does not group by product.
How can I implement this?
Answers:
Convert “Date” to datetime, then use groupby and sum:
# Do this first, if necessary.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
(df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
.sum()
.reset_index())
Date Description Revenue
0 2010-01-01 A 197.379999
1 2010-01-01 B 79.040000
2 2010-02-01 A 79.040000
3 2010-02-01 B 39.520000
4 2010-02-01 C 39.520000
5 2010-02-01 D 39.520000
The frequency “MS” (month start) tells pd.Grouper to bin the dates by calendar month and label each group with the first day of that month.
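The question's expected output also lists products with no sales in a month (Product C and D in January) as zero, which a plain groupby leaves out. Here is a minimal sketch of one way to get those rows, assuming the sample data from the question (product numbers omitted) and using an unstack/stack round trip; the names monthly and monthly_filled are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the "Product Number" column is omitted.
df = pd.DataFrame({
    "Date": ["2010-01-04"] * 4 + ["2010-01-05"] * 3 + ["2010-02-08"] * 5,
    "Description": [
        "Product A", "Product A", "Product B", "Product A",
        "Product A", "Product A", "Product B",
        "Product A", "Product A", "Product B", "Product C", "Product D",
    ],
    "Revenue": [39.299999] + [39.52] * 11,
})
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Monthly totals per product; only (month, product) pairs that occur are present.
monthly = (df.groupby([pd.Grouper(key="Date", freq="MS"), "Description"])["Revenue"]
             .sum())

# Pivot products into columns, filling missing pairs with 0, then pivot back
# so every month has a row for every product seen anywhere in the data.
monthly_filled = (monthly.unstack(fill_value=0)
                         .stack()
                         .rename("Revenue")
                         .reset_index())

print(monthly_filled)
```

With 500+ products this produces one row per month per product, so the result can be much longer than the plain groupby output.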
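Use the following code (note that this groups by each exact date, so “Date” must already hold month-level values to get monthly totals):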
data.groupby(['Date','Description'])['Revenue'].sum()
This is a bit of a workaround, but you can create a ‘Month_Year’ column from the dates (“Date” must already be a datetime column):
df['Month_Year'] = df['Date'].dt.to_period('M')
You can then group by that column and aggregate as needed:
df_agg = df.groupby(["Month_Year", "Description"])['Revenue'].sum().reset_index()
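One thing to be aware of with this approach is that ‘Month_Year’ holds Period values (shown as 2010-01), not timestamps. If you need a month-start date like in the question's expected output, a rough sketch, assuming a small subset of the question's Product A rows, could look like this:

```python
import pandas as pd

# Three Product A rows taken from the question's sample data.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-08"]),
    "Description": ["Product A", "Product A", "Product A"],
    "Revenue": [39.299999, 39.52, 39.52],
})

# Monthly period for each row, e.g. Period('2010-01', 'M').
df["Month_Year"] = df["Date"].dt.to_period("M")
df_agg = df.groupby(["Month_Year", "Description"])["Revenue"].sum().reset_index()

# Convert each period back to a timestamp at the start of the month.
df_agg["Date"] = df_agg["Month_Year"].dt.to_timestamp()

print(df_agg)
```

The extra ‘Date’ column is optional; keeping only the Period column is fine if you just need to label or sort by month.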