group by week in pandas
Question:
I have this data frame:
Name Date Quantity
Apple 07/11/17 20
orange 07/14/17 20
Apple 07/14/17 70
Orange 07/25/17 40
Apple 07/20/17 30
I want to aggregate this by Name and Date to get the sum of quantities.
Details:
Date: group by week; the result should be labeled with the beginning of the week (that is, Monday)
Quantity: sum, if two or more records have the same Name and their Dates fall in the same weekly interval
The desired output is given below:
Name Date Quantity
Apple 07/10/17 90
orange 07/10/17 20
Apple 07/17/17 30
orange 07/24/17 40
Answers:
Let’s use groupby, resample with W-Mon, and sum:
df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')
Output:
Name Date Quantity
0 Apple 2017-07-17 90
3 orange 2017-07-17 20
1 Apple 2017-07-24 30
2 Orange 2017-07-31 40
First, convert the Date column with to_datetime and subtract one week: W-MON labels each bin with the Monday that ends the week, and we want the Monday that starts it.
Then use groupby with Grouper by W-MON and aggregate with sum:
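import pandas as pd

# Shift every date back one week so the W-MON label is the Monday that starts the week
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='D')
# Parentheses let the method chain span several lines
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)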
Name Date Quantity
0 Apple 2017-07-10 90
3 orange 2017-07-10 20
1 Apple 2017-07-17 30
2 Orange 2017-07-24 40
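As an alternative to shifting the dates by one week, a Grouper can be asked to close and label each bin on its left edge, so that every week is labeled by the Monday that starts it. The following is a minimal sketch, assuming the sample data from the question; the variable name weekly is only illustrative. Unlike the one-week shift, a date that is itself a Monday stays in the week it starts.

```python
import pandas as pd

# Sample data copied from the question (names keep their original casing)
df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
    "Quantity": [20, 20, 70, 40, 30],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# closed="left", label="left": bins are [Monday, next Monday), labeled by the first Monday
week = pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")

weekly = (
    df.groupby(["Name", week])["Quantity"]
    .sum()
    .reset_index()
    .sort_values("Date")
)
print(weekly)
```

Because "orange" and "Orange" are different strings, they stay in separate groups here; normalizing the case first (for example with df["Name"].str.capitalize()) would merge them.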
First, convert the date column with to_datetime. This groups by week, with weeks starting on Monday, and outputs the week number (but you can change that looking up in
df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()
Output:
name date
apple 28 90
29 30
orange 28 20
30 40
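A bare week number repeats every year, so if the data spans several years the same idea can be extended by putting the year into the format string. This is a small sketch under the assumption that the columns are named name, date and quantity, as in the snippet above; week_key and per_week are illustrative names.

```python
import pandas as pd

# Sample data from the question, with lower-case column names and values (an assumption)
df = pd.DataFrame({
    "name": ["apple", "orange", "apple", "orange", "apple"],
    "date": pd.to_datetime(
        ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
        format="%m/%d/%y",
    ),
    "quantity": [20, 20, 70, 40, 30],
})

# %Y is the calendar year, %W the week of the year with Monday as the first day
week_key = df["date"].dt.strftime("%Y-%W").rename("year_week")

per_week = df.groupby(["name", week_key])["quantity"].sum()
print(per_week)
```

Note that %W counts the days before the first Monday of a year as week 00, so a week that straddles New Year is split into two keys.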
This maps every row to the previous Monday (a date that is already a Monday is left unchanged), which has the effect of grouping by week:
import pandas as pd, datetime as dt
# df = ...
df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)
perweek = df['WeekDate'].groupby(df['WeekDate']).count()
Example:
Date WeekDate
2020-06-20 2020-06-15 <- monday
2020-06-21 2020-06-15
2020-06-24 2020-06-22 <- monday
2020-06-25 2020-06-22
2020-06-26 2020-06-22
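The same "previous Monday" computation can also be written without a row-wise apply, which tends to be faster on large frames. A minimal sketch, assuming Date is already a datetime column; the sample dates are taken from the example above plus one Monday:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-20", "2020-06-21", "2020-06-22", "2020-06-24"]),
})

# weekday is 0 for Monday ... 6 for Sunday, so subtracting that many days lands on Monday
df["WeekDate"] = df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")

# Number of rows per week
perweek = df.groupby("WeekDate").size()
print(df)
print(perweek)
```

If the dates carry a time of day, the result keeps it; calling .dt.normalize() on WeekDate would reset it to midnight.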
You have already received a lot of good answers and the question is quite old, but some of the solutions use deprecated functions, and I encountered the same problem and found a different solution, so I think sharing it could be helpful to someone.
Given the dataframe you proposed:
Name Date Quantity
Apple 07/11/17 20
orange 07/14/17 20
Apple 07/14/17 70
Orange 07/25/17 40
Apple 07/20/17 30
We have to convert the values in ‘Date’ to pandas datetimes, since they are strings right now.
Then we can use the Series’ dt accessor, which lets us handle datetime-like series and extract information from them.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
Having a datetime column lets us use the dt accessor to extract the week number associated with each date.
So as not to lose any information, I prefer to add a new column with the week number.
Once we have the week number, we can group by it.
df['WeekNumber'] = df['Date'].dt.isocalendar().week
# Select Quantity explicitly so the datetime column is not summed as well
df.groupby(['Name', 'WeekNumber'])['Quantity'].sum()
Name WeekNumber
Apple 28 90
29 30
Orange 28 20
30 40
Small problem: what if the data spans different years?
Our data could cover a range of years. In that situation we cannot consider only the week number (otherwise we would mix data from one year into another), so it is useful to also extract the year column from isocalendar().
df['year'] = df['Date'].dt.isocalendar().year
# Select Quantity explicitly so the datetime column is not summed as well
df.groupby(['Name', 'WeekNumber', 'year'])[['Quantity']].sum()
Name WeekNumber year Quantity
Apple 28 2017 90
29 2017 30
Orange 28 2017 20
30 2017 40
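The year has to come from isocalendar() rather than from dt.year, because an ISO week can belong to a different year than the calendar date does. A short sketch with a few hand-picked dates (chosen here only for illustration) shows the difference:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-12-31", "2019-01-02", "2017-01-01"]))

# isocalendar() returns a DataFrame with the columns year, week and day
iso = dates.dt.isocalendar()

comparison = pd.DataFrame({
    "date": dates,
    "calendar_year": dates.dt.year,
    "iso_year": iso["year"],
    "iso_week": iso["week"],
})
print(comparison)
```

Monday 2018-12-31 already belongs to ISO week 1 of 2019, and Sunday 2017-01-01 still belongs to ISO week 52 of 2016, so pairing the ISO week with the calendar year would put these rows in the wrong group.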
You can use the to_period method to get the date truncated to the first day of the week (or month if you use the period M):
df["Week"] = df["Date"].dt.to_period("W").dt.to_timestamp()
# Select Quantity explicitly so the original Date column is not summed as well
df.groupby(["Name", "Week"])["Quantity"].sum()
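For completeness, here is a self-contained sketch of the period-based approach on the sample data from the question. It assumes that "W" means weeks ending on Sunday (spelled out below as "W-SUN"), so each period starts on a Monday; weekly is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "orange", "Apple", "Orange", "Apple"],
    "Date": pd.to_datetime(
        ["07/11/17", "07/14/17", "07/14/17", "07/25/17", "07/20/17"],
        format="%m/%d/%y",
    ),
    "Quantity": [20, 20, 70, 40, 30],
})

# Weekly periods ending on Sunday; start_time is the Monday that opens each period
df["Week"] = df["Date"].dt.to_period("W-SUN").dt.start_time

weekly = df.groupby(["Name", "Week"], as_index=False)["Quantity"].sum()
print(weekly.sort_values("Week"))
```

Keeping the week as a timestamp column, rather than a week number, avoids the multi-year ambiguity and gives the Monday label the question asks for.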