Pandas – Alter week number based on the day of the week derived from the date column
Question:
Suppose I have the following dataframe.
Date | Week_Num | WeekDay |
---|---|---|
01/01/23 | 1 | Sunday |
02/01/23 | 1 | Monday |
04/01/23 | 1 | Wednesday |
05/01/23 | 1 | Thursday |
07/01/23 | 1 | Saturday |
I understand that the third row is in the first week. However, I want to use Wednesday as a cut-off point, since by then half the week has passed, so the days after it should move to the next week, as shown below.
Date | Week_Num | WeekDay |
---|---|---|
01/01/23 | 1 | Sunday |
02/01/23 | 1 | Monday |
04/01/23 | 1 | Wednesday |
05/01/23 | 2 | Thursday |
07/01/23 | 2 | Saturday |
My attempts so far have been sporadic in their success. This is also somewhat of an edge case, but one set of data seems to be prone to these sorts of oddities, so I wanted a solution.
The idea I have in my head is to use something like:
if Weekday-Number > 3 then Week_Num + 1
else do nothing
I understand how to do each part separately, but bringing them together is where I get stuck.
Any help would be greatly appreciated.
Answers:
You can use pandas Timedelta objects.
Using just your Date column, you can convert it to a pandas datetime object (and in fact use it to create your WeekDay column).
>>> import pandas as pd
>>> df = pd.DataFrame(
...     data={"Date": ["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23"]}
... )
>>> df.Date = pd.to_datetime(df.Date, dayfirst=True)
>>> df
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-04
3 2023-01-05
4 2023-01-07
Create the WeekDay column:
>>> dayOfWeekMap = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
...                 5: "Saturday", 6: "Sunday"}
>>> df["WeekDay"] = df.Date.dt.dayofweek.map( dayOfWeekMap )
>>> df
Date WeekDay
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-04 Wednesday
3 2023-01-05 Thursday
4 2023-01-07 Saturday
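As an aside, pandas can also produce the day names directly, so the mapping dictionary is optional. A minimal sketch, assuming the same dd/mm/yy dates as above; the variable names are only illustrative:

```python
import pandas as pd

# Same sample dates as in the question, parsed explicitly as day/month/year
dates = pd.to_datetime(
    pd.Series(["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23"]),
    format="%d/%m/%y",
)

# day_name() returns the English weekday name for each date by default
weekday_names = dates.dt.day_name()
print(weekday_names.tolist())
```

Either approach gives the same WeekDay column; the dictionary is mainly useful if you want custom labels.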
Finally, get the week number with your custom cutoff. First, define the day you want to start from. For your test data, I would start at Wednesday 28/12/2022 so that 3/1/2023 has Week_Num = 1, but 4/1/2023 has Week_Num = 2.
>>> start_date = pd.to_datetime( "28/12/2022", dayfirst=True )
>>> df["Week_Num"] = ( ( df.Date - start_date ).dt.days // 7 ).astype( int ) + 1
>>> df
Date WeekDay Week_Num
0 2023-01-01 Sunday 1
1 2023-01-02 Monday 1
2 2023-01-04 Wednesday 2
3 2023-01-05 Thursday 2
4 2023-01-07 Saturday 2
What's happening here: we take the number of days between each observation's date and the start date, floor-divide it by 7 (so 6 days after the start date gives 0, while 7 and 8 days after the start date both give 1), and then add 1 so that the counter starts at 1.
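If Wednesday itself should stay in week 1, as in the desired output in the question, one option is to anchor the weeks one day later. A minimal sketch, assuming Thursday 29/12/2022 as the start date (that date is my assumption, chosen to match the question's table):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23"]})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Assumed anchor: the Thursday before the first date, so each "week"
# runs Thursday to Wednesday and Wednesday closes the week
start_date = pd.Timestamp(year=2022, month=12, day=29)

# Whole days elapsed since the anchor, then floor-divide into 7-day blocks
days_since_start = (df["Date"] - start_date).dt.days
df["Week_Num"] = days_since_start // 7 + 1
print(df)
```

With this anchor, 04/01/23 is 6 days after the start (week 1) and 05/01/23 is 7 days after it (week 2).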
With WeekDay stored as text, there is no simple way to compare it against a cut-off in the current data frame. If the WeekDay column were expressed as a number rather than text, a single df.loc[] statement would give you the desired result.
df.loc[df["WeekDay"] > 4, "Week_Num"] += 1
This locates the rows where WeekDay > 4 and increments their Week_Num value by 1. The right threshold depends on how the days are numbered: 4 corresponds to Wednesday if Sunday is 1 and Saturday is 7.
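To make the numeric comparison concrete, here is a small sketch that first converts the text column to numbers. The numbering (Sunday = 1 through Saturday = 7) and the helper column WeekDay_Num are assumptions for illustration, not something given in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Week_Num": [1, 1, 1, 1, 1],
    "WeekDay": ["Sunday", "Monday", "Wednesday", "Thursday", "Saturday"],
})

# Assumed numbering: the week starts on Sunday, so Wednesday is day 4
day_to_number = {
    "Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
    "Thursday": 5, "Friday": 6, "Saturday": 7,
}
df["WeekDay_Num"] = df["WeekDay"].map(day_to_number)

# Rows after the Wednesday cut-off move to the next week
df.loc[df["WeekDay_Num"] > 4, "Week_Num"] += 1
print(df)
```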
Using the data frame you have posted, a slower, iterative solution (not recommended) can be used along with the weekday() method of datetime objects.
# loop through each data frame row
for i, row in df.iterrows():
    # weekday() numbers Monday as 0, so Thursday to Saturday are 3 to 5;
    # those are the days after the Wednesday cut-off in a Sunday-to-Saturday week
    if row["Date"].weekday() in (3, 4, 5):
        df.at[i, "Week_Num"] += 1
The code iterates over each row and increments Week_Num based on an if statement. It requires the Date column to hold datetime values rather than strings. Using this method also means the WeekDay column is redundant, since datetime lets you get the weekday number from the date.
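The same check can probably be done without a loop by reading the weekday number straight from the Date column. A sketch under the assumption that weeks run Sunday to Saturday, as in the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23"], format="%d/%m/%y"
    ),
    "Week_Num": [1, 1, 1, 1, 1],
})

# dt.dayofweek uses Monday=0 ... Sunday=6, so Thursday, Friday and
# Saturday are 3, 4 and 5; Sunday (6) already starts the week and is left alone
past_cutoff = df["Date"].dt.dayofweek.isin([3, 4, 5])
df.loc[past_cutoff, "Week_Num"] += 1
print(df)
```

This does the same work as the loop in one vectorised step, which matters mostly on larger data frames.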
In short (as a one-liner), you can use cumsum():
df['Week_Num'] = df['WeekDay'].eq('Wednesday').cumsum().add(1).shift(1).fillna(1).astype(int)
Example input:
df = pd.DataFrame.from_dict({
'Date': ['01/01/23', '02/01/23', '04/01/23', '05/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23'],
'WeekDay': ['Sunday', 'Monday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday']
})
Outputs:
Date WeekDay Week_Num
0 01/01/23 Sunday 1
1 02/01/23 Monday 1
2 04/01/23 Wednesday 1
3 05/01/23 Thursday 2
4 07/01/23 Saturday 2
5 07/01/23 Wednesday 2
6 07/01/23 Thursday 3
7 07/01/23 Saturday 3
8 07/01/23 Wednesday 3
9 07/01/23 Thursday 4
Explanations:
- We introduce a new column Week_Num to df.
- We use cumsum(), which increments a running count each time the condition given by .eq is met.
- The .eq part checks whether WeekDay is 'Wednesday'.
- We add one to each row because cumsum starts from 0 and we want to start from 1.
- We shift each row down by 1 so that the increment applies only to the rows after each 'Wednesday', not to the Wednesday row itself.
- Because we shifted each row down, the first row will be NaN, so we fill it with a Week_Num of 1.
- We convert the column to integer values with astype(int).
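The steps above can be followed one at a time by splitting the chain into intermediate variables. This is only a sketch on a shortened version of the example input, and the variable names are mine:

```python
import pandas as pd

weekdays = pd.Series(
    ["Sunday", "Monday", "Wednesday", "Thursday", "Saturday", "Wednesday", "Thursday"]
)

is_wednesday = weekdays.eq("Wednesday")   # True only on Wednesday rows
running_count = is_wednesday.cumsum()     # 0 0 1 1 1 2 2
plus_one = running_count.add(1)           # 1 1 2 2 2 3 3
shifted = plus_one.shift(1)               # NaN 1 1 2 2 2 3
week_num = shifted.fillna(1).astype(int)  # 1 1 1 2 2 2 3

print(week_num.tolist())
```

Note that this counts Wednesdays in row order, so it relies on the rows being sorted by date and on every week containing a Wednesday row.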
I have been working on this and have managed to create a method that handles multiple years' worth of data without errors.
# Convert week numbers correctly
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.strftime('%Y')
# .copy() gives independent frames, so the assignments below do not
# trigger a SettingWithCopyWarning
df_2018 = df[df['Year'].str.contains('2018') == True].copy()
df_2019 = df[df['Year'].str.contains('2019') == True].copy()
dayOfWeekMap = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
                5: "Saturday", 6: "Sunday"}
## Date formatting - 2019
df_2019.Date = pd.to_datetime(df_2019.Date, dayfirst=True)
df_2019["WeekDay"] = df_2019.Date.dt.dayofweek.map(dayOfWeekMap)
start_date = pd.to_datetime("29/12/2018", dayfirst=True)
df_2019["Week_Number"] = ((df_2019.Date - start_date).dt.days // 7).astype(int) + 1
## Date formatting - 2018
df_2018.Date = pd.to_datetime(df_2018.Date, dayfirst=True)
df_2018["WeekDay"] = df_2018.Date.dt.dayofweek.map(dayOfWeekMap)
start_date = pd.to_datetime("01/01/2018", dayfirst=True)
df_2018["Week_Number"] = ((df_2018.Date - start_date).dt.days // 7).astype(int) + 1
This creates two separate data frames, one per year, with their start dates shifted slightly to keep them aligned. After this I concatenate them and then take care of any week 53 numbers that have appeared.
# Concat and week 53 fix
frames = [df_2018, df_2019]
df_combo = pd.concat(frames, ignore_index=True)
df_combo['Year'] = df_combo['Year'].astype('int')
df_combo.loc[df_combo["Week_Number"] == 53, "Year"] += 1
df_combo.loc[df_combo["Week_Number"] == 53, "Week_Number"] -= 52
There may still be efficiency improvements to make, but for the size of my dataset this will suffice.
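One possible way to avoid splitting the data by hand is to look up each row's start date from its year. The following is only a sketch: the function name add_week_numbers and the start_dates dictionary are hypothetical, and the two anchor dates are simply copied from the code above.

```python
import pandas as pd

# Hypothetical per-year anchors, taken from the start dates used above
start_dates = {
    2018: pd.Timestamp(year=2018, month=1, day=1),
    2019: pd.Timestamp(year=2018, month=12, day=29),
}

def add_week_numbers(df, start_dates):
    """Return a copy of df with Year and Week_Number columns added."""
    out = df.copy()
    out["Year"] = out["Date"].dt.year
    # Look up the anchor date for each row's year
    anchors = pd.to_datetime(out["Year"].map(start_dates))
    out["Week_Number"] = (out["Date"] - anchors).dt.days // 7 + 1
    # Week 53 rolls over into week 1 of the following year
    overflow = out["Week_Number"] == 53
    out.loc[overflow, "Year"] += 1
    out.loc[overflow, "Week_Number"] -= 52
    return out

sample = pd.DataFrame(
    {"Date": pd.to_datetime(["2018-01-01", "2018-01-08", "2018-12-31", "2019-01-05"])}
)
result = add_week_numbers(sample, start_dates)
print(result)
```

Adding another year would then only need one more entry in the dictionary rather than another copy of the block.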