Pandas – Alter week number based on the day of the week derived from the date column

Question:

Suppose I have the following dataframe.

Date Week_Num WeekDay
01/01/23 1 Sunday
02/01/23 1 Monday
04/01/23 1 Wednesday
05/01/23 1 Thursday
07/01/23 1 Saturday

I understand that the third row is in the first week, however I want to use Wednesday as a cut off point as by this point half the week has passed so the following days should move to the next week, such as below.

Date Week_Num WeekDay
01/01/23 1 Sunday
02/01/23 1 Monday
04/01/23 1 Wednesday
05/01/23 2 Thursday
07/01/23 2 Saturday

My attempts so far have been sporadic in their success, this is also somewhat of a edge case but one set of data seems to be prone to these sort of oddities so I wanted a solution.

The idea I have in my head is to use something like:

if Weekday-Number > 3 then Week_Num + 1
else do nothing

I understand how to do each part separately, but bringing them together is where I get stuck.
Any help would be greatly appreciated.

Asked By: Ryan

||

Answers:

You can use pandas Timedelta objects.

Using just your Date column, you can convert it to a pandas datetime object (and in fact use it to create your WeekDay column).

>>> import pandas as pd
>>> df = pd.DataFrame( 
    data = {"Date":["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23" ]}
)
>>> df.Date = pd.to_datetime( df.Date, dayfirst=True )
>>> df

    Date
0   2023-01-02
1   2023-01-03
2   2023-01-04
3   2023-01-05

Create the WeekDay column:

>>> dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
                5: "Saturday", 6: "Sunday" }
>>> df["WeekDay"] = df.Date.dt.dayofweek.map( dayOfWeekMap )
>>> df

    Date    WeekDay
0   2023-01-01  Sunday
1   2023-01-02  Monday
2   2023-01-04  Wednesday
3   2023-01-05  Thursday
4   2023-01-07  Saturday

Finally, get the week number with your custom cutoff. First, define the day you want to start. For your test data, I would start at Wednesday 28/12/2023 so that 3/1/2023 is has Week_Num = 1, but 4/1/2023 has Week_Num = 2.

>>> start_date = pd.to_datetime( "28/12/2022", dayfirst=True )
>>> df["Week_Num"] = ( ( df.Date - start_date ).dt.days // 7 ).astype( int ) + 1
>>> df

    Date        WeekDay Week_Num
0   2023-01-01  Sunday      1
1   2023-01-02  Monday      1
2   2023-01-04  Wednesday   2
3   2023-01-05  Thursday    2
4   2023-01-07  Saturday    2

What’s happening here: We take the difference in the number of days from the date in the observation from the start date, floor division by 7 (so 6 days after start date is 0, 7 days from start date is 1, 8 days from start date is 1), and then add 1 so that our counter starts at 1.

Answered By: Leo

There is no simple, non-iterative solution using the current data frame. If the WeekDay column was expressed as a number rather than text, a simple df.loc[] statement would give you the desired result.

df.loc[df["WeekDay"] > 4, "Week_Num"] += 1

This is saying to locate the rows where the Weekday > 4 and increment the Week_Num value by 1.

Using the data frame you have posted, a slower, iterative solution (not recommended) can be used along with the weekday() function from the datetime library.

#loop through each data frame row
for i, row in df.iterrows():
    #if the date is past wednesday, increment week_num
    if(row["dates"].weekday() > 4):
        df.at[i, "week_num"] += 1                                                    

The code iterates through each row item and increments the Week_Num based on an if statement. Using this method also means the WeekDay column is obsolete since datetime allows you to get the weekday number from the date.

Answered By: cavalierDev

"Shortly" (one-liner), you can use cumsum():

df['Week_Num'] = df['WeekDay'].eq('Wednesday').cumsum().add(1).shift(1).fillna(1).astype(int)

Example input:

df = pd.DataFrame.from_dict({
    'Date': ['01/01/23', '02/01/23', '04/01/23', '05/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23'],
    'WeekDay': ['Sunday', 'Monday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday']
})

Outputs:

       Date    WeekDay  Week_Num
0  01/01/23     Sunday         1
1  02/01/23     Monday         1
2  04/01/23  Wednesday         1
3  05/01/23   Thursday         2
4  07/01/23   Saturday         2
5  07/01/23  Wednesday         2
6  07/01/23   Thursday         3
7  07/01/23   Saturday         3
8  07/01/23  Wednesday         3
9  07/01/23   Thursday         4


Explanations:

  1. We introduce a new column Week_Num to df
  2. We use cumsum() which accumulates a value when we meet a specific requirement notated by .eq
  3. The .eq part is if we meet a WeekDay ‘Wednesday’
  4. We add one to each row because cumsum starts from 0 and we want to start from 1.
  5. We shift downwards each row in the dataframe by 1 to change only the rows not including ‘Wednesday’
  6. Because we shifted each row downwards then the first row will have NaN – so we fill it with the week_num of 1
  7. We convert the row to have integer values with the astype(int)
Answered By: no_hex

I have been working on this and have managed to create a method that allows for multiple years worth of data without it creating errors.

#Convert week numbers correctly
        df['Date'] = pd.to_datetime(df['Date'])   
        df['Year'] = df['Date'].dt.strftime('%Y')
        df_2018 = df[df['Year'].str.contains('2018')==True]
        df_2019 = df[df['Year'].str.contains('2019')==True]

##Date formatting - 2019
        df_2019 .Date = pd.to_datetime( df_2019 .Date, dayfirst=True )
        dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
                5: "Saturday", 6: "Sunday" }
        df_2019 ["WeekDay"] = df_2019 .Date.dt.dayofweek.map( dayOfWeekMap )
        start_date = pd.to_datetime( "29/12/2018", dayfirst=True ) 
        df_2019 ["Week_Number"] = ( ( df_2019 .Date - start_date ).dt.days // 7 ).astype( int ) + 1

##Date formatting - 2018
        df_2018 .Date = pd.to_datetime( df_2018 .Date, dayfirst=True )
        dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
                5: "Saturday", 6: "Sunday" }
        df_2018 ["WeekDay"] = df_2018 .Date.dt.dayofweek.map( dayOfWeekMap )
        start_date = pd.to_datetime( "01/01/2018", dayfirst=True ) 
        df_2018 ["Week_Number"] = ( ( df_2018 .Date - start_date ).dt.days // 7 ).astype( int ) + 1

This has now created 2 separate df for each year with their start dates being shifted slightly to keep them aligned. After this I will concat them together and then take care of any week 53 numbers that have appeared.

#Concat and week 53 fix
        frames = [df_2018,df_2019]
        df_combo = pd.concat(frames, ignore_index=True)
        
        df_combo ['Year'] = df_combo ['Year'].astype('int')
        df_combo .loc[df_combo ["Week_Number"] == 53, "Year"] += 1 
        df_combo .loc[df_combo ["Week_Number"] == 53, "Week_Number"] -= 52 

This may still have to efficiency improvements to be made but for the size of the dataset will suffice.

Answered By: Ryan
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.