# Pandas – Alter week number based on the day of the week derived from the date column

## Question:

Suppose I have the following dataframe.

Date | Week_Num | WeekDay |
---|---|---|

01/01/23 | 1 | Sunday |

02/01/23 | 1 | Monday |

04/01/23 | 1 | Wednesday |

05/01/23 | 1 | Thursday |

07/01/23 | 1 | Saturday |

I understand that the third row is in the first week, however I want to use Wednesday as a cut off point as by this point half the week has passed so the following days should move to the next week, such as below.

Date | Week_Num | WeekDay |
---|---|---|

01/01/23 | 1 | Sunday |

02/01/23 | 1 | Monday |

04/01/23 | 1 | Wednesday |

05/01/23 | 2 | Thursday |

07/01/23 | 2 | Saturday |

My attempts so far have been sporadic in their success, this is also somewhat of a edge case but one set of data seems to be prone to these sort of oddities so I wanted a solution.

The idea I have in my head is to use something like:

```
if Weekday-Number > 3 then Week_Num + 1
else do nothing
```

I understand how to do each part separately, but bringing them together is where I get stuck.

Any help would be greatly appreciated.

## Answers:

You can use pandas Timedelta objects.

Using just your `Date`

column, you can convert it to a pandas datetime object (and in fact use it to create your `WeekDay`

column).

```
>>> import pandas as pd
>>> df = pd.DataFrame(
data = {"Date":["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23" ]}
)
>>> df.Date = pd.to_datetime( df.Date, dayfirst=True )
>>> df
Date
0 2023-01-02
1 2023-01-03
2 2023-01-04
3 2023-01-05
```

Create the `WeekDay`

column:

```
>>> dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
5: "Saturday", 6: "Sunday" }
>>> df["WeekDay"] = df.Date.dt.dayofweek.map( dayOfWeekMap )
>>> df
Date WeekDay
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-04 Wednesday
3 2023-01-05 Thursday
4 2023-01-07 Saturday
```

Finally, get the week number with your custom cutoff. First, define the day you want to start. For your test data, I would start at Wednesday 28/12/2023 so that 3/1/2023 is has `Week_Num = 1`

, but 4/1/2023 has `Week_Num = 2`

.

```
>>> start_date = pd.to_datetime( "28/12/2022", dayfirst=True )
>>> df["Week_Num"] = ( ( df.Date - start_date ).dt.days // 7 ).astype( int ) + 1
>>> df
Date WeekDay Week_Num
0 2023-01-01 Sunday 1
1 2023-01-02 Monday 1
2 2023-01-04 Wednesday 2
3 2023-01-05 Thursday 2
4 2023-01-07 Saturday 2
```

What’s happening here: We take the difference in the number of days from the date in the observation from the start date, floor division by 7 (so 6 days after start date is 0, 7 days from start date is 1, 8 days from start date is 1), and then add 1 so that our counter starts at 1.

There is no simple, non-iterative solution using the current data frame. If the WeekDay column was expressed as a number rather than text, a simple df.loc[] statement would give you the desired result.

```
df.loc[df["WeekDay"] > 4, "Week_Num"] += 1
```

This is saying to locate the rows where the Weekday > 4 and increment the Week_Num value by 1.

Using the data frame you have posted, a slower, iterative solution (not recommended) can be used along with the weekday() function from the datetime library.

```
#loop through each data frame row
for i, row in df.iterrows():
#if the date is past wednesday, increment week_num
if(row["dates"].weekday() > 4):
df.at[i, "week_num"] += 1
```

The code iterates through each row item and increments the Week_Num based on an if statement. Using this method also means the WeekDay column is obsolete since datetime allows you to get the weekday number from the date.

"Shortly" (one-liner), you can use `cumsum()`

:

```
df['Week_Num'] = df['WeekDay'].eq('Wednesday').cumsum().add(1).shift(1).fillna(1).astype(int)
```

Example input:

```
df = pd.DataFrame.from_dict({
'Date': ['01/01/23', '02/01/23', '04/01/23', '05/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23'],
'WeekDay': ['Sunday', 'Monday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday']
})
```

Outputs:

```
Date WeekDay Week_Num
0 01/01/23 Sunday 1
1 02/01/23 Monday 1
2 04/01/23 Wednesday 1
3 05/01/23 Thursday 2
4 07/01/23 Saturday 2
5 07/01/23 Wednesday 2
6 07/01/23 Thursday 3
7 07/01/23 Saturday 3
8 07/01/23 Wednesday 3
9 07/01/23 Thursday 4
```

### Explanations:

- We introduce a new column
`Week_Num`

to`df`

- We use
`cumsum()`

which accumulates a value when we meet a specific requirement notated by`.eq`

- The
`.eq`

part is if we meet a WeekDay ‘Wednesday’ - We add one to each row because
`cumsum`

starts from 0 and we want to start from 1. - We shift downwards each row in the dataframe by 1 to change only the rows not including ‘Wednesday’
- Because we shifted each row downwards then the first row will have
`NaN`

– so we fill it with the week_num of`1`

- We convert the row to have integer values with the
`astype(int)`

I have been working on this and have managed to create a method that allows for multiple years worth of data without it creating errors.

```
#Convert week numbers correctly
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.strftime('%Y')
df_2018 = df[df['Year'].str.contains('2018')==True]
df_2019 = df[df['Year'].str.contains('2019')==True]
##Date formatting - 2019
df_2019 .Date = pd.to_datetime( df_2019 .Date, dayfirst=True )
dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
5: "Saturday", 6: "Sunday" }
df_2019 ["WeekDay"] = df_2019 .Date.dt.dayofweek.map( dayOfWeekMap )
start_date = pd.to_datetime( "29/12/2018", dayfirst=True )
df_2019 ["Week_Number"] = ( ( df_2019 .Date - start_date ).dt.days // 7 ).astype( int ) + 1
##Date formatting - 2018
df_2018 .Date = pd.to_datetime( df_2018 .Date, dayfirst=True )
dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
5: "Saturday", 6: "Sunday" }
df_2018 ["WeekDay"] = df_2018 .Date.dt.dayofweek.map( dayOfWeekMap )
start_date = pd.to_datetime( "01/01/2018", dayfirst=True )
df_2018 ["Week_Number"] = ( ( df_2018 .Date - start_date ).dt.days // 7 ).astype( int ) + 1
```

This has now created 2 separate df for each year with their start dates being shifted slightly to keep them aligned. After this I will concat them together and then take care of any week 53 numbers that have appeared.

```
#Concat and week 53 fix
frames = [df_2018,df_2019]
df_combo = pd.concat(frames, ignore_index=True)
df_combo ['Year'] = df_combo ['Year'].astype('int')
df_combo .loc[df_combo ["Week_Number"] == 53, "Year"] += 1
df_combo .loc[df_combo ["Week_Number"] == 53, "Week_Number"] -= 52
```

This may still have to efficiency improvements to be made but for the size of the dataset will suffice.