Creating a column that takes a Week Number and Year and returns a Date
Question:
I’m currently working with a dataframe where I created a Year and Week # column. I’m trying to create a new column Date that gives me the date for each week from the Year and Week # columns.
This is what my dataframe looks like now
Year | Week # |
---|---|
2023 | 10 |
2023 | 11 |
2023 | 12 |
It should look like this
Year | Week # | Date |
---|---|---|
2023 | 10 | 3/6/23 |
2023 | 11 | 3/13/23 |
2023 | 12 | 3/20/23 |
I tried the following
from datetime import datetime
df['Date'] = datetime.strptime('{}-{}-1'.format(df['Year'], df['Week #']), '%Y-%W-%w').strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'
However, I got this error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[57], line 3
1 from datetime import datetime
----> 3 df['Date'] = datetime.strptime('{}-{}-1'.format(df['Year'], df['Week #']), '%Y-%W-%w').strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'
4 df
File /opt/anaconda3/lib/python3.9/_strptime.py:568, in _strptime_datetime(cls, data_string, format)
565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
File /opt/anaconda3/lib/python3.9/_strptime.py:349, in _strptime(data_string, format)
347 found = format_regex.match(data_string)
348 if not found:
--> 349 raise ValueError("time data %r does not match format %r" %
350 (data_string, format))
351 if len(data_string) != found.end():
352 raise ValueError("unconverted data remains: %s" %
353 data_string[found.end():])
ValueError: time data '0 2020\n1 2020\n2 2020\n3 2020\n4
2020\n ... \n35913 2024\n35914 2024\n35915 2024\n35916
2024\n35917 2024\nName: Year, Length: 35918, dtype: int64-0 02\n1
03\n2 04\n3 05\n4 06\n ..\n35913 42\n35914
43\n35915 44\n35916 45\n35917 46\nName: Week #, Length: 35918, dtype:
object-1' does not match format '%Y-%W-%w'
I also tried the following
from datetime import datetime
from isoweek import Week
df['Date'] = Week(df['Year'], df['Week #']).monday()
But I got the following error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[51], line 4
1 from datetime import datetime
2 from isoweek import Week
----> 4 df['Date'] = Week(df['Year'], df['Week #']).monday()
File /opt/anaconda3/lib/python3.9/site-packages/isoweek.py:34, in Week.__new__(cls, year, week)
27 def __new__(cls, year, week):
28 """Initialize a Week tuple with the given year and week number.
29
30 The week number does not have to be within range. The numbers
31 will be normalized if not. The year must be within the range
32 1 to 9999.
33 """
---> 34 if week < 1 or week > 52:
35 return cls(year, 1) + (week - 1)
36 if year < 1 or year > 9999:
File /opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py:1527, in NDFrame.__nonzero__(self)
1525 @final
1526 def __nonzero__(self) -> NoReturn:
-> 1527 raise ValueError(
1528 f"The truth value of a {type(self).__name__} is ambiguous. "
1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1530 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answers:
IIUC, you can use the format parameter of pd.to_datetime to specify your date format. First build the date as a string, then parse it into a real date.
import numpy as np
import pandas as pd

data = {"year": [2022, 2023], "week": [1, 2]}
df = pd.DataFrame(data)
print(df)
#    year  week
# 0  2022     1
# 1  2023     2

# Create 'year-week-1' formatted strings (the trailing 1 is the weekday)
df['date'] = df['year'].astype(str).str.cat(others=[df['week'].astype(str), np.array(['1']*len(df))], sep='-')
print(df)
#    year  week      date
# 0  2022     1  2022-1-1
# 1  2023     2  2023-2-1

# format is year-week-weekday (1 = Monday)
df['date'] = pd.to_datetime(df['date'], format="%Y-%W-%w")
print(df)
#    year  week       date
# 0  2022     1 2022-01-03
# 1  2023     2 2023-01-09
Note that the trailing -1 in the second dataframe's date strings is later interpreted as the weekday (1 = Monday), not as a day of the month!
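One caveat with the %W directive used above: it counts weeks from the first Monday of the year, which is not always the same as the ISO week number. The sketch below uses only the standard library to compare the two conventions. The helper names monday_from_w and monday_from_iso are made up for illustration and are not part of the answer.

```python
from datetime import date, datetime


def monday_from_w(year: int, week: int) -> date:
    """Monday of the given week under %W rules (week 1 starts on the year's first Monday)."""
    return datetime.strptime(f"{year}-{week}-1", "%Y-%W-%w").date()


def monday_from_iso(year: int, week: int) -> date:
    """Monday of the given ISO week (ISO week 1 is the week containing January 4th)."""
    return date.fromisocalendar(year, week, 1)


# 2023 starts on a Sunday, so both conventions give the same Monday.
print(monday_from_w(2023, 10), monday_from_iso(2023, 10))

# 2020 starts on a Wednesday, so the two conventions are one week apart.
print(monday_from_w(2020, 10), monday_from_iso(2020, 10))
```

For the dates in the question (2023) both conventions agree, but the traceback shows data going back to 2020, so it is worth checking which week numbering produced the Week # column before picking a format.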
As I do not have your dataframe, I am using sample input.
import pandas as pd
import datetime
# create a sample dataframe with week number and year columns
df = pd.DataFrame({"Year": [2016, 2016, 2016, 2017, 2017, 2017],"Week": [43, 44, 51, 2, 5, 12]})
# define a function that takes a week number and year and returns a date
def week_to_date(week, year):
    # get the first day (Monday) of the given ISO week and year;
    # int() also accepts zero-padded strings such as "02"
    date = datetime.date.fromisocalendar(int(year), int(week), 1)
    return date
# apply the function to the dataframe and create a new column with the date
df["Date"] = df.apply(lambda row: week_to_date(row["Week"], row["Year"]), axis=1)
# print the dataframe
print(df)
Output:
Year Week Date
0 2016 43 2016-10-24
1 2016 44 2016-10-31
2 2016 51 2016-12-19
3 2017 2 2017-01-09
4 2017 5 2017-01-30
5 2017 12 2017-03-20
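The same ISO-week lookup can also be done without a row-wise apply. The following is a minimal sketch, assuming the Week # column holds ISO week numbers stored as strings (the traceback in the question shows dtype object with values like 02). The sample frame is hypothetical and only mirrors the column names from the question.

```python
import pandas as pd

# Hypothetical sample mirroring the question; 'Week #' is stored as strings.
df = pd.DataFrame({"Year": [2023, 2023, 2023], "Week #": ["10", "11", "12"]})

# Build 'ISOyear-ISOweek-weekday' strings (1 = Monday), zero-padding the week.
iso_strings = (
    df["Year"].astype(str) + "-" + df["Week #"].astype(str).str.zfill(2) + "-1"
)

# %G = ISO year, %V = ISO week, %u = ISO weekday; parsed in one vectorized call.
df["Date"] = pd.to_datetime(iso_strings, format="%G-%V-%u")
print(df)
```

For these three rows the Date column should come out as 2023-03-06, 2023-03-13 and 2023-03-20, which matches the dates requested in the question. On a frame with tens of thousands of rows this is likely to be faster than apply, since the parsing happens in a single call.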