Convert a common date format into an ISO week date format
Question:
I have this DataFrame with this kind of date format:
Date Week Number Influenza[it] Febbre[it] Rinorrea[it]
0 2008-01-01 1 220 585 103
1 2008-01-08 2 403 915 147
2 2008-01-15 3 366 895 136
3 2008-01-22 4 305 825 136
4 2008-01-29 5 311 837 121
... ...
I’d like to convert the dates to the ISO week date format used in this DataFrame, because I need to intersect the two DataFrames on matching dates, based on year and week. The format is “year-weeknumberoftheyear”.
0 2007-42
1 2007-43
2 2007-44
3 2007-45
4 2007-46
... ...
So far I have only been able to get the ISO week numbers of the first DataFrame, like this:
import pandas as pd
wiki = pd.read_csv('file.csv', parse_dates=['Date'])
for i, d in wiki.iterrows():
    print(d.Date.isocalendar()[1])
Output:
1
2
3
4
...
But I don’t know how to produce a date format like the one in the second DataFrame (that is, “year-weeknumberoftheyear”).
Answers:
You could use a vectorized approach instead after the read operation:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%V')
df['Date']
0 2008-01
1 2008-02
2 2008-03
3 2008-04
4 2008-05
Name: Date, dtype: object
Here, %V is the directive corresponding to the ISO 8601 week number.
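One caveat is worth illustrating here: %V is the ISO week, while %Y is the calendar year, and the two can disagree in the first and last days of a year. The following is a minimal sketch using only the standard library. It assumes a platform whose strftime supports %G (the ISO year) and %V, and the helper name `iso_year_week` is made up for illustration.

```python
from datetime import date


def iso_year_week(d):
    """Return 'ISOyear-ISOweek' (zero-padded week) for a date."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-{iso_week:02d}"


d = date(2022, 1, 1)          # a Saturday that belongs to ISO week 52 of 2021
print(d.strftime('%Y-%V'))    # calendar year combined with ISO week
print(d.strftime('%G-%V'))    # ISO year combined with ISO week
print(iso_year_week(d))       # same idea without relying on %G/%V support
```

For dates well inside a year, such as the January 2008 dates in the question, both year directives give the same result, so the difference only shows up around New Year.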
demo:
import pandas as pd
from io import StringIO
data = StringIO(
'''
Date  Week Number  Influenza[it]  Febbre[it]  Rinorrea[it]
2008-01-01  1  220  585  103
2008-01-08  2  403  915  147
2008-01-15  3  366  895  136
2008-01-22  4  305  825  136
2008-01-29  5  311  837  121
''')
# Columns are separated by two or more whitespace characters.
df = pd.read_csv(data, sep=r'\s{2,}', parse_dates=['Date'], engine='python')
df
df['Date'].dtypes
dtype('<M8[ns]')
df['Date'].dt.strftime('%Y-%V')
0 2008-01
1 2008-02
2 2008-03
3 2008-04
4 2008-05
Name: Date, dtype: object
edit: (though inefficient, only for reproducibility purposes)
L = ['{}-{}'.format(d.Date.isocalendar()[0], str(d.Date.isocalendar()[1]).zfill(2)) for i,d in wiki.iterrows()]
Construct the series:
>>> pd.Series(L)
0 2008-01
1 2008-02
2 2008-03
3 2008-04
4 2008-05
dtype: object
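Since the stated goal in the question is to intersect two DataFrames on year and week, here is a minimal sketch of that step, assuming pandas 1.1 or later for `Series.dt.isocalendar()`. The second frame `other`, its `Week` and `Cases` columns, and the values in it are hypothetical and only stand in for the second DataFrame from the question.

```python
import pandas as pd

# First frame: weekly dates, as in the question.
wiki = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-01', '2008-01-08', '2008-01-15']),
    'Influenza[it]': [220, 403, 366],
})

# Hypothetical second frame, already keyed by "year-week" strings.
other = pd.DataFrame({
    'Week': ['2008-02', '2008-03', '2008-04'],
    'Cases': [10, 20, 30],  # made-up values for illustration
})

# Build the same key from the ISO year and ISO week of each date.
iso = wiki['Date'].dt.isocalendar()
wiki['Week'] = iso['year'].astype(str) + '-' + iso['week'].astype(str).str.zfill(2)

# An inner merge keeps only the weeks present in both frames.
merged = wiki.merge(other, on='Week', how='inner')
print(merged[['Week', 'Influenza[it]', 'Cases']])
```

Zero-padding the week number matters here, because the merge compares the keys as plain strings.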
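time.strftime('%Y-%W') may work for you; it formats a time as a string. Note that %W counts weeks starting from the first Monday of the year (earlier days fall in week 00), so it is not the ISO 8601 week number.
import time
import pandas as pd
pd.to_datetime(time.time()).strftime('%Y-%W')
'1970-00' will be seen in the output, because the float returned by time.time() is interpreted as nanoseconds since the epoch.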
The currently accepted answer is incorrect because it combines the calendar year (%Y) with the ISO week (%V): 2022-01-01 is labeled with the year 2022, while the correct ISO week for this date is 2021-52, because by the ISO definition it still belongs to the last week of the previous year.
The best way I have found so far to get a real ISO week is this:
import pandas as pd
df = pd.DataFrame({'date': ['2021-12-31', '2022-01-01', '2022-01-04']})
df['date'] = pd.to_datetime(df['date'])
# df['wrong year-week'] = df['date'].dt.strftime('%Y-%V')  # wrong ISO!
# Shift each date back to the Monday of its week, then format that Monday.
df['monday'] = df['date'] - pd.to_timedelta(arg=df['date'].dt.weekday, unit='D')
df['year-week'] = df['monday'].dt.strftime('%Y-%V')
# or as a one-liner:
df['year-week'] = (df['date'] - pd.to_timedelta(arg=df['date'].dt.weekday, unit='D')).dt.strftime('%Y-%V')
date | year-week |
---|---|
2021-12-31 | 2021-52 |
2022-01-01 | 2021-52 |
2022-01-04 | 2022-01 |
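As an alternative sketch, assuming pandas 1.1 or later, `Series.dt.isocalendar()` returns the ISO year and ISO week directly, so no shifting to Monday is needed. It also covers weeks whose Monday falls in the previous calendar year, such as 2019-12-30, which belongs to ISO week 2020-01 but which a Monday-based `%Y-%V` format would label with the year 2019. The `year-week` column name follows the answer above; the extra date is added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2021-12-31', '2022-01-01', '2022-01-04', '2019-12-30']})
df['date'] = pd.to_datetime(df['date'])

# isocalendar() yields integer columns 'year', 'week' and 'day' per ISO 8601.
iso = df['date'].dt.isocalendar()

# Zero-pad the week so the keys sort and compare consistently as strings.
df['year-week'] = iso['year'].astype(str) + '-' + iso['week'].astype(str).str.zfill(2)
print(df)
```

Because this approach works on integers rather than on strftime directives, it does not depend on the platform's support for %V or %G.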