How can I convert these dates to the correct format in a Pandas Dataframe?
Question:
I have a dataframe with some dates and I want to convert them to datetime format, so I used the pd.to_datetime function. However, it only works for some of the dates because the others are not written in the same order. Example:
df = pd.DataFrame({'dates' : ['December 2021 17', '2005 July 01', 'December 2000 01', '2008 May 11',
'October 2000 04', 'September 2016 04', 'May 1998 09']})
Using pd.to_datetime only returns values for the dates written in year-month-day order. I tried splitting the strings into lists and reordering them, but that didn't seem to work for me.
Answers:
You can use apply and pass it to_datetime, so that each string is parsed on its own:
df.dates = df.dates.apply(pd.to_datetime)
This is the output of df now:
dates
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
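One option is to extract the year, month and day separately, then join them in a fixed order:
# four-digit number -> year, two-digit number -> day, alphabetic word -> month name
y = df['dates'].str.extract(r'(?P<year>\b\d{4}\b)', expand=False)
d = df['dates'].str.extract(r'(?P<day>\b\d{2}\b)', expand=False)
m = df['dates'].str.extract(r'(?P<month>\b[A-Za-z]+\b)', expand=False)
# concatenate as e.g. '2021December17' and parse with an explicit format
pd.to_datetime(y.str.cat([m, d]), format='%Y%B%d')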
Output:
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
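The same classify-and-reorder idea can be sketched in plain Python, which may help if the split-and-reorder attempt from the question went wrong. This is only an illustration under stated assumptions: parse_loose_date is a hypothetical helper name, every string is assumed to hold exactly one four-digit year, one English month name and one day number, and %B is assumed to run under an English locale.

```python
from datetime import datetime

def parse_loose_date(text):
    """Parse a 'year / month name / day' string whose tokens may come in any order."""
    year = month = day = None
    for token in text.split():
        if token.isalpha():
            month = token                      # alphabetic word -> month name
        elif len(token) == 4 and token.isdigit():
            year = token                       # four digits -> year
        elif token.isdigit():
            day = token                        # any other number -> day
    if None in (year, month, day):
        raise ValueError(f"cannot identify year, month and day in {text!r}")
    # Rebuild the tokens in a fixed order and parse with an explicit format.
    return datetime.strptime(f"{year} {month} {day}", "%Y %B %d")

samples = ['December 2021 17', '2005 July 01', 'May 1998 09']
parsed = [parse_loose_date(s) for s in samples]
```

A function like this could then be applied to the column with df['dates'].apply(parse_loose_date).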
If you are not comfortable using the apply function (functional programming) as suggested by @Marcelo Paco, you may try this. Suppose your dataframe is called date_df. You can convert the dates column to your desired format as follows:
import pandas as pd
date_df['dates'] = pd.to_datetime(date_df['dates'])
date_df
Output:
dates
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
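Whether the plain vectorized call copes with mixed orderings seems to depend on the pandas version. A hedged sketch, assuming pandas 2.0 or later accepts format='mixed' to infer the format of each element separately, and that older versions already parse element by element when no format is given:

```python
import pandas as pd

date_df = pd.DataFrame({'dates': ['December 2021 17', '2005 July 01', 'May 1998 09']})

# Assumption: format='mixed' is only understood by pandas >= 2.0, so it is
# passed only there; on older versions the call is made without a format.
major_version = int(pd.__version__.split('.')[0])
kwargs = {'format': 'mixed'} if major_version >= 2 else {}

date_df['dates'] = pd.to_datetime(date_df['dates'], **kwargs)
```

If some rows still fail to parse, applying pd.to_datetime row by row, as in the first answer, is the fallback.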