unable to change date format while loading excel file with pandas

Question:

I’m loading an Excel file with pandas using the parse_dates=True parameter. However, the date format can not change. When I open the file in Excel on my local computer, the date format is accurate, but when the file is loaded in Python, the format is incorrect. The issue is that "dmY acts like mdY for half of the data." I’m expecting the output with YYYY-mm-dd HH:MM:SS. Below is the code for the data

df = pd.DataFrame({'ID':[2914240.0,2914137.0,2929456.0,2920801.0],
                   'RT_Date':['2021-01-02 06:38:00','2021-01-02 02:58:00','02-19-2021 17:59:00','2021-09-02 12:49:00'],
                   'DateCreated':['01-31-2021 14:38:12','01-31-2021 10:57:50','02-19-2021 01:58:38','2021-09-02 11:35:51']})

enter image description here

The code which i tried is below

df = pd.read_excel('demo.xlsx', parse_dates=True)

But the above code is not working and returning the same output as shown in the image. Is there any issue with the data/pandas? Can anyone please help me with this?

Asked By: Nithin Reddy

||

Answers:

Using pd.to_datetime() allows for the use of the infer_datetime_format argument which eases working with datetime values. In this particular case, if you’d like to parse all columns except for ID you can try:

df['RT_Date'],df['DateCreated']  = [pd.to_datetime(df[x],infer_datetime_format=True) for x in df if x != 'ID']

Based on your example, it outputs:

          ID             RT_Date         DateCreated
0  2914240.0 2021-01-02 06:38:00 2021-01-31 14:38:12
1  2914137.0 2021-01-02 02:58:00 2021-01-31 10:57:50
2  2929456.0 2021-02-19 17:59:00 2021-02-19 01:58:38
3  2920801.0 2021-09-02 12:49:00 2021-09-02 11:35:51

Should you want only the date:

df['RT_Date'],df['DateCreated']  = [pd.to_datetime(df[x],infer_datetime_format=True).dt.date for x in df if x != 'ID']

Returning:

          ID     RT_Date DateCreated
0  2914240.0  2021-01-02  2021-01-31
1  2914137.0  2021-01-02  2021-01-31
2  2929456.0  2021-02-19  2021-02-19
3  2920801.0  2021-09-02  2021-09-02

EDIT: If there are multiple columns you wish to evaluate this for and no easy way, then you can create a list of candidate columns:

to_date_cols = ['RT_Date','DateCreated','Additional_Col1']

And use:

for x in to_date_cols:
   df[x] = pd.to_datetime(df[x],infer_datetime_format=True)
Answered By: Celius Stingher