Reading Excel file date/time incorrectly

Question:

I have an Excel file with the time in one column and the date in a separate column. I'm using the code below to read it:

df = pd.read_excel(r'df.xlsx', parse_dates=[['date', 'time']])

This works perfectly when the date is the same; however, if the date changes, the values are read incorrectly. For example, the Excel file looks like this:

[image: screenshot of the Excel file]

Reading it with this code gives the following result:

2021-04-03 00:00:00 23:52:11,A
2021-04-03 00:00:00 23:56:05,A
2021-04-03 00:00:00 23:59:27,A
2021-04-04 00:00:00 1900-01-01 00:03:33,B
2021-04-04 00:00:00 1900-01-01 00:04:33,B
2021-04-04 00:00:00 1900-01-01 00:06:43,B
2021-04-04 00:00:00 1900-01-01 00:10:17,B

How can I fix this so the dataframe has separate columns with the correct date and time like below?

2021-04-03  23:52:11,A
2021-04-03  23:56:05,A
2021-04-03  23:59:27,A
2021-04-04  00:03:33,B
2021-04-04  00:04:33,B
2021-04-04  00:06:43,B
2021-04-04  00:10:17,B

Excel file : https://www.apispreadsheets.com/table/lEooNma9w3X2XfaL/

Asked By: Mohammad.sh

||

Answers:

Since you’re dealing with different date formats within the same data, you can use dateutil.parser.

Example usage:

import dateutil.parser as parser

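# The raw combined string ("2021-04-04 00:00:00 1900-01-01 00:03:33,B") holds two
# date stamps and a trailing label, so pass only the date and time you want to keep.
parser.parse("2021-04-04 00:03:33")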

The result is a datetime object, which you can then format as needed for the final output, for example with strftime.
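As a complement, here is a minimal sketch of one way to normalize the values before combining them. It assumes, based on the output in the question, that the time cells after midnight come back as full datetime values dated 1900-01-01, while the other cells are plain time values. The rows, to_time, and combine below are hypothetical names used for illustration only; they are not part of pandas or dateutil.

```python
import datetime as dt

# Hypothetical rows mimicking the cells read from the sheet:
# the time cell is sometimes a datetime.time, sometimes a datetime.datetime.
rows = [
    (dt.datetime(2021, 4, 3), dt.time(23, 59, 27), "A"),
    (dt.datetime(2021, 4, 4), dt.datetime(1900, 1, 1, 0, 3, 33), "B"),
]

def to_time(value):
    """Return only the time of day from a time or datetime cell."""
    if isinstance(value, dt.datetime):
        return value.time()  # drop the spurious 1900-01-01 date
    return value

def combine(date_cell, time_cell):
    """Join the date from one cell with the time of day from another."""
    return dt.datetime.combine(date_cell.date(), to_time(time_cell))

fixed = [(combine(d, t), label) for d, t, label in rows]

for stamp, label in fixed:
    # Date and time can be kept as separate values or as one timestamp.
    print(stamp.date(), stamp.time(), label)
```

With a DataFrame, the same helper could be applied to the column after reading the file without parse_dates, for example df['time'] = df['time'].map(to_time). Whether that matches your file depends on how the cells are actually stored in the workbook.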

Answered By: Kasper