Reading Excel file date/time incorrectly
Question:
I have an Excel file with one column for the time and a separate column for the date. I’m using the code below to read it:
df = pd.read_excel(r'df.xlsx', parse_dates=[['date', 'time']])
This works perfectly when every row has the same date; however, if the date changes, the values are read incorrectly. For example, the Excel file is as below:
If I read it using this code, the results look like this:
2021-04-03 00:00:00 23:52:11,A
2021-04-03 00:00:00 23:56:05,A
2021-04-03 00:00:00 23:59:27,A
2021-04-04 00:00:00 1900-01-01 00:03:33,B
2021-04-04 00:00:00 1900-01-01 00:04:33,B
2021-04-04 00:00:00 1900-01-01 00:06:43,B
2021-04-04 00:00:00 1900-01-01 00:10:17,B
How can I fix this so the dataframe has separate columns with the correct date and time like below?
2021-04-03 23:52:11,A
2021-04-03 23:56:05,A
2021-04-03 23:59:27,A
2021-04-04 00:03:33,B
2021-04-04 00:04:33,B
2021-04-04 00:06:43,B
2021-04-04 00:10:17,B
Excel file: https://www.apispreadsheets.com/table/lEooNma9w3X2XfaL/
Answers:
Since you’re dealing with different date formats within the same data, you can use dateutil.parser.
Example usage:
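import dateutil.parser as parser
# Parse the two parts separately: a single string holding two dates is rejected by parse().
date_part = parser.parse("2021-04-04 00:00:00")
time_part = parser.parse("1900-01-01 00:03:33")
Each call returns a datetime object, so you can take the date from the first and the time from the second, then format the result as needed.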
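Another option is to skip parse_dates and combine the two columns in pandas after reading. The sketch below is only an illustration under some assumptions: the columns are named date and time as in the question, the small DataFrame is a hypothetical stand-in for what pd.read_excel returns (time cells arriving as a mix of datetime.time and 1900-01-01 datetimes), and combine_date_time is a helper name made up for this example. It also assumes the times have no fractional seconds, since it keeps only the last eight characters (HH:MM:SS) of each time value.

```python
import datetime as dt

import pandas as pd


def combine_date_time(df, date_col="date", time_col="time"):
    """Return one datetime Series built from separate date and time columns.

    Assumes each time cell renders as a string ending in HH:MM:SS, whether it
    was read as datetime.time or as a datetime such as 1900-01-01 00:03:33.
    """
    # Keep only the calendar date (YYYY-MM-DD) from the date column.
    date_str = pd.to_datetime(df[date_col]).dt.strftime("%Y-%m-%d")
    # Keep only the trailing HH:MM:SS, dropping any 1900-01-01 prefix.
    time_str = df[time_col].astype(str).str[-8:]
    return pd.to_datetime(date_str + " " + time_str, format="%Y-%m-%d %H:%M:%S")


# Hypothetical stand-in for pd.read_excel("df.xlsx") without parse_dates.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-04-03", "2021-04-04"]),
        "time": [dt.time(23, 59, 27), dt.datetime(1900, 1, 1, 0, 3, 33)],
        "label": ["A", "B"],
    }
)
df["date_time"] = combine_date_time(df)
print(df[["date_time", "label"]])
```

With the real file you would replace the stand-in DataFrame with the result of pd.read_excel and check that the time column really does come back in one of these two shapes before relying on the string slice.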