Why can't I separate my date-time column properly using pandas in Python?

Question:

I have never asked a question here before, but I am in need of help. I am trying to split the date column of my CSV file, which has values of the form '12/10/2022 11:45:12.446 +0200', into one column for the date and one column for the time.

I have tried what I have found on various sites and by asking ChatGPT but either I get errors, or it works but there are 'NaT' values filling up both the columns.

This is my current code that works but gives me 'NaT' values:

import pandas as pd

data = pd.read_csv('file_directory.csv')
print(data['date'].dtype)

data['date'] = pd.to_datetime(data['date'], errors='coerce', utc=True, format='%m/%d/%Y %H:%M:%S.%f %z')
data['date'] = data['date'].dt.date
data['time'] = data['date'].dt.time
data.drop('date', axis=1, inplace=True)

Can anyone help me fix it and find the cause of the problem?
Thank you!

Asked By: Megan van Blerk


Answers:

Your approach should work (although it is not optimal if you want strings in the columns). You have to swap two lines of code and avoid dropping the date column:

data = pd.DataFrame({'date': ['12/10/2022 11:45:12.446 +0200']})

data['date'] = pd.to_datetime(data['date'], errors='coerce', utc=True, format='%m/%d/%Y %H:%M:%S.%f %z')

# extract the time first, otherwise "date" is overwritten before the time can be read from it
data['time'] = data['date'].dt.time
data['date'] = data['date'].dt.date

Output (the time is shifted to UTC because of utc=True, so 11:45 +0200 becomes 09:45):

         date             time
0  2022-12-10  09:45:12.446000
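
If some rows still come out as NaT, note that errors='coerce' silently turns every value that does not match the format into NaT. A minimal sketch for finding those rows, assuming hypothetical sample data in which the second row is day-first (one possible cause, not confirmed for the original file):

```python
import pandas as pd

# Hypothetical sample: the second row has day 25 in the month position,
# so it cannot match '%m/%d/%Y'
data = pd.DataFrame({'date': ['12/10/2022 11:45:12.446 +0200',
                              '25/10/2022 08:00:00.000 +0200']})

parsed = pd.to_datetime(data['date'], errors='coerce', utc=True,
                        format='%m/%d/%Y %H:%M:%S.%f %z')

# values that failed to parse became NaT; look at the raw strings behind them
bad_rows = data.loc[parsed.isna(), 'date']
print(bad_rows.tolist())
```

If most of the raw strings listed this way have a first number above 12, the column is probably day-first and '%d/%m/%Y' would be the format to try.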

If you want strings:

data = pd.DataFrame({'date': ['12/10/2022 11:45:12.446 +0200']})

data[['date', 'time']] = data['date'].str.split(r' +', n=1, expand=True)
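
With n=1 the string is split only once, so the time column keeps the trailing offset ('11:45:12.446 +0200'). If the offset should go into its own column, a sketch along these lines might work (the column name 'offset' is an assumption for illustration):

```python
import pandas as pd

data = pd.DataFrame({'date': ['12/10/2022 11:45:12.446 +0200']})

# split on whitespace into at most three parts: date, time, UTC offset
parts = data['date'].str.split(n=2, expand=True)
parts.columns = ['date', 'time', 'offset']
print(parts)
```

All three columns remain plain strings here; nothing is parsed or converted to UTC.
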
Answered By: mozway