Removing the timestamp from a datetime in pandas dataframe

Question:

Scenario: I have a dataframe with multiple columns retrieved from Excel worksheets. Some of these columns hold dates, where some values are plain dates (yyyy:mm:dd) and some are datetimes (yyyy:mm:dd 00.00.000000).

Question: How can I remove the time stamp from the dates when they are not the index of my dataframe?

What I already tried: From other posts here on SO ("Working with dates in pandas – remove unseen characters in datetime and convert to string" and "How to strip a pandas datetime of date, hours and seconds"), I found:

pd.DatetimeIndex(dfST['timestamp']).date

and

strftime (df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d')))

But I can’t seem to find a way to use those directly on the wanted column when it is not the index of my dataframe.

Asked By: DGMS89

||

Answers:

You can do the following:

dfST['timestamp'] = pd.to_datetime(dfST['timestamp'])

to_datetime() will infer the formatting of the date column. You can also pass errors='coerce' if the column contains non-date values.

After completing the above, you’ll be able to create a new column containing only date values:

dfST['new_date_column'] = dfST['timestamp'].dt.date
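
As a minimal sketch of the two steps above: the sample values and the extra "id" column are hypothetical, chosen only to show that the timestamp column does not have to be the index and that errors='coerce' turns an unparseable value into NaT.

```python
from datetime import date

import pandas as pd

# Hypothetical sample data standing in for a column read from Excel;
# the last value is deliberately not a date.
dfST = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "timestamp": ["2017-01-05 00:00:00", "2017-01-06 13:45:10", "not a date"],
    }
)

# errors='coerce' turns unparseable values into NaT instead of raising.
dfST["timestamp"] = pd.to_datetime(dfST["timestamp"], errors="coerce")

# The .dt accessor works on any datetime column, not only on the index.
dfST["new_date_column"] = dfST["timestamp"].dt.date

first_date = dfST["new_date_column"].iloc[0]
print(dfST)
```

Each valid element of new_date_column is a plain datetime.date, so the column's dtype is object.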
Answered By: Andrew L

You can also use dt.normalize() to set the time of every value to midnight (pandas does not display the time part when every value in the column is at midnight), or dt.floor('D') to floor each value to daily frequency:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df['timestamp'] = df['timestamp'].dt.normalize()

df['timestamp'] = df['timestamp'].dt.floor('D')

Note that this keeps the dtype of the column datetime64[ns] because each element is still of type pd.Timestamp, whereas dt.date suggested in Andrew L’s post converts it to object because each element becomes type datetime.date.
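
The dtype difference can be checked with a short sketch like the one below. The two sample timestamps are hypothetical, and the check uses pd.api.types.is_datetime64_any_dtype rather than a hard-coded dtype string, since the exact datetime unit may vary between pandas versions.

```python
from datetime import date

import pandas as pd

# Hypothetical column with non-midnight times
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-03-01 08:30:00", "2021-03-02 23:59:59"])}
)

normalized = df["timestamp"].dt.normalize()  # time set to midnight
floored = df["timestamp"].dt.floor("D")      # floored to daily frequency
dates = df["timestamp"].dt.date              # Python datetime.date objects

# normalize() and floor('D') keep a datetime dtype; dt.date gives object.
print(pd.api.types.is_datetime64_any_dtype(normalized))
print(pd.api.types.is_datetime64_any_dtype(floored))
print(dates.dtype)
```

Keeping a datetime dtype means the .dt accessor and date arithmetic remain available on the result, which is not the case for an object column of datetime.date values.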

Also, it’s worth noting that dt.normalize and dt.floor('D') are both significantly faster (approx. 10 times faster for longer dataframes) than dt.date:

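[Plot: "Removing Time From Datetime", runtime against length of column for col.dt.date, col.dt.normalize() and col.dt.floor('D')]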

Code used to produce the timings plot:

import pandas as pd
from perfplot import plot

# Compare the three ways of dropping the time part on columns of growing length
plot(
    setup=lambda n: pd.Series([pd.Timestamp('now')] * n),
    kernels=[lambda s: s.dt.date, lambda s: s.dt.normalize(), lambda s: s.dt.floor('D')],
    labels=["col.dt.date", "col.dt.normalize()", "col.dt.floor('D')"],
    n_range=[2**k for k in range(21)],
    xlabel='Length of column',
    title='Removing Time From Datetime',
    equality_check=lambda x, y: all(x.eq(y)),
)
Answered By: cottontail