Python pandas integer YYYYMMDD to datetime

Question:

I have a DataFrame that looks like the following:

OrdNo  LstInvDt
9      20070620
11     20070830
19     20070719
21     20070719
23     20070719
26     20070911
29     20070918
31     20070816
34     20070925

LstInvDt is of dtype int64. As you can see, the integers are in the format YYYYMMDD, e.g. 20070530 is the 30th of May 2007. I have tried a range of approaches, the most obvious being:

pd.to_datetime(dt['Date']) and pd.to_datetime(str(dt['Date'])), with multiple variations on the function's parameters.

The result has been that the date is interpreted as the time: the date part is set to 1970-01-01, so for the example above the outcome is 1970-01-01 00:00:00.020070530

I also tried various .map() functions found in similar posts.

How do I convert it correctly?

Asked By: Rookie

Answers:

to_datetime accepts a format string:

In [92]:

t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')
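
As a side note on the symptom in the question: when no format is given, pd.to_datetime treats a bare integer as an offset from the Unix epoch (nanoseconds by default), which would explain the 1970-01-01 result. A minimal sketch contrasting the two calls; the variable names are only illustrative:

```python
import pandas as pd

raw = 20070530  # integer in YYYYMMDD form, as in the question

# Without a format, the integer is read as nanoseconds since 1970-01-01.
as_offset = pd.to_datetime(raw)

# With an explicit format, the digits are read as year, month, day.
as_date = pd.to_datetime(str(raw), format="%Y%m%d")

print(as_offset)
print(as_date)
```

The first value lands about 20 milliseconds after the epoch, while the second is the intended calendar date.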

example:

In [94]:

t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
       date
0  20070530
1  20070530
2  20070530
3  20070530
4  20070530
5  20070530
6  20070530
7  20070530
8  20070530
9  20070530
In [98]:

df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30
In [99]:

df.dtypes
Out[99]:
date                 int64
DateTime    datetime64[ns]
dtype: object

EDIT

Actually it’s quicker to convert the type to string and then convert the entire series to a datetime rather than calling apply on every value:

In [102]:

df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30

timings

In [104]:

%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
100 loops, best of 3: 2.55 ms per loop

In [105]:

%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 µs per loop
Answered By: EdChum

You don’t need to cast to strings. pd.to_datetime() accepts any of the following as input:

int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

so calling it directly on the integer column with an explicit format= should work.

df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

One useful parameter is errors=. By setting it to 'coerce', you can get NaT values for "broken" dates instead of having an error raised.

df['date'] = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
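
A small sketch of what errors='coerce' does in practice. The column name follows the question, but the values, including the deliberately invalid 20071345 (month 13), and the LstInvDt_parsed column are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: the last value has month 13 and is not a valid date.
df = pd.DataFrame({"LstInvDt": [20070620, 20070830, 20071345]})

# Invalid entries become NaT instead of raising an error.
df["LstInvDt_parsed"] = pd.to_datetime(
    df["LstInvDt"], format="%Y%m%d", errors="coerce"
)

# Count the rows that could not be parsed.
n_bad = int(df["LstInvDt_parsed"].isna().sum())
print(df)
print(n_bad)
```

Counting the NaT values afterwards is one way to check how many rows were silently dropped to missing, which is worth doing before relying on the converted column.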
Answered By: cottontail