Convert pandas datetime column yyyy-mm-dd to YYYYMMDD

Question:

I have a dataframe with a datetime column in the format yyyy-mm-dd. I would like to convert it to integer format yyyymmdd. When I try:

x=dates.apply(dt.datetime.strftime('%Y%m%d')).astype(int)

I keep getting an error:

TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'

This doesn’t work if I try to pass an array. I know that if I pass just one element it will convert, but what is a more pythonic way to do it? I did try using a lambda but that didn’t work either.

Asked By: user3757265


Answers:

If your column holds strings, you will first need to convert it with `pd.to_datetime`:

df['Date'] = pd.to_datetime(df['Date'])

Then, use the .dt datetime accessor with strftime:

import pandas as pd

df = pd.DataFrame({'Date': pd.date_range('2017-01-01', periods=60, freq='D')})

df.Date.dt.strftime('%Y%m%d').astype(int)

Or use a lambda function:

df.Date.apply(lambda x: x.strftime('%Y%m%d')).astype(int)

Output:

0     20170101
1     20170102
2     20170103
3     20170104
4     20170105
5     20170106
6     20170107
7     20170108
8     20170109
9     20170110
10    20170111
11    20170112
12    20170113
13    20170114
14    20170115
15    20170116
16    20170117
17    20170118
18    20170119
19    20170120
20    20170121
21    20170122
22    20170123
23    20170124
24    20170125
25    20170126
26    20170127
27    20170128
28    20170129
29    20170130
30    20170131
31    20170201
32    20170202
33    20170203
34    20170204
35    20170205
36    20170206
37    20170207
38    20170208
39    20170209
40    20170210
41    20170211
42    20170212
43    20170213
44    20170214
45    20170215
46    20170216
47    20170217
48    20170218
49    20170219
50    20170220
51    20170221
52    20170222
53    20170223
54    20170224
55    20170225
56    20170226
57    20170227
58    20170228
59    20170301
Name: Date, dtype: int32
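
For a column that starts out as yyyy-mm-dd strings, the two steps above can be combined as in the following minimal sketch. The sample values and the DateInt column name are made up for illustration, and astype('int64') is used in place of astype(int) only so that the resulting dtype does not depend on the platform's default integer size.

```python
import pandas as pd

# Hypothetical sample data: dates stored as yyyy-mm-dd strings
df = pd.DataFrame({'Date': ['2017-01-01', '2017-02-15', '2017-12-31']})

# Step 1: parse the strings into datetime64 values
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# Step 2: format as YYYYMMDD text, then cast to a 64-bit integer
df['DateInt'] = df['Date'].dt.strftime('%Y%m%d').astype('int64')

print(df['DateInt'].tolist())
```
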
Answered By: Scott Boston

The error in the question occurred because datetime.datetime.strftime was called with only the format string, so the string was taken as the object to format instead of a datetime/date. Pass the function itself to apply() and supply format= as a separate keyword argument; apply() forwards it to strftime() as the format.

from datetime import datetime
x = dates.apply(datetime.strftime, format='%Y%m%d').astype(int)
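
The difference between the failing call and the working one can be reproduced on a small Series. This is only a sketch with made-up sample data; the dates name follows the question.

```python
from datetime import datetime

import pandas as pd

# Small sample Series of datetime64 values (illustrative data)
dates = pd.Series(pd.to_datetime(['2017-01-01', '2017-01-02']))

# The call from the question: strftime receives only the format string,
# so the string is treated as the object to format and a TypeError is raised
try:
    datetime.strftime('%Y%m%d')
except TypeError as exc:
    print('TypeError:', exc)

# Passing the function itself lets apply() call it once per element,
# forwarding format= as a keyword argument each time
x = dates.apply(datetime.strftime, format='%Y%m%d').astype('int64')
print(x.tolist())
```
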

If the dates were strings (instead of datetime/date objects), then str.replace() should do the job.

x = dates.str.replace('-', '').astype(int)

# using apply
x = dates.apply(lambda x: x.replace('-', '')).astype(int)
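
A self-contained sketch of the string route, using made-up sample values. Here regex=False is passed explicitly so that the hyphen is matched literally whatever the default is in the installed pandas version.

```python
import pandas as pd

# Illustrative data: dates stored as yyyy-mm-dd strings
dates = pd.Series(['2020-01-01', '2020-01-02', '2020-12-31'])

# regex=False makes the '-' a literal match on any pandas version
x = dates.str.replace('-', '', regex=False).astype('int64')
print(x.tolist())
```

Note that this route does no date validation: it only gives a proper yyyymmdd integer when every string is zero-padded, so a value such as '2020-1-5' would silently become 202015.
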

A mildly interesting(?) thing to note is that neither .dt.strftime nor .str.replace is heavily optimized in pandas, so calling Python’s strftime and str.replace via apply() was actually faster than the pandas counterparts in the timings below (much faster in the case of strftime).

dates = pd.Series(pd.date_range('2020', '2200', freq='D'))

%timeit dates.dt.strftime('%Y%m%d')
# 719 ms ± 41.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit dates.apply(datetime.strftime, format='%Y%m%d')
# 472 ms ± 34.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

dates = dates.astype(str)

%timeit dates.str.replace('-', '')
# 30.9 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit dates.apply(lambda x: x.replace('-', ''))
# 26 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
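
The %timeit lines above are IPython magics. To repeat the strftime comparison in a plain Python script, something like the following sketch with the standard timeit module could be used. The function names and the shorter date range are choices made here for illustration, and the measured times will differ by machine and pandas version.

```python
import timeit
from datetime import datetime

import pandas as pd

# Smaller range than in the timings above so the script finishes quickly
dates = pd.Series(pd.date_range('2020-01-01', '2029-12-31', freq='D'))

def via_dt_accessor():
    return dates.dt.strftime('%Y%m%d')

def via_apply():
    return dates.apply(datetime.strftime, format='%Y%m%d')

# Both approaches should produce the same strings
same_result = via_dt_accessor().tolist() == via_apply().tolist()
print('same result:', same_result)

# Time each approach; the numbers depend on the machine and pandas version
for func in (via_dt_accessor, via_apply):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds / 5 * 1000:.1f} ms per call')
```
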
Answered By: cottontail