Convert pandas datetime column yyyy-mm-dd to YYYYMMDD
Question:
I have a dataframe with a datetime column in the format yyyy-mm-dd. I would like to have it in integer format yyyymmdd. When I try:
x=dates.apply(dt.datetime.strftime('%Y%m%d')).astype(int)
I keep getting an error:
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'
This doesn’t work if I try to pass an array. I know that if I pass just one element it will convert, but what is a more pythonic way to do it? I did try using a lambda, but that didn’t work either.
Answers:
If your column is a string, you will need to first use pd.to_datetime:
df['Date'] = pd.to_datetime(df['Date'])
Then, use the .dt datetime accessor with strftime:
df = pd.DataFrame({'Date':pd.date_range('2017-01-01', periods = 60, freq='D')})
df.Date.dt.strftime('%Y%m%d').astype(int)
Or use lambda function:
df.Date.apply(lambda x: x.strftime('%Y%m%d')).astype(int)
Output:
0 20170101
1 20170102
2 20170103
3 20170104
4 20170105
5 20170106
6 20170107
7 20170108
8 20170109
9 20170110
10 20170111
11 20170112
12 20170113
13 20170114
14 20170115
15 20170116
16 20170117
17 20170118
18 20170119
19 20170120
20 20170121
21 20170122
22 20170123
23 20170124
24 20170125
25 20170126
26 20170127
27 20170128
28 20170129
29 20170130
30 20170131
31 20170201
32 20170202
33 20170203
34 20170204
35 20170205
36 20170206
37 20170207
38 20170208
39 20170209
40 20170210
41 20170211
42 20170212
43 20170213
44 20170214
45 20170215
46 20170216
47 20170217
48 20170218
49 20170219
50 20170220
51 20170221
52 20170222
53 20170223
54 20170224
55 20170225
56 20170226
57 20170227
58 20170228
59 20170301
Name: Date, dtype: int32
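An alternative that avoids string formatting altogether is to build the integer arithmetically from the .dt fields. This is not from the answer above, just a minimal sketch on the same sample data, cross-checked against the strftime result:

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'Date': pd.date_range('2017-01-01', periods=60, freq='D')})

# Build the integer arithmetically: year * 10000 + month * 100 + day.
as_int = df['Date'].dt.year * 10000 + df['Date'].dt.month * 100 + df['Date'].dt.day

# Cross-check against the strftime-based result.
via_strftime = df['Date'].dt.strftime('%Y%m%d').astype(int)
assert (as_int == via_strftime).all()

print(as_int.head(3).tolist())  # [20170101, 20170102, 20170103]
```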
The error in the OP occurred because datetime.datetime.strftime was called directly with only the format string, so the string was taken as the datetime/date argument and apply() never received a function. Instead, pass the function itself to apply() and supply format= as a separate keyword argument; apply() forwards it to strftime() as the format.
from datetime import datetime
x = dates.apply(datetime.strftime, format='%Y%m%d').astype(int)
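To see the difference side by side, here is a small sketch that reproduces the TypeError in plain Python and then runs the corrected call. The three-row series is made up for illustration, and the exact wording of the error message may vary between Python versions:

```python
from datetime import datetime

import pandas as pd

# Illustrative data, not from the question.
dates = pd.Series(pd.date_range('2017-01-01', periods=3, freq='D'))

# Calling the method with only a format string reproduces the error:
# the string is taken as `self`, and no datetime is involved at all.
try:
    datetime.strftime('%Y%m%d')
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Passing the function itself lets apply() call it once per element,
# forwarding format= to every call.
x = dates.apply(datetime.strftime, format='%Y%m%d').astype(int)
print(x.tolist())  # [20170101, 20170102, 20170103]
```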
If the dates were strings (instead of datetime/date), then str.replace() should do the job.
x = dates.str.replace('-', '').astype(int)
# using apply
x = dates.apply(lambda x: x.replace('-', '')).astype(int)
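One caveat with the string route: str.replace() does not check that the values are real dates, and astype(int) fails on missing values. A possible sketch, not from the answers above, parses first with errors='coerce' and then stores the result in pandas' nullable Int64 dtype. The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical input: one impossible date and one missing value.
raw = pd.Series(['2017-01-01', '2017-02-30', None, '2017-03-01'])

# Invalid or missing entries become NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# With NaT present the .dt fields are floats, so finish with the nullable integer dtype.
result = (parsed.dt.year * 10000 + parsed.dt.month * 100 + parsed.dt.day).astype('Int64')

print(result.isna().tolist())  # [False, True, True, False]
```

Whether silently turning bad rows into missing values is acceptable depends on the data; dropping errors='coerce' makes the parse raise instead.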
A mildly interesting(?) thing to note is that both .dt.strftime and .str.replace of pandas appear not to be optimized: in the timings below, calling Python’s strftime and str.replace via apply() is actually faster than the pandas counterparts (in the case of strftime, considerably so: roughly 472 ms versus 719 ms).
dates = pd.Series(pd.date_range('2020', '2200', freq='D'))
%timeit dates.dt.strftime('%Y%m%d')
# 719 ms ± 41.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dates.apply(datetime.strftime, format='%Y%m%d')
# 472 ms ± 34.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
dates = dates.astype(str)
%timeit dates.str.replace('-', '')
# 30.9 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dates.apply(lambda x: x.replace('-', ''))
# 26 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
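The %timeit lines above only work in IPython/Jupyter. To repeat the strftime comparison in a plain script, something like the following sketch should do. The shorter date range and the helper names via_accessor and via_apply are my own choices to keep the run short; results will differ by machine and pandas version, so no timings are shown:

```python
import timeit
from datetime import datetime

import pandas as pd

# Smaller range than in the answer (an assumption, to keep the run short).
dates = pd.Series(pd.date_range('2020-01-01', '2029-12-31', freq='D'))

def via_accessor():
    # pandas' .dt accessor
    return dates.dt.strftime('%Y%m%d')

def via_apply():
    # Python's datetime.strftime called once per element
    return dates.apply(datetime.strftime, format='%Y%m%d')

# Both variants must produce the same strings before timings are compared.
assert via_accessor().tolist() == via_apply().tolist()

for func in (via_accessor, via_apply):
    seconds = timeit.timeit(func, number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```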