Python pandas convert datetime to timestamp effectively through dt accessor
Question:
I have a DataFrame with hundreds of millions of rows, and I want to convert datetime to timestamp efficiently. How can I do it?
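My sample df:
import datetime as dt
import pandas as pd
# pd.date_range replaces the DatetimeIndex(start=..., end=...) constructor, which newer pandas no longer accepts
df = (pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1),
                                       end=dt.datetime(2016,1,2,0,0,1), freq='H'))
      .reset_index().rename(columns={'index':'datetime'}))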
which looks like:
datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01
Now I convert datetime to timestamp value by value with .apply(), but it takes a very long time (several hours) when I have hundreds of millions of rows:
df['ts'] = df[['datetime']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
Output:
datetime ts
0 2016-01-01 00:00:01 1451602801
1 2016-01-01 01:00:01 1451606401
2 2016-01-01 02:00:01 1451610001
3 2016-01-01 03:00:01 1451613601
4 2016-01-01 04:00:01 1451617201
The above result is what I want.
If I try to use the .dt accessor of pandas.Series, I get an error message:
df['ts'] = df['datetime'].dt.timestamp
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
If I create, e.g., the date parts of the datetimes with the .dt accessor, it is much faster than using .apply():
df['date'] = df['datetime'].dt.date
Output:
datetime ts date
0 2016-01-01 00:00:01 1451602801 2016-01-01
1 2016-01-01 01:00:01 1451606401 2016-01-01
2 2016-01-01 02:00:01 1451610001 2016-01-01
3 2016-01-01 03:00:01 1451613601 2016-01-01
4 2016-01-01 04:00:01 1451617201 2016-01-01
I want something similar with timestamps…
But I don’t really understand the official documentation: it talks about "Converting to Timestamps" but I don’t see any timestamps there; it only covers converting to datetime with pd.to_datetime(), not to timestamp…
The pandas.Timestamp constructor also doesn’t work (it fails with the error below):
df['ts2'] = pd.Timestamp(df['datetime'])
TypeError: Cannot convert input to Timestamp
pandas.Series.to_timestamp also does something totally different from what I want:
df['ts3'] = df['datetime'].to_timestamp
Output:
datetime ts ts3
0 2016-01-01 00:00:01 1451602801 <bound method Series.to_timestamp of 0 2016...
1 2016-01-01 01:00:01 1451606401 <bound method Series.to_timestamp of 0 2016...
2 2016-01-01 02:00:01 1451610001 <bound method Series.to_timestamp of 0 2016...
3 2016-01-01 03:00:01 1451613601 <bound method Series.to_timestamp of 0 2016...
4 2016-01-01 04:00:01 1451617201 <bound method Series.to_timestamp of 0 2016...
Answers:
I think you need to convert to a numpy array first with values and cast it to int64. The output is in ns, so you need to divide by 10 ** 9:
df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print (df)
datetime ts
0 2016-01-01 00:00:01 1451606401
1 2016-01-01 01:00:01 1451610001
2 2016-01-01 02:00:01 1451613601
3 2016-01-01 03:00:01 1451617201
4 2016-01-01 04:00:01 1451620801
5 2016-01-01 05:00:01 1451624401
6 2016-01-01 06:00:01 1451628001
7 2016-01-01 07:00:01 1451631601
8 2016-01-01 08:00:01 1451635201
9 2016-01-01 09:00:01 1451638801
10 2016-01-01 10:00:01 1451642401
11 2016-01-01 11:00:01 1451646001
12 2016-01-01 12:00:01 1451649601
13 2016-01-01 13:00:01 1451653201
14 2016-01-01 14:00:01 1451656801
15 2016-01-01 15:00:01 1451660401
16 2016-01-01 16:00:01 1451664001
17 2016-01-01 17:00:01 1451667601
18 2016-01-01 18:00:01 1451671201
19 2016-01-01 19:00:01 1451674801
20 2016-01-01 20:00:01 1451678401
21 2016-01-01 21:00:01 1451682001
22 2016-01-01 22:00:01 1451685601
23 2016-01-01 23:00:01 1451689201
24 2016-01-02 00:00:01 1451692801
to_timestamp is used for converting from a period index to a datetime index.
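As a minimal, self-contained sketch of the cast described above: the column is rebuilt here with pd.date_range (an assumption for illustration, not the asker's exact construction), and the explicit cast to datetime64[ns] is added so the division by 10 ** 9 holds even if the column is stored in another unit. Naive datetimes are treated as UTC.

```python
import numpy as np
import pandas as pd

# Rebuild the sample column: 25 hourly datetimes starting at 2016-01-01 00:00:01.
df = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                              freq=pd.Timedelta(hours=1))
})

# Force nanosecond resolution, reinterpret as int64 nanoseconds since the epoch,
# then floor-divide to get whole seconds. No Python-level loop is involved.
ns = df["datetime"].values.astype("datetime64[ns]").astype(np.int64)
df["ts"] = ns // 10**9

print(df.head())
```

Because the whole column is converted in one array operation, this avoids the per-row function calls that make .apply() slow on large frames.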
There’s also another method to do this using the "hidden" attribute of DatetimeIndex called asi8, which creates an integer timestamp.
pd.DatetimeIndex(df.datetime).asi8
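A short sketch of how asi8 might be used to get seconds, assuming a nanosecond-resolution column (the cast to datetime64[ns] below makes that assumption explicit). The variable names are illustrative only.

```python
import pandas as pd

# Three hourly datetimes, matching the start of the sample data.
s = pd.Series(pd.date_range("2016-01-01 00:00:01", periods=3,
                            freq=pd.Timedelta(hours=1)))

# asi8 exposes the index values as an int64 array; with nanosecond
# resolution these are nanoseconds since the Unix epoch.
idx = pd.DatetimeIndex(s.astype("datetime64[ns]"))
seconds = idx.asi8 // 10**9

print(seconds)
```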
Wes McKinney suggested it in this tangentially related stackoverflow question linked here
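If you don’t want to use numpy, you can use pure pandas conversions by subtracting the Unix epoch (recent pandas versions reject a datetime column passed directly to pd.to_timedelta):
df['ts'] = (df['datetime'] - pd.Timestamp('1970-01-01')).dt.total_seconds().astype(int)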
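As a sketch of a pandas-only conversion, assuming naive datetimes that should be read as UTC: subtract the Unix epoch to get timedeltas, then floor-divide by a one-second Timedelta. This yields integers directly, without going through floating-point seconds.

```python
import pandas as pd

# Three hourly datetimes, matching the start of the sample data.
s = pd.Series(pd.date_range("2016-01-01 00:00:01", periods=3,
                            freq=pd.Timedelta(hours=1)))

epoch = pd.Timestamp("1970-01-01")

# (datetime - epoch) is a timedelta column; floor division by one second
# gives whole seconds since the epoch as integers.
ts = (s - epoch) // pd.Timedelta("1s")

print(ts)
```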
I think you should not use apply; simply using astype would be fine:
df['ts'] = df.datetime.astype('int64') // 10**9
One option would be to use a lambda expression like this:
df['datetime'] = df['datetime'].apply(lambda x: pd.Timestamp(x))
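The easiest way to convert a pandas datetime column to a Unix timestamp is the following (for the default datetime64[ns] dtype this returns integers in nanoseconds):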
df['datetime'].values.tolist()
Another option is to use pandas.to_numeric (the division yields floating-point seconds):
df['datetime'] = pd.to_numeric(df['datetime'].values) / 10 ** 9
Plenty of correct answers; just be mindful of the deprecation notice on using astype. The recommended way of doing this now is:
df['ts'] = df.datetime.view('int64')
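Note that the vectorized results above are 3600 seconds larger than the values in the question (1451606401 versus 1451602801 for the first row). One plausible explanation, offered here as an assumption rather than something the question states, is that the asker's .timestamp() call interpreted the naive datetimes in a local UTC+1 zone, while the integer casts treat them as UTC. A sketch of both interpretations:

```python
import datetime as dt
import pandas as pd

s = pd.Series(pd.date_range("2016-01-01 00:00:01", periods=3,
                            freq=pd.Timedelta(hours=1)))
epoch = pd.Timestamp("1970-01-01")

# Interpretation 1: naive values are already UTC.
ts_utc = (s - epoch) // pd.Timedelta("1s")

# Interpretation 2 (assumption): values are wall-clock times in a fixed UTC+1 zone.
utc_plus_1 = dt.timezone(dt.timedelta(hours=1))
local = s.dt.tz_localize(utc_plus_1)
# Convert to UTC, drop the timezone, then measure from the epoch as before.
as_utc = local.dt.tz_convert(dt.timezone.utc).dt.tz_localize(None)
ts_local = (as_utc - epoch) // pd.Timedelta("1s")

print(ts_utc.tolist())
print(ts_local.tolist())
```

If the local-time values are the ones you need, localize the column to the appropriate timezone before converting; a fixed offset is used here only to keep the sketch simple.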