pandas datetime to unix timestamp seconds
Question:
From the official documentation of pandas.to_datetime we can say,
unit : string, default ‘ns’
unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or
float number. This will be based off the origin. Example, with
unit=’ms’ and origin=’unix’ (the default), this would calculate the
number of milliseconds to the unix epoch start.
So when I try it this way:
import pandas as pd
df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
df_unix_sec = pd.to_datetime(df['time'], unit='ms', origin='unix')
print(df)
print(df_unix_sec)
time
0 2019-01-15 13:25:43
0 2019-01-15 13:25:43
Name: time, dtype: datetime64[ns]
The output does not change for the second one: it still shows the datetime value instead of the number of milliseconds since the Unix epoch. Why is that? Am I missing something?
Answers:
I think you misunderstood what the argument is for. The purpose of origin='unix' is to convert a numeric timestamp to a datetime, not the other way around.
pd.to_datetime(1.547559e+09, unit='s', origin='unix')
# Timestamp('2019-01-15 13:30:00')
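In the direction it is meant for, unit='ms' tells pd.to_datetime that the numbers it receives are milliseconds counted from the origin. A minimal sketch, using the millisecond count that corresponds to the datetime in the question (1547558743000, worked out for this example):

```python
import pandas as pd

# Numbers in, datetimes out: 1547558743000 ms after 1970-01-01 (origin='unix')
millis = pd.Series([1547558743000])
as_datetime = pd.to_datetime(millis, unit='ms', origin='unix')
print(as_datetime)
```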
Here are some options:
Option 1: integer division
Conversely, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10**9.
pd.to_datetime(['2019-01-15 13:30:00']).astype('int64') / 10**9
# Float64Index([1547559000.0], dtype='float64')
Pros:
- super fast
Cons:
- makes assumptions about how pandas internally stores dates
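If you want integer seconds and would rather not rely on the index already being stored in nanoseconds, one option (a sketch, not the only way) is to pin the resolution with astype('datetime64[ns]') first and then use floor division instead of true division:

```python
import pandas as pd

dates = pd.to_datetime(['2019-01-15 13:30:00'])
# Force nanosecond resolution so the int64 values are nanoseconds since the epoch
nanos = dates.astype('datetime64[ns]').astype('int64')
# Floor division keeps the result integral (true division would give float64)
seconds = nanos // 10**9
print(seconds)
```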
Option 2: recommended by pandas
Pandas docs recommend using the following method:
# create test data
dates = pd.to_datetime(['2019-01-15 13:30:00'])
# calculate unix datetime
(dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
[out]:
Int64Index([1547559000], dtype='int64')
Pros:
- "idiomatic", recommended by the library
Cons:
- unwieldy
- not as performant as integer division
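Applied to the DataFrame from the question, the same pattern gives either seconds or the milliseconds the question was after; only the Timedelta divisor changes. A sketch, rebuilding the question's df:

```python
import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
# Elapsed time since the Unix epoch, as a timedelta Series
since_epoch = df['time'] - pd.Timestamp('1970-01-01')
# Floor-divide by the unit you want; the result is an integer Series
unix_sec = since_epoch // pd.Timedelta('1s')
unix_ms = since_epoch // pd.Timedelta('1ms')
print(unix_sec.iloc[0], unix_ms.iloc[0])
```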
Option 3: pd.Timestamp
If you have a single date string, you can use pd.Timestamp as shown in the other answer:
pd.Timestamp('2019-01-15 13:30:00').timestamp()
# 1547559000.0
If you have to coerce multiple datetimes (where pd.to_datetime is your only option), you can initialize and map:
pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
# Float64Index([1547559000.0], dtype='float64')
Pros:
- best method for a single datetime string
- easy to remember
Cons:
- not as performant as integer division
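The map approach also works on a Series such as the column from the question. Since timestamp() returns floats, a final cast is needed if you want whole seconds. A sketch, again rebuilding the question's df:

```python
import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
# Each element of a datetime64 Series is handed to the function as a pd.Timestamp
unix_sec = df['time'].map(pd.Timestamp.timestamp)  # float64 seconds
unix_sec_int = unix_sec.astype('int64')             # whole seconds
print(unix_sec_int)
```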
If you access a single element of a datetime64 column in the dataframe, pandas returns a Timestamp object, which is pandas' scalar representation of a datetime64 value.
You can use the pd.Timestamp.to_datetime64() method to convert it to a numpy.datetime64 object (with ns precision for the default datetime64[ns] dtype).
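Once you have a numpy.datetime64, NumPy's own timedelta arithmetic can produce the seconds since the epoch. A minimal sketch, assuming the df from the question; the epoch constant and variable names are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
ts = df['time'].iloc[0]                  # a pandas Timestamp
dt64 = ts.to_datetime64()                # the equivalent numpy.datetime64
epoch = np.datetime64('1970-01-01T00:00:00')
# datetime64 - datetime64 -> timedelta64; floor-divide by one second for an integer
seconds = (dt64 - epoch) // np.timedelta64(1, 's')
print(type(ts).__name__, type(dt64).__name__, seconds)
```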
The value attribute of a pandas Timestamp holds the time since the Unix epoch, in nanoseconds. So you can convert it to microseconds or milliseconds by dividing by 1e3 or 1e6, respectively. Check the code below.
import pandas as pd
date_1 = pd.to_datetime('2020-07-18 18:50:00')
print(date_1.value)
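Because value is an integer count of nanoseconds, integer floor division gives exact microseconds, milliseconds or seconds, whereas dividing by 1e3 or 1e6 produces floats. A short sketch for the same date:

```python
import pandas as pd

date_1 = pd.to_datetime('2020-07-18 18:50:00')
ns = date_1.value   # nanoseconds since 1970-01-01
us = ns // 10**3    # microseconds
ms = ns // 10**6    # milliseconds
s = ns // 10**9     # seconds
print(ns, us, ms, s)
```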
You can use the timestamp() method, which returns the POSIX timestamp as a float:
pd.Timestamp('2021-04-01').timestamp()
[Out]:
1617235200.0
pd.Timestamp('2021-04-01 00:02:35.234').timestamp()
[Out]:
1617235355.234
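One thing to keep in mind: pandas treats a naive Timestamp as UTC in timestamp() (unlike datetime.datetime.timestamp(), which assumes local time), while a tz-aware Timestamp is first converted to UTC. A sketch using a fixed UTC-4 offset, chosen here only for illustration:

```python
import pandas as pd

naive = pd.Timestamp('2021-04-01 00:00:00')        # interpreted as UTC
aware = pd.Timestamp('2021-04-01 00:00:00-04:00')  # fixed UTC-4 offset
# Midnight at UTC-4 is 04:00 UTC, i.e. 4 * 3600 seconds later
print(naive.timestamp(), aware.timestamp())
```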
When you calculate the difference between two datetimes, the dtype of the difference is timedelta64[ns] by default (note the ns in brackets). By changing [ns] into [ms], [s], [m], etc. as you cast the output to a new timedelta64 dtype, you can convert the difference into milliseconds, seconds, minutes, etc. (pandas 2.x only supports the [s], [ms], [us] and [ns] resolutions for this cast).
For example, to find the number of seconds since the Unix epoch, subtract the epoch from the datetimes and change the dtype.
df_unix_sec = (df['time'] - pd.Timestamp('1970-01-01')).astype('timedelta64[s]')
N.B. The differences are often very large numbers, so if you want them as integers, use astype('int64') (NOT astype(int), which can map to a 32-bit integer on some platforms).
df_unix_sec = (df['time'] - pd.Timestamp('1970-01-01')).astype('timedelta64[s]').astype('int64')
For OP’s example, this would yield,
0 1547558743
Name: time, dtype: int64
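The same cast with [ms] gives the milliseconds the question originally asked about. A sketch on the question's DataFrame; the final int64 cast yields integers both in older pandas (where the [ms] cast returns floats) and in pandas 2.x (where it returns a timedelta64[ms] column):

```python
import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
delta = df['time'] - pd.Timestamp('1970-01-01')
# Cast to millisecond resolution, then to a plain 64-bit integer count
df_unix_ms = delta.astype('timedelta64[ms]').astype('int64')
print(df_unix_ms)
```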